Paper Digest: ICML 2023 Highlights

June 27, 2023 | July 3, 2023 | admin

To search or review papers within ICML-2023 related to a specific topic, please use the search by venue and review by venue services. To browse papers by author, here is a list of top authors (ICML-2023). You may also like to explore our “Best Paper” Digest (ICML), which lists the most influential ICML papers since 2004.

The International Conference on Machine Learning (ICML) is one of the top machine learning conferences in the world. In 2023, it is to be held in Hawaii, US.

To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights/summaries to quickly get the main idea of each paper.

Based in New York, Paper Digest is dedicated to helping people generate content and reason over unstructured data. Unlike black-box approaches, we build deep models on semantics, which allows results to be produced with explanations. Such models power this website and are behind our services, including “search engine”, “summarization”, “question answering”, and “literature review”.

If you do not want to miss interesting academic papers, you are welcome to sign up for our daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.

Paper Digest Team
New York City, New York, 10017
team@paperdigest.org

TABLE 1: Paper Digest: ICML 2023 Highlights

No.	Paper	Author(s)
1	Bayesian Estimation of Differential Privacy Highlight: We propose a novel, more efficient Bayesian approach that brings privacy estimates within the reach of practitioners.	Santiago Zanella-Beguelin; Lukas Wutschitz; Shruti Tople; Ahmed Salem; Victor Rühle; Andrew Paverd; Mohammad Naseri; Boris Köpf; Daniel Jones;
2	Adaptive Estimation of Graphical Models Under Total Positivity Highlight: We propose an adaptive multiple-stage estimation method, which refines the estimate by solving a weighted $\ell_1$-regularized problem in each stage.	Jiaxi Ying; José Vinícius De Miranda Cardoso; Daniel P. Palomar;
3	GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models Highlight: This paper introduces a distributed, GPU-centric experience replay system, GEAR, designed to perform scalable reinforcement learning (RL) with large sequence models (such as transformers).	Hanjing Wang; Man-Kit Sit; Congjie He; Ying Wen; Weinan Zhang; Jun Wang; Yaodong Yang; Luo Mai;
4	Disentangled Multi-Fidelity Deep Bayesian Active Learning Highlight: We propose a novel framework called Disentangled Multi-fidelity Deep Bayesian Active Learning (D-MFDAL), which learns the surrogate models conditioned on the distribution of functions at multiple fidelities.	Dongxia Wu; Ruijia Niu; Matteo Chinazzi; Yian Ma; Rose Yu;
5	Understand and Modularize Generator Optimization in ELECTRA-style Pretraining Highlight: Despite the effectiveness of ELECTRA-style pre-training, their performance is dependent on the careful selection of the model size for the auxiliary generator, leading to high trial-and-error costs. In this paper, we present the first systematic study of this problem.	Chengyu Dong; Liyuan Liu; Hao Cheng; Jingbo Shang; Jianfeng Gao; Xiaodong Liu;
6	NeRFool: Uncovering The Vulnerability of Generalizable Neural Radiance Fields Against Adversarial Perturbations Highlight: To this end, we present NeRFool, which to the best of our knowledge is the first work that sets out to understand the adversarial robustness of GNeRF.	Yonggan Fu; Ye Yuan; Souvik Kundu; Shang Wu; Shunyao Zhang; Celine Lin;
7	Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP Highlight: In this paper, we study representation learning in partially observable Markov Decision Processes (POMDPs), where the agent learns a decoder function that maps a series of high-dimensional raw observations to a compact representation and uses it for more efficient exploration and planning.	Jiacheng Guo; Zihao Li; Huazheng Wang; Mengdi Wang; Zhuoran Yang; Xuezhou Zhang;
8	Block Subsampled Randomized Hadamard Transform for Nyström Approximation on Distributed Architectures Highlight: This article introduces a novel structured random matrix composed blockwise from subsampled randomized Hadamard transforms (SRHTs).	Oleg Balabanov; Matthias Beaupère; Laura Grigori; Victor Lederer;
9	Unconstrained Online Learning with Unbounded Losses Highlight: In this paper, we develop a new setting for online learning with unbounded domains and non-Lipschitz losses.	Andrew Jacobsen; Ashok Cutkosky;
10	Optimistic Planning By Regularized Dynamic Programming Highlight: We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes based on the idea of adding regularization to the updates of an otherwise standard approximate value iteration procedure.	Antoine Moulin; Gergely Neu;
11	Autoregressive Diffusion Model for Graph Generation Highlight: We propose an autoregressive diffusion model for graph generation.	Lingkai Kong; Jiaming Cui; Haotian Sun; Yuchen Zhuang; B. Aditya Prakash; Chao Zhang;
12	Differentiable Tree Operations Promote Compositional Generalization Highlight: In the context of structure-to-structure transformation tasks, learning sequences of discrete symbolic operations poses significant challenges due to their non-differentiability. To facilitate the learning of these symbolic sequences, we introduce a differentiable tree interpreter that compiles high-level symbolic tree operations into subsymbolic matrix operations on tensors.	Paul Soulos; Edward J Hu; Kate McCurdy; Yunmo Chen; Roland Fernandez; Paul Smolensky; Jianfeng Gao;
13	Can Neural Network Memorization Be Localized? Highlight: In this work, we show that rather than being confined to individual layers, memorization is a phenomenon confined to a small set of neurons in various layers of the model.	Pratyush Maini; Michael Curtis Mozer; Hanie Sedghi; Zachary Chase Lipton; J Zico Kolter; Chiyuan Zhang;
14	Domain Adaptation for Time Series Under Feature and Label Shifts Highlight: We present RAINCOAT, the first model for both closed-set and universal domain adaptation on complex time series.	Huan He; Owen Queen; Teddy Koker; Consuelo Cuevas; Theodoros Tsiligkaridis; Marinka Zitnik;
15	Towards Sustainable Learning: Coresets for Data-efficient Deep Learning Highlight: To improve the efficiency and sustainability of learning deep models, we propose CREST, the first scalable framework with rigorous theoretical guarantees to identify the most valuable examples for training non-convex models, particularly deep networks.	Yu Yang; Kang Hao; Baharan Mirzasoleiman;
16	On Enhancing Expressive Power Via Compositions of Single Fixed-Size ReLU Network Highlight: This paper explores the expressive power of deep neural networks through the framework of function compositions.	Shijun Zhang; Jianfeng Lu; Hongkai Zhao;
17	Contextual Reliability: When Different Features Matter in Different Contexts Highlight: We formalize a new setting called contextual reliability which accounts for the fact that the right features to use may vary depending on the context. Our work theoretically and empirically demonstrates the advantages of ENP over existing methods and provides new benchmarks for contextual reliability.	Gaurav Rohit Ghosal; Amrith Setlur; Daniel S. Brown; Anca Dragan; Aditi Raghunathan;
18	On Data Manifolds Entailed By Structural Causal Models Highlight: In this work, we characterize the data manifolds entailed by structural causal models.	Ricardo Dominguez-Olmedo; Amir-Hossein Karimi; Georgios Arvanitidis; Bernhard Schölkopf;
19	Are Neurons Actually Collapsed? On The Fine-Grained Structure in Neural Representations Highlight: Recent work has observed an intriguing “Neural Collapse” phenomenon in well-trained neural networks, where the last-layer representations of training samples with the same label collapse into each other. This appears to suggest that the last-layer representations are completely determined by the labels, and do not depend on the intrinsic structure of input distribution. We provide evidence that this is not a complete description, and that the apparent collapse hides important fine-grained structure in the representations.	Yongyi Yang; Jacob Steinhardt; Wei Hu;
20	Fast Sampling of Diffusion Models Via Operator Learning Highlight: In this work, we use neural operators, an efficient method to solve the probability flow differential equations, to accelerate the sampling process of diffusion models.	Hongkai Zheng; Weili Nie; Arash Vahdat; Kamyar Azizzadenesheli; Anima Anandkumar;
21	Unsupervised Out-of-Distribution Detection with Diffusion Inpainting Highlight: Unsupervised out-of-distribution detection (OOD) seeks to identify out-of-domain data by learning only from unlabeled in-domain data. We present a novel approach for this task, called Lift, Map, Detect (LMD), that leverages recent advancement in diffusion models.	Zhenzhen Liu; Jin Peng Zhou; Yufan Wang; Kilian Q Weinberger;
22	Sequence Modeling with Multiresolution Convolutional Memory Highlight: We instead take inspiration from wavelet-based multiresolution analysis to define a new building block for sequence modeling, which we call a MultiresLayer.	Jiaxin Shi; Ke Alexander Wang; Emily Fox;
23	The Hessian Perspective Into The Nature of Convolutional Neural Networks Highlight: We prove tight upper bounds (with linear activations), which closely follow the empirical trend of the Hessian rank and in practice also hold for more general settings.	Sidak Pal Singh; Thomas Hofmann; Bernhard Schölkopf;
24	Towards Learning Geometric Eigen-Lengths Crucial for Fitting Tasks Highlight: However, it remains obscure and underexplored if learning systems can be equipped with similar capabilities of automatically discovering such key geometric quantities from doing tasks. In this work, we therefore for the first time formulate and propose a novel learning problem on this question and set up a benchmark suite including tasks, data, and evaluation metrics for studying the problem.	Yijia Weng; Kaichun Mo; Ruoxi Shi; Yanchao Yang; Leonidas Guibas;
25	Improved Learning-Augmented Algorithms for The Multi-Option Ski Rental Problem Via Best-Possible Competitive Analysis Highlight: In this paper, we present improved learning-augmented algorithms for the multi-option ski rental problem.	Yongho Shin; Changyeol Lee; Gukryeol Lee; Hyung-Chan An;
26	On Regularization and Inference with Label Constraints Highlight: In this work, we compare two common strategies for encoding label constraints in a machine learning pipeline, regularization with constraints and constrained inference, by quantifying their impact on model performance.	Kaifu Wang; Hangfeng He; Tin D. Nguyen; Piyush Kumar; Dan Roth;
27	Simple Disentanglement of Style and Content in Visual Representations Highlight: In this work, we propose a simple post-processing framework to disentangle content and style in learned representations from pre-trained vision models.	Lilian Ngweta; Subha Maity; Alex Gittens; Yuekai Sun; Mikhail Yurochkin;
28	Beyond The Edge of Stability Via Two-step Gradient Updates Highlight: The incipient theoretical analysis of this phenomena has mainly focused in the overparametrised regime, where the effect of choosing a large learning rate may be associated to a ‘Sharpness-Minimisation’ implicit regularisation within the manifold of minimisers, under appropriate asymptotic limits. In contrast, in this work we directly examine the conditions for such unstable convergence, focusing on simple, yet representative, learning problems, via analysis of two-step gradient updates.	Lei Chen; Joan Bruna;
29	On The Role of Attention in Prompt-tuning Highlight: In this work, we explore prompt-tuning for one-layer attention architectures and study contextual mixture-models where each input token belongs to a context-relevant or -irrelevant set.	Samet Oymak; Ankit Singh Rawat; Mahdi Soltanolkotabi; Christos Thrampoulidis;
30	Master-ASR: Achieving Multilingual Scalability and Low-Resource Adaptation in ASR with Modular Learning Highlight: To this end, we propose an ASR framework, dubbed Master-ASR, that, for the first time, simultaneously achieves strong multilingual scalability and low-resource adaptation ability thanks to its modularize-then-assemble strategy.	Zhongzhi Yu; Yang Zhang; Kaizhi Qian; Cheng Wan; Yonggan Fu; Yongan Zhang; Celine Lin;
31	Hyperbolic Representation Learning: Revisiting and Advancing Highlight: In this work, we first introduce a position-tracking mechanism to scrutinize existing prevalent hyperbolic models, revealing that the learned representations are sub-optimal and unsatisfactory. To address this, we propose a simple yet effective method, hyperbolic informed embedding (HIE), by incorporating cost-free hierarchical information deduced from the hyperbolic distance of the node to the origin (i.e., induced hyperbolic norm) to advance existing hyperbolic models.	Menglin Yang; Min Zhou; Zhitao Ying; Yankai Chen; Irwin King;
32	Learning Belief Representations for Partially Observable Deep RL Highlight: We propose a novel deep, partially observable RL algorithm based on modelling belief states, a technique typically used when solving tabular POMDPs, but that has traditionally been difficult to apply to more complex environments.	Andrew Wang; Andrew C Li; Toryn Q. Klassen; Rodrigo Toro Icarte; Sheila A. McIlraith;
33	MABe22: A Multi-Species Multi-Task Benchmark for Learned Representations of Behavior Highlight: We introduce MABe22, a large-scale, multi-agent video and trajectory benchmark to assess the quality of learned behavior representations.	Jennifer J. Sun; Markus Marks; Andrew Wesley Ulmer; Dipam Chakraborty; Brian Geuther; Edward Hayes; Heng Jia; Vivek Kumar; Sebastian Oleszko; Zachary Partridge; Milan Peelman; Alice Robie; Catherine E Schretter; Keith Sheppard; Chao Sun; Param Uttarwar; Julian Morgan Wagner; Erik Werner; Joseph Parker; Pietro Perona; Yisong Yue; Kristin Branson; Ann Kennedy;
34	Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory Highlight: Can we scale this SOTA method to ImageNet-1K and does its effectiveness on CIFAR transfer to ImageNet-1K? To answer these questions, we first propose a procedure to exactly compute the unrolled gradient with constant memory complexity, which allows us to scale MTT to ImageNet-1K seamlessly with $\sim 6$x reduction in memory footprint. We further discover that it is challenging for MTT to handle datasets with a large number of classes, and propose a novel soft label assignment that drastically improves its convergence.	Justin Cui; Ruochen Wang; Si Si; Cho-Jui Hsieh;
35	MultiAdam: Parameter-wise Scale-invariant Optimizer for Multiscale Training of Physics-informed Neural Networks Highlight: In this paper, we present an analysis of second-order non-homogeneous PDEs, which are classified into three categories and applicable to various common problems.	Jiachen Yao; Chang Su; Zhongkai Hao; Songming Liu; Hang Su; Jun Zhu;
36	Polynomial Preconditioning for Gradient Methods Highlight: We propose a new family of preconditioners generated by the symmetric polynomials.	Nikita Doikov; Anton Rodomanov;
37	Internally Rewarded Reinforcement Learning Highlight: In this paper, we formally formulate IRRL and present a class of problems that belong to IRRL.	Mengdi Li; Xufeng Zhao; Jae Hee Lee; Cornelius Weber; Stefan Wermter;
38	Lottery Tickets in Evolutionary Optimization: On Sparse Backpropagation-Free Trainability Highlight: In this paper we establish the existence of highly sparse trainable initializations for evolution strategies (ES) and characterize qualitative differences compared to gradient descent (GD)-based sparse training.	Robert Tjarko Lange; Henning Sprekeler;
39	Slot-VAE: Object-Centric Scene Generation with Slot Attention Highlight: In this paper, we propose the Slot-VAE, a generative model that integrates slot attention with the hierarchical VAE framework for object-centric structured scene generation.	Yanbo Wang; Letao Liu; Justin Dauwels;
40	Model-based Reinforcement Learning with Scalable Composite Policy Gradient Estimators Highlight: We propose a scalable method, Total Propagation X (TPX) that improves over TP by changing the node used for IVW, and employing coordinate wise weighting.	Paavo Parmas; Takuma Seno; Yuma Aoki;
41	Predicting Rare Events By Shrinking Towards Proportional Odds Highlight: We present PRESTO, a relaxation of the proportional odds model for ordinal regression.	Gregory Faletto; Jacob Bien;
42	The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation Highlight: Yet, the nature of such approximation factors, especially their optimal form in a given learning problem, is poorly understood. In this paper we study this question in linear off-policy value function estimation, where many open questions remain.	Philip Amortila; Nan Jiang; Csaba Szepesvari;
43	Internet Explorer: Targeted Representation Learning on The Open Web Highlight: We suggest an alternate approach: rather than hoping our static datasets transfer to our desired tasks after large-scale pre-training, we propose dynamically utilizing the Internet to quickly train a small-scale model that does extremely well on a target dataset.	Alexander Cong Li; Ellis Langham Brown; Alexei A Efros; Deepak Pathak;
44	Robust and Private Stochastic Linear Bandits Highlight: In this paper, we study the stochastic linear bandit problem under the additional requirements of differential privacy, robustness and batched observations.	Vasileios Charisopoulos; Hossein Esfandiari; Vahab Mirrokni;
45	High-dimensional Location Estimation Via Norm Concentration for Subgamma Vectors Highlight: We build on the theory using smoothed estimators to bound the error for finite $n$ in terms of $\mathcal I_r$, the Fisher information of the $r$-smoothed distribution.	Shivam Gupta; Jasper C.H. Lee; Eric Price;
46	Action Matching: Learning Stochastic Dynamics from Samples Highlight: In order to better understand the systems under observation, we would like to learn a model of the underlying process that allows us to propagate samples in time and thereby simulate entire individual trajectories. In this work, we propose Action Matching, a method for learning a rich family of dynamics using only independent samples from its time evolution.	Kirill Neklyudov; Rob Brekelmans; Daniel Severo; Alireza Makhzani;
47	Short-lived High-volume Bandits Highlight: We propose an $\ell$-Layered Sieve Policy that recursively refines the action space for $\ell\leq w$ times.	Su Jia; Nishant Oli; Ian Anderson; Paul Duff; Andrew A Li; Ramamoorthi Ravi;
48	Learning Temporally Abstract World Models Without Online Experimentation Highlight: In this paper, we present an approach for simultaneously learning sets of skills and temporally abstract, skill-conditioned world models purely from offline data, enabling agents to perform zero-shot online planning of skill sequences for new tasks.	Benjamin Freed; Siddarth Venkatraman; Guillaume Adrien Sartoretti; Jeff Schneider; Howie Choset;
49	Active Policy Improvement from Multiple Black-box Oracles Highlight: We introduce MAPS and MAPS-SE, a class of policy improvement algorithms that perform imitation learning from multiple suboptimal oracles.	Xuefeng Liu; Takuma Yoneda; Chaoqi Wang; Matthew Walter; Yuxin Chen;
50	Stochastic Gradient Descent-Induced Drift of Representation in A Two-Layer Neural Network Highlight: Despite being observed in the brain and in artificial networks, the mechanisms of drift and its implications are not fully understood. Motivated by recent experimental findings of stimulus-dependent drift in the piriform cortex, we use theory and simulations to study this phenomenon in a two-layer linear feedforward network.	Farhad Pashakhanloo; Alexei Koulakov;
51	IRNeXt: Rethinking Convolutional Network Design for Image Restoration Highlight: In this paper, we excavate the potential of the convolutional neural network (CNN) and show that our CNN-based model can receive comparable or better performance than Transformer models with low computation overhead on several image restoration tasks.	Yuning Cui; Wenqi Ren; Sining Yang; Xiaochun Cao; Alois Knoll;
52	TabLeak: Tabular Data Leakage in Federated Learning Highlight: A successful attack for tabular data must address two key challenges unique to the domain: (i) obtaining a solution to a high-variance mixed discrete-continuous optimization problem, and (ii) enabling human assessment of the reconstruction as unlike for image and text data, direct human inspection is not possible. In this work we address these challenges and propose TabLeak, the first comprehensive reconstruction attack on tabular data.	Mark Vero; Mislav Balunovic; Dimitar Iliev Dimitrov; Martin Vechev;
53	Can Large Language Models Reason About Program Invariants? Highlight: We study the application of large language models to invariant prediction, finding that models trained on source code and fine-tuned for invariant generation can perform invariant prediction as static rather than dynamic analysis.	Kexin Pei; David Bieber; Kensen Shi; Charles Sutton; Pengcheng Yin;
54	Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood Estimation for Latent Gaussian Models Highlight: We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversion.	Alexander Lin; Bahareh Tolooshams; Yves Atchade; Demba E. Ba;
55	Stable and Consistent Prediction of 3D Characteristic Orientation Via Invariant Residual Learning Highlight: In this work, we introduce a novel method to decouple the shape geometry and semantics of the input point cloud to achieve both stability and consistency.	Seungwook Kim; Chunghyun Park; Yoonwoo Jeong; Jaesik Park; Minsu Cho;
56	Deep Generative Symbolic Regression with Monte-Carlo-Tree-Search Highlight: In this paper, we propose a novel method which provides the best of both worlds, based on a Monte-Carlo Tree Search procedure using a context-aware neural mutation model, which is initially pre-trained to learn promising mutations, and further refined from successful experiences in an online fashion.	Pierre-Alexandre Kamienny; Guillaume Lample; Sylvain Lamprier; Marco Virgolin;
57	Mixing Predictions for Online Metric Algorithms Highlight: Since the performance of each predictor may vary over time, it is desirable to use not the single best predictor as a benchmark, but rather a dynamic combination which follows different predictors at different times. We design algorithms that combine predictions and are competitive against such dynamic combinations for a wide class of online problems, namely, metrical task systems.	Antonios Antoniadis; Christian Coester; Marek Elias; Adam Polak; Bertrand Simon;
58	Approximate Causal Effect Identification Under Weak Confounding Highlight: In this paper, we analyze the effect of ‘weak confounding’ on causal estimands.	Ziwei Jiang; Lai Wei; Murat Kocaoglu;
59	Bootstrap in High Dimension with Low Computation Highlight: We study the use of bootstraps in high-dimensional environments with a small number of resamples.	Henry Lam; Zhenyuan Liu;
60	Image Shortcut Squeezing: Countering Perturbative Availability Poisons with Compression Highlight: Current research adopts the belief that practical and effective approaches to countering such poisons do not exist. In this paper, we argue that it is time to abandon this belief.	Zhuoran Liu; Zhengyu Zhao; Martha Larson;
61	Proper Losses for Discrete Generative Models Highlight: We initiate the study of proper losses for evaluating generative models in the discrete setting.	Dhamma Kimpara; Rafael Frongillo; Bo Waggoner;
62	Private Federated Learning with Autotuned Compression Highlight: We propose new techniques for reducing communication in private federated learning without the need for setting or tuning compression rates.	Enayat Ullah; Christopher A. Choquette-Choo; Peter Kairouz; Sewoong Oh;
63	SpotEM: Efficient Video Search for Episodic Memory Highlight: We propose SpotEM, an approach to achieve efficiency for a given EM method while maintaining good accuracy.	Santhosh Kumar Ramakrishnan; Ziad Al-Halah; Kristen Grauman;
64	DRCFS: Doubly Robust Causal Feature Selection Highlight: We propose DRCFS, a doubly robust feature selection method for identifying the causal features even in nonlinear and high dimensional settings.	Francesco Quinzan; Ashkan Soleymani; Patrick Jaillet; Cristian R. Rojas; Stefan Bauer;
65	Open-Vocabulary Universal Image Segmentation with MaskCLIP Highlight: In this paper, we tackle an emerging computer vision task, open-vocabulary universal image segmentation, that aims to perform semantic/instance/panoptic segmentation (background semantic labeling + foreground instance segmentation) for arbitrary categories of text-based descriptions in inference time.	Zheng Ding; Jieke Wang; Zhuowen Tu;
66	Universal Physics-Informed Neural Networks: Symbolic Differential Operator Discovery with Sparse Data Highlight: In this work we perform symbolic discovery of differential operators in a situation where there is sparse experimental data.	Lena Podina; Brydon Eastman; Mohammad Kohandel;
67	Partial Optimality in Cubic Correlation Clustering Highlight: Here, we focus on establishing partial optimality conditions for the special case of complete graphs and cubic objective functions.	David Stein; Silvia Di Gregorio; Bjoern Andres;
68	InfoDiffusion: Representation Learning Using Information Maximizing Diffusion Models Highlight: Here, we propose InfoDiffusion, an algorithm that augments diffusion models with low-dimensional latent variables that capture high-level factors of variation in the data.	Yingheng Wang; Yair Schiff; Aaron Gokaslan; Weishen Pan; Fei Wang; Christopher De Sa; Volodymyr Kuleshov;
69	A Large-Scale Study of Probabilistic Calibration in Neural Network Regression Highlight: While neural network miscalibration has been studied primarily in classification, we investigate this in the less-explored domain of regression.	Victor Dheur; Souhaib Ben Taieb;
70	Global Optimality of Elman-type RNNs in The Mean-field Regime Highlight: We analyze Elman-type recurrent neural networks (RNNs) and their training in the mean-field regime.	Andrea Agazzi; Jianfeng Lu; Sayan Mukherjee;
71	Semi-Autoregressive Energy Flows: Exploring Likelihood-Free Training of Normalizing Flows Highlight: This paper studies the likelihood-free training of flows and proposes the energy objective, an alternative sample-based loss based on proper scoring rules.	Phillip Si; Zeyi Chen; Subham Sekhar Sahoo; Yair Schiff; Volodymyr Kuleshov;
72	When Do Minimax-fair Learning and Empirical Risk Minimization Coincide? Highlight: We prove that if the hypothesis class is sufficiently expressive and the group information is recoverable from the features, ERM and minimax-fairness learning formulations indeed have the same performance on the worst-off group.	Harvineet Singh; Matthäus Kleindessner; Volkan Cevher; Rumi Chunara; Chris Russell;
73	Theoretical Behavior of XAI Methods in The Presence of Suppressor Variables Highlight: We show that the majority of the studied approaches will attribute non-zero importance to a non-class-related suppressor feature in the presence of correlated noise.	Rick Wilming; Leo Kieslich; Benedict Clark; Stefan Haufe;
74	Probabilistic Imputation for Time-series Classification with Missing Data Highlight: In this paper, we propose a novel probabilistic framework for classification with multivariate time series data with missing values.	SeungHyun Kim; Hyunsu Kim; Eunggu Yun; Hwangrae Lee; Jaehun Lee; Juho Lee;
75	MAHALO: Unifying Offline Reinforcement Learning and Imitation Learning from Observations Highlight: In this work, we present a generic approach to offline PLfO, called Modality-agnostic Adversarial Hypothesis Adaptation for Learning from Observations (MAHALO).	Anqi Li; Byron Boots; Ching-An Cheng;
76	Robust Counterfactual Explanations for Neural Networks With Probabilistic Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, models can often change significantly in the parameter space with little to no change in their predictions or accuracy on the given dataset. In this work, we introduce a mathematical abstraction termed naturally-occurring model change, which allows for arbitrary changes in the parameter space such that the change in predictions on points that lie on the data manifold is limited.	Faisal Hamman; Erfaun Noorani; Saumitra Mishra; Daniele Magazzeni; Sanghamitra Dutta;
77	Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a general framework for implementing Stackelberg equilibria search as a multi-agent RL problem, allowing a wide range of algorithmic design choices.	Matthias Gerstgrasser; David C. Parkes;
78	Multi-Agent Learning from Learners Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Nevertheless, some recent works move away from the optimality assumption to study the Learning from a Learner (LfL) problem, where the challenge is inferring the reward function of a learning agent from a sequence of demonstrations produced by progressively improving policies. In this work, we take one of the initial steps in addressing the multi-agent version of this problem and propose a new algorithm, MA-LfL (Multiagent Learning from a Learner).	Mine Melodi Caliskan; Francesco Chini; Setareh Maghsudi;
79	Cut Your Losses with Squentropy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we propose the squentropy loss, which is the sum of two terms: the cross-entropy loss and the average square loss over the incorrect classes. We provide an extensive set of experiments on multi-class classification problems showing that the squentropy loss outperforms both the pure cross-entropy and rescaled square losses in terms of the classification accuracy.	Like Hui; Mikhail Belkin; Stephen Wright;
80	The Statistical Scope of Multicalibration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We make a connection between multicalibration and property elicitation and show that (under mild technical conditions) it is possible to produce a multicalibrated predictor for a continuous scalar property $\Gamma$ if and only if $\Gamma$ is elicitable.	Georgy Noarov; Aaron Roth;
81	Sparse Learning of Dynamical Systems in RKHS: An Operator-Theoretic Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a method for sparse learning of transfer operators from $\beta$-mixing stochastic processes, in both discrete and continuous time, and provide sample complexity analysis extending existing theoretical guarantees for learning from non-sparse, i.i.d. data.	Boya Hou; Sina Sanjari; Nathan Dahlin; Subhonmesh Bose; Umesh Vaidya;
82	Conformal Prediction with Missing Values Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study conformal prediction with missing values in the covariates — a setting that brings new challenges to uncertainty quantification.	Margaux Zaffran; Aymeric Dieuleveut; Julie Josse; Yaniv Romano;
83	Weakly Supervised Regression with Interval Targets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we provide a thorough study on RIT.	Xin Cheng; Yuzhou Cao; Ximing Li; Bo An; Lei Feng;
84	Controllable Neural Symbolic Regression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these methods, in their current form, lack the capability to incorporate user-defined prior knowledge, which is often required in natural sciences and engineering fields. To overcome this limitation, we propose a novel neural symbolic regression method, named Neural Symbolic Regression with Hypothesis (NSRwH) that enables the explicit incorporation of assumptions about the expected structure of the ground-truth expression into the prediction process.	Tommaso Bendinelli; Luca Biggio; Pierre-Alexandre Kamienny;
85	DSGD-CECA: Decentralized SGD with Communication-Optimal Exact Consensus Algorithm Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The state-of-the-art approach uses the dynamic one-peer exponential-2 topology, achieving faster training times and improved scalability than the ring, grid, torus, and hypercube topologies. However, this approach requires a power-of-2 number of agents, which is impractical at scale. In this paper, we remove this restriction and propose Decentralized SGD with Communication-optimal Exact Consensus Algorithm (DSGD-CECA), which works for any number of agents while still achieving state-of-the-art properties.	Lisang Ding; Kexin Jin; Bicheng Ying; Kun Yuan; Wotao Yin;
86	How Do Transformers Learn Topic Structure: Towards A Mechanistic Understanding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In our paper, we provide fine-grained mechanistic understanding of how transformers learn “semantic structure”, understood as capturing co-occurrence structure of words.	Yuchen Li; Yuanzhi Li; Andrej Risteski;
87	Subset-Based Instance Optimality in Private Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new definition of instance optimality for differentially private estimation algorithms.	Travis Dick; Alex Kulesza; Ziteng Sun; Ananda Theertha Suresh;
88	A Statistical Perspective on Retrieval-Based Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a formal treatment of retrieval-based models to characterize their performance via a novel statistical perspective.	Soumya Basu; Ankit Singh Rawat; Manzil Zaheer;
89	Adaptive IMLE for Few-shot Pretraining-free Generative Modelling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a more generalized formulation of IMLE which includes the original formulation as a special case, and we prove that the theoretical guarantees hold under weaker conditions.	Mehran Aghabozorgi; Shichong Peng; Ke Li;
90	Polyhedral Complex Extraction from ReLU Networks Using Edge Subdivision Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead of regions, we propose to subdivide edges, leading to a novel method for polyhedral complex extraction.	Arturs Berzins;
91	Task-Specific Skill Localization in Fine-tuned Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Thus fine-tuning allows the model to quickly pick up task-specific skills, but there has been limited study of where these newly-learnt skills reside inside the massive model. This paper introduces the term skill localization for this problem and proposes a solution.	Abhishek Panigrahi; Nikunj Saunshi; Haoyu Zhao; Sanjeev Arora;
92	Learning Unforeseen Robustness from Out-of-distribution Data Using Equivariant Domain Translator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we learn unforeseen robustness by harnessing the variations in the abundant out-of-distribution data.	Sicheng Zhu; Bang An; Furong Huang; Sanghyun Hong;
93	Fast Federated Machine Unlearning with Nonlinear Functional Theory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a fast FMU algorithm, FFMU, for improving the FMU efficiency while maintaining the unlearning quality.	Tianshi Che; Yang Zhou; Zijie Zhang; Lingjuan Lyu; Ji Liu; Da Yan; Dejing Dou; Jun Huan;
94	FARE: Provably Fair Representation Learning with Practical Certificates Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we address that challenge and introduce FARE (Fairness with Restricted Encoders), the first FRL method with practical fairness certificates.	Nikola Jovanović; Mislav Balunovic; Dimitar Iliev Dimitrov; Martin Vechev;
95	Effective Minkowski Dimension of Deep Nonparametric Regression: Function Approximation and Statistical Theories Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a relaxed assumption that the input data are concentrated around a subset of $\mathbb{R}^d$ denoted by $\mathcal{S}$, and the intrinsic dimension of $\mathcal{S}$ can be characterized by a new complexity notation — effective Minkowski dimension.	Zixuan Zhang; Minshuo Chen; Mengdi Wang; Wenjing Liao; Tuo Zhao;
96	BiRT: Bio-inspired Replay in Vision Transformers for Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, representation rehearsal in vision transformers lacks diversity, resulting in overfitting and consequently, performance drops significantly compared to raw image rehearsal. Therefore, we propose BiRT, a novel representation rehearsal-based continual learning approach using vision transformers.	Kishaan Jeeveswaran; Prashant Shivaram Bhat; Bahram Zonooz; Elahe Arani;
97	On The Importance of Feature Decorrelation for Unsupervised Representation Learning in Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, an important challenge of this approach is the representational collapse, where the subspace of the latent representations collapses into a low-dimensional manifold. To address this issue, we propose a novel URL framework that causally predicts future states while increasing the dimension of the latent manifold by decorrelating the features in the latent space.	Hojoon Lee; Koanho Lee; Dongyoon Hwang; Hyunho Lee; Byungkun Lee; Jaegul Choo;
98	Trainability, Expressivity and Interpretability in Gated Neural ODEs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel measure of expressivity which probes the capacity of a neural network to generate complex trajectories.	Timothy Doyeon Kim; Tankut Can; Kamesh Krishnamurthy;
99	Feature Learning in Deep Classifiers Through Intermediate Neural Collapse Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we conduct an empirical study of the feature learning process in deep classifiers.	Akshay Rangamani; Marius Lindegaard; Tomer Galanti; Tomaso A Poggio;
100	Q-Flow: Generative Modeling for Differential Equations of Open Quantum Dynamics with Normalizing Flows Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, we model the Q function seamlessly with off-the-shelf deep generative models such as normalizing flows.	Owen M Dugan; Peter Y. Lu; Rumen Dangovski; Di Luo; Marin Soljacic;
101	Unearthing InSights Into Mars: Unsupervised Source Separation with Limited Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While data-driven methods have shown great promise in source separation, they often require large amounts of data, which rarely exists in planetary space missions. To address this challenge, we propose an unsupervised source separation scheme for domains with limited data access that involves solving an optimization problem in the wavelet scattering covariance representation space—an interpretable, low-dimensional representation of stationary processes.	Ali Siahkoohi; Rudy Morel; Maarten V. de Hoop; Erwan Allys; Grégory Sainton; Taichi Kawamura;
102	Doubly Optimal No-Regret Learning in Monotone Games Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose the accelerated optimistic gradient (AOG) algorithm, the first doubly optimal no-regret learning algorithm for smooth monotone games.	Yang Cai; Weiqiang Zheng;
103	The Test of Tests: A Framework for Differentially Private Hypothesis Testing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a generic framework for creating differentially private versions of any hypothesis test in a black-box way.	Zeki Kazan; Kaiyan Shi; Adam Groce; Andrew Bray;
104	Multi-Symmetry Ensembles: Improving Diversity and Generalization Via Opposing Symmetries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present Multi-Symmetry Ensembles (MSE), a framework for constructing diverse ensembles by capturing the multiplicity of hypotheses along symmetry axes, which explore the hypothesis space beyond stochastic perturbations of model weights and hyperparameters.	Charlotte Loh; Seungwook Han; Shivchander Sudalairaj; Rumen Dangovski; Kai Xu; Florian Wenzel; Marin Soljacic; Akash Srivastava;
105	When Is Realizability Sufficient for Off-Policy Reinforcement Learning? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, Bellman completeness is a requirement that is much stronger than realizability and that is deemed to be too strong to hold in practice. In this work, we relax this structural assumption and analyze the statistical complexity of off-policy reinforcement learning when only realizability holds for the prescribed function class.	Andrea Zanette;
106	Hidden Symmetries of ReLU Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we prove that, for any network architecture where no layer is narrower than the input, there exist parameter settings with no hidden symmetries.	Elisenda Grigsby; Kathryn Lindsey; David Rolnick;
107	Bootstrapped Representations in Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet, it is unclear which features these algorithms capture and how they relate to those from other auxiliary-task-based approaches. In this paper, we address this gap and provide a theoretical characterization of the state representation learnt by temporal difference learning (Sutton, 1988).	Charline Le Lan; Stephen Tu; Mark Rowland; Anna Harutyunyan; Rishabh Agarwal; Marc G Bellemare; Will Dabney;
108	New Metrics and Search Algorithms for Weighted Causal DAGs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, under some standard assumptions, we study causal graph discovery via adaptive interventions with node-dependent interventional costs.	Davin Choo; Kirankumar Shiragur;
109	Exact Inference in High-order Structured Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the problem of inference in high-order structured prediction tasks.	Chuyang Ke; Jean Honorio;
110	Path Neural Networks: Expressive and Accurate Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Path Neural Networks (PathNNs), a model that updates node representations by aggregating paths emanating from nodes.	Gaspard Michel; Giannis Nikolentzos; Johannes F. Lutzeyer; Michalis Vazirgiannis;
111	The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of temporal-difference-based policy evaluation in reinforcement learning.	Mark Rowland; Yunhao Tang; Clare Lyle; Remi Munos; Marc G Bellemare; Will Dabney;
112	Restoration-Degradation Beyond Linear Diffusions: A Non-Asymptotic Analysis For DDIM-type Samplers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop a framework for non-asymptotic analysis of deterministic samplers used for diffusion generative modeling.	Sitan Chen; Giannis Daras; Alex Dimakis;
113	On Bridging The Gap Between Mean Field and Finite Width Deep Random Multilayer Perceptron with Batch Normalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analyze the role of depth in the concentration of mean-field predictions for Gram matrices of hidden representations in deep multilayer perceptron (MLP) with batch normalization (BN) at initialization.	Amir Joudaki; Hadi Daneshmand; Francis Bach;
114	Hyperparameters in Reinforcement Learning and How To Tune Them Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show that hyperparameter choices in RL can significantly affect the agent’s final performance and sample efficiency, and that the hyperparameter landscape can strongly depend on the tuning seed which may lead to overfitting.	Theresa Eimer; Marius Lindauer; Roberta Raileanu;
115	Towards Constituting Mathematical Structures for Learning to Optimize Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we derive the basic mathematical conditions that successful update rules commonly satisfy.	Jialin Liu; Xiaohan Chen; Zhangyang Wang; Wotao Yin; HanQin Cai;
116	Identifiability and Generalizability in Constrained Inverse Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Two main challenges in Reinforcement Learning (RL) are designing appropriate reward functions and ensuring the safety of the learned policy. To address these challenges, we present a theoretical framework for Inverse Reinforcement Learning (IRL) in constrained Markov decision processes.	Andreas Schlaginhaufen; Maryam Kamgarpour;
117	Adaptive Whitening in Neural Populations with Gain-modulating Interneurons Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing neural circuit models of adaptive whitening operate by modifying synaptic interactions; however, such modifications would seem both too slow and insufficiently reversible. Motivated by the extensive neuroscience literature on gain modulation, we propose an alternative model that adaptively whitens its responses by modulating the gains of individual neurons.	Lyndon Duong; David Lipshutz; David Heeger; Dmitri Chklovskii; Eero P Simoncelli;
118	Optimizing The Collaboration Structure in Cross-Silo Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose FedCollab, a novel FL framework that alleviates negative transfer by clustering clients into non-overlapping coalitions based on their distribution distances and data quantities.	Wenxuan Bao; Haohan Wang; Jun Wu; Jingrui He;
119	Smart Initial Basis Selection for Linear Programs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a learning-based approach for initial basis selection.	Zhenan Fan; Xinglu Wang; Oleksandr Yakovenko; Abdullah Ali Sivas; Owen Ren; Yong Zhang; Zirui Zhou;
120	From Relational Pooling to Subgraph GNNs: A Universal Framework for More Expressive Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Starting from RP, we propose to explicitly assign labels to nodes as additional features to improve graph isomorphism distinguishing power of message passing neural networks.	Cai Zhou; Xiyuan Wang; Muhan Zhang;
121	The Unreasonable Effectiveness of Few-shot Learning for Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that with only 5 examples of high-quality translation data shown at inference, a transformer decoder-only model trained solely with self-supervised learning, is able to match specialized supervised state-of-the-art models as well as more general commercial translation systems.	Xavier Garcia; Yamini Bansal; Colin Cherry; George Foster; Maxim Krikun; Melvin Johnson; Orhan Firat;
122	Auxiliary Learning As An Asymmetric Bargaining Game Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel approach, named AuxiNash, for balancing tasks in auxiliary learning by formalizing the problem as a generalized bargaining game with asymmetric task bargaining power.	Aviv Shamsian; Aviv Navon; Neta Glazer; Kenji Kawaguchi; Gal Chechik; Ethan Fetaya;
123	Additive Causal Bandits with Unknown Graph Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To remedy this, we adopt an additional additive assumption on the outcome which allows us to solve the problem by casting it as an additive combinatorial linear bandit problem with full-bandit feedback. We propose a novel action-elimination algorithm for this setting, show how to apply this algorithm to the causal bandit problem, provide sample complexity bounds, and empirically validate our findings on a suite of randomly generated causal models, effectively showing that one does not need to explicitly learn the parents of the outcome to identify the best intervention.	Alan Malek; Virginia Aglietti; Silvia Chiappa;
124	Mitigating Spurious Correlations in Multi-modal Models During Fine-tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel approach to address spurious correlations during fine-tuning for a given domain of interest.	Yu Yang; Besmira Nushi; Hamid Palangi; Baharan Mirzasoleiman;
125	Detecting Adversarial Directions in Deep Reinforcement Learning to Make Robust Decisions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this incline in complexity, and furthermore the increase in the dimensions of the observation came at the cost of volatility that can be taken advantage of via adversarial attacks (i.e. moving along worst-case directions in the observation space). To solve this policy instability problem we propose a novel method to detect the presence of these non-robust directions via local quadratic approximation of the deep neural policy loss.	Ezgi Korkmaz; Jonah Brown-Cohen;
126	Efficient Exploration Via Epistemic-Risk-Seeking Policy Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we propose a new, differentiable optimistic objective that when optimized yields a policy that provably explores efficiently, with guarantees even under function approximation.	Brendan O’Donoghue;
127	Variational Sparse Inverse Cholesky Approximation for Latent Gaussian Processes Via Double Kullback-Leibler Minimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To achieve scalable and accurate inference for latent Gaussian processes, we propose a variational approximation based on a family of Gaussian distributions whose covariance matrices have sparse inverse Cholesky (SIC) factors.	Jian Cao; Myeongjong Kang; Felix Jimenez; Huiyan Sang; Florian Tobias Schaefer; Matthias Katzfuss;
128	Improved Policy Evaluation for Randomized Trials of Algorithmic Resource Allocation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our key contribution is to present a new estimator leveraging our proposed novel concept, that involves retrospective reshuffling of participants across experimental arms at the end of an RCT.	Aditya Mate; Bryan Wilder; Aparna Taneja; Milind Tambe;
129	Efficient Quantum Algorithms for Quantum Optimal Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present efficient quantum algorithms that are exponentially faster than classical algorithms for solving the quantum optimal control problem.	Xiantao Li; Chunhao Wang;
130	SLAMB: Accelerated Large Batch Training with Sparse Communication Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we combine sparsification-based gradient compression with the layer-wise adaptive moments optimizer for large batch training (LAMB).	Hang Xu; Wenxuan Zhang; Jiawei Fei; Yuzhe Wu; TingWen Xie; Jun Huang; Yuchen Xie; Mohamed Elhoseiny; Panos Kalnis;
131	Cooperation in The Latent Space: The Benefits of Adding Mixture Components in Variational Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show how the mixture components cooperate when they jointly adapt to maximize the ELBO.	Oskar Kviman; Ricky Molén; Alexandra Hotti; Semih Kurt; Víctor Elvira; Jens Lagergren;
132	Learning to Incentivize Information Acquisition: Proper Scoring Rules Meet Principal-Agent Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the incentivized information acquisition problem, where a principal hires an agent to gather information on her behalf.	Siyu Chen; Jibang Wu; Yifan Wu; Zhuoran Yang;
133	High Probability Convergence of Stochastic Gradient Methods Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we describe a generic approach to show convergence with high probability for both stochastic convex and non-convex optimization with sub-Gaussian noise.	Zijian Liu; Ta Duy Nguyen; Thien Hang Nguyen; Alina Ene; Huy Nguyen;
134	Towards Understanding and Reducing Graph Structural Noise for GNNs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we focus on understanding and alleviating the effect of graph structural noise on GNN performance.	Mingze Dong; Yuval Kluger;
135	On The Convergence of Gradient Flow on Multi-layer Linear Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analyze the convergence of gradient flow on a multi-layer linear model with a loss function of the form $f(W_1W_2\cdots W_L)$.	Hancheng Min; Rene Vidal; Enrique Mallada;
136	Single Point-Based Distributed Zeroth-Order Optimization with A Non-Convex Stochastic Objective Function Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a zero-order distributed optimization method based on a one-point estimate of the gradient tracking technique.	Elissa Mhanna; Mohamad Assaad;
137	Beyond Exponentially Fast Mixing in Average-Reward Reinforcement Learning Via Multi-Level Monte Carlo Actor-Critic Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an RL methodology attuned to the mixing time by employing a multi-level Monte Carlo estimator for the critic, the actor, and the average reward embedded within an actor-critic (AC) algorithm.	Wesley Suttle; Amrit Bedi; Bhrij Patel; Brian M. Sadler; Alec Koppel; Dinesh Manocha;
138	Neural Network Accelerated Implicit Filtering: Integrating Neural Network Surrogates With Provably Convergent Derivative Free Optimization Methods Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce neural network accelerated implicit filtering (NNAIF), a novel family of methods for solving noisy derivative free (i.e. black box, zeroth order) optimization problems.	Brian Irwin; Eldad Haber; Raviv Gal; Avi Ziv;
139	Stochastic Marginal Likelihood Gradients Using Neural Tangent Kernels Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, estimating a single hyperparameter gradient requires a pass through the entire dataset, limiting the scalability of such algorithms. In this work, we overcome this issue by introducing lower bounds to the linearized Laplace approximation of the marginal likelihood.	Alexander Immer; Tycho F.A. van der Ouderaa; Mark van der Wilk; Gunnar Ratsch; Bernhard Schölkopf;
140	Constrained Optimization Via Exact Augmented Lagrangian and Randomized Iterative Sketching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This class of problems appears widely in a variety of applications in machine learning and engineering, ranging from constrained deep neural networks, to optimal control, to PDE-constrained optimization. We develop an adaptive inexact Newton method for this problem class.	Ilgee Hong; Sen Na; Michael W. Mahoney; Mladen Kolar;
141	FAENet: Frame Averaging Equivariant GNN for Materials Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce (1) a flexible, model-agnostic framework based on stochastic frame averaging that enforces E(3) equivariance or invariance, without any architectural constraints; (2) FAENet: a simple, fast and expressive GNN that leverages stochastic frame averaging to process geometric information without constraints.	Alexandre AGM Duval; Victor Schmidt; Alex Hernández-García; Santiago Miret; Fragkiskos D. Malliaros; Yoshua Bengio; David Rolnick;
142	Maximal Initial Learning Rates in Deep ReLU Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce the maximal initial learning rate $\eta^{\ast}$ – the largest learning rate at which a randomly initialized neural network can successfully begin training and achieve (at least) a given threshold accuracy.	Gaurav Iyer; Boris Hanin; David Rolnick;
143	Accelerated Cyclic Coordinate Dual Averaging with Extrapolation for Composite Convex Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by a recent success in analyzing an extrapolated cyclic scheme for generalized variational inequalities, we propose an Accelerated Cyclic Coordinate Dual Averaging with Extrapolation (A-CODER) method for composite convex optimization, where the objective function can be expressed as the sum of a smooth convex function accessible via a gradient oracle and a convex, possibly nonsmooth, function accessible via a proximal oracle.	Cheuk Yin Lin; Chaobing Song; Jelena Diakonikolas;
144	A Robust Test for The Stationarity Assumption in Sequential Decision Making Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a model-based doubly robust procedure for testing the stationarity assumption and detecting change points in offline RL settings with a certain degree of homogeneity.	Jitao Wang; Chengchun Shi; Zhenke Wu;
145	Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper studies score approximation, estimation, and distribution recovery of diffusion models, when data are supported on an unknown low-dimensional linear subspace.	Minshuo Chen; Kaixuan Huang; Tuo Zhao; Mengdi Wang;
146	Why Target Networks Stabilise Temporal Difference Methods Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet a complete theoretical explanation for the effectiveness of target networks remains elusive. In this work, we provide an analysis of this popular class of algorithms, to finally answer the question: “why do target networks stabilise TD learning”?	Mattie Fellows; Matthew J. A. Smith; Shimon Whiteson;
147	Simplified Temporal Consistency Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we show that, surprisingly, a simple representation learning approach relying only on a latent dynamics model trained by latent temporal consistency is sufficient for high-performance RL.	Yi Zhao; Wenshuai Zhao; Rinu Boney; Juho Kannala; Joni Pajarinen;
148	Generalized Implicit Follow-The-Regularized-Leader Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new class of online learning algorithms, generalized implicit Follow-The-Regularized-Leader (FTRL), that expands the scope of FTRL framework.	Keyi Chen; Francesco Orabona;
149	On Pitfalls of Test-Time Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite the sheer number of existing methods, the inconsistent experimental conditions and lack of standardization in prior literature make it difficult to measure their actual efficacies and progress. To address this issue, we present a large-scale open-sourced Test-Time Adaptation Benchmark, dubbed TTAB, which includes nine state-of-the-art algorithms, a diverse array of distribution shifts, and two comprehensive evaluation protocols.	Hao Zhao; Yuejiang Liu; Alexandre Alahi; Tao Lin;
150	Algorithms for Bounding Contribution for Histogram Estimation Under User-level Privacy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose algorithms to choose the best user contribution bound for histogram estimation under both bounded and unbounded domain settings.	Yuhan Liu; Ananda Theertha Suresh; Wennan Zhu; Peter Kairouz; Marco Gruteser;
151	Are Gaussian Data All You Need? The Extents and Limits of Universality in High-Dimensional Generalized Linear Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this manuscript we consider the problem of generalized linear estimation on Gaussian mixture data with labels given by a single-index model.	Luca Pesce; Florent Krzakala; Bruno Loureiro; Ludovic Stephan;
152	Model-based Offline Reinforcement Learning with Count-based Conservatism Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a model-based offline reinforcement learning method that integrates count-based conservatism, named $\texttt{Count-MORL}$.	Byeongchan Kim; Min-hwan Oh;
153	One-shot Imitation in A Non-Stationary Environment Via Multi-Modal Skill Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To tackle the problem, we explore the compositionality of complex tasks, and present a novel skill-based imitation learning framework enabling one-shot imitation and zero-shot adaptation; from a single demonstration for a complex unseen task, a semantic skill sequence is inferred and then each skill in the sequence is converted into an action sequence optimized for environmental hidden dynamics that can vary over time.	Sangwoo Shin; Daehee Lee; Minjong Yoo; Woo Kyung Kim; Honguk Woo;
154	Structural Re-weighting Improves Graph Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work examines different impacts of distribution shifts caused by either graph structure or node attributes and identifies a new type of shift, named conditional structure shift (CSS), which current GDA approaches are provably sub-optimal to deal with. A novel approach, called structural reweighting (StruRW), is proposed to address this issue and is tested on synthetic graphs, four benchmark datasets, and a new application in HEP.	Shikun Liu; Tianchun Li; Yongbin Feng; Nhan Tran; Han Zhao; Qiang Qiu; Pan Li;
155	Hypervolume Knowledge Gradient: A Lookahead Approach for Multi-Objective Bayesian Optimization with Partial Information Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a general one-step lookahead acquisition function based on the Knowledge Gradient that addresses the complex question of what to evaluate when and at which design points in a principled Bayesian decision-theoretic fashion.	Sam Daulton; Maximilian Balandat; Eytan Bakshy;
156	Machine Learning Force Fields with Data Cost Aware Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Even for the most data-efficient MLFFs, reaching chemical accuracy can require hundreds of frames of force and energy labels generated by expensive quantum mechanical algorithms, which may scale as $O(n^3)$ to $O(n^7)$, with $n$ proportional to the number of basis functions. To address this issue, we propose a multi-stage computational framework — ASTEROID, which lowers the data cost of MLFFs by leveraging a combination of cheap inaccurate data and expensive accurate data.	Alexander Bukharin; Tianyi Liu; Shengjie Wang; Simiao Zuo; Weihao Gao; Wen Yan; Tuo Zhao;
157	Continuous Spatiotemporal Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, a limitation of transformers in modeling continuous dynamical systems is that they are fundamentally discrete time and space models and thus have no guarantees regarding continuous sampling. To address this challenge, we present the Continuous Spatiotemporal Transformer (CST), a new transformer architecture that is designed for modeling of continuous systems.	Antonio Henrique de Oliveira Fonseca; Emanuele Zappala; Josue Ortega Caro; David van Dijk;
158	ED-Batch: Efficient Automatic Batching of Dynamic Neural Networks Via Learned Finite State Machines Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we provide an approach for batching dynamic DNNs based on finite state machines, which enables the automatic discovery of batching policies specialized for each DNN via reinforcement learning.	Siyuan Chen; Pratik Pramod Fegade; Tianqi Chen; Phillip Gibbons; Todd Mowry;
159	Emergent Asymmetry of Precision and Recall for Measuring Fidelity and Diversity of Generative Models in High Dimensions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given their central role in comparing and improving generative models, understanding their limitations is crucially important. To that end, in this work, we identify a critical flaw in the common approximation of these metrics using k-nearest-neighbors, namely, that the very interpretations of fidelity and diversity that are assigned to Precision and Recall can fail in high dimensions, resulting in very misleading conclusions.	Mahyar Khayatkhoei; Wael AbdAlmageed;
160	An SDE for Modeling SAM: Theory and Insights Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the SAM (Sharpness-Aware Minimization) optimizer which has recently attracted a lot of interest due to its increased performance over more classical variants of stochastic gradient descent. Our main contribution is the derivation of continuous-time models (in the form of SDEs) for SAM and two of its variants, both for the full-batch and mini-batch settings.	Enea Monzio Compagnoni; Luca Biggio; Antonio Orvieto; Frank Norbert Proske; Hans Kersting; Aurelien Lucchi;
161	Simple and Fast Group Robustness By Automatic Feature Reweighting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Automatic Feature Reweighting (AFR), an extremely simple and fast method for updating the model to reduce the reliance on spurious features.	Shikai Qiu; Andres Potapczynski; Pavel Izmailov; Andrew Gordon Wilson;
162	MG-GNN: Multigrid Graph Neural Networks for Learning Multilevel Domain Decomposition Methods Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose multigrid graph neural networks (MG-GNN), a novel GNN architecture for learning optimized parameters in two-level DDMs.	Ali Taghibakhshi; Nicolas Nytko; Tareq Uz Zaman; Scott MacLachlan; Luke Olson; Matthew West;
163	Linearly Constrained Bilevel Optimization: A Smoothed Implicit Gradient Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, in this work, we develop an implicit gradient-based approach, which is easy to implement, and is suitable for machine learning applications.	Prashant Khanduri; Ioannis Tsaknakis; Yihua Zhang; Jia Liu; Sijia Liu; Jiawei Zhang; Mingyi Hong;
164	Partially Observable Multi-agent RL with (Quasi-)Efficiency: The Blessing of Information Sharing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To circumvent the known hardness results and the use of computationally intractable oracles, we propose to leverage the potential information-sharing among agents, a standard practice in empirical MARL and a common model for multi-agent control systems with communications.	Xiangyu Liu; Kaiqing Zhang;
165	Revisiting Structured Variational Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we revisit SVAEs using modern machine learning tools and demonstrate their advantages over more general alternatives in terms of both accuracy and efficiency.	Yixiu Zhao; Scott Linderman;
166	Tied-Augment: Controlling Representation Similarity Improves Data Augmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a general framework called Tied-Augment, which improves the efficacy of data augmentation in a wide range of applications by adding a simple term to the loss that can control the similarity of representations under distortions.	Emirhan Kurtuluş; Zichao Li; Yann Dauphin; Ekin Dogus Cubuk;
167	Equivariance with Learned Canonicalization Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Symmetry-based neural networks often constrain the architecture in order to achieve invariance or equivariance to a group of transformations. In this paper, we propose an alternative that avoids this architectural constraint by learning to produce canonical representations of the data.	Sékou-Oumar Kaba; Arnab Kumar Mondal; Yan Zhang; Yoshua Bengio; Siamak Ravanbakhsh;
168	Diagnosis, Feedback, Adaptation: A Human-in-the-Loop Framework for Test-Time Policy Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a human-in-the-loop framework to leverage feedback from the end user to quickly identify and augment task-irrelevant visual state concepts.	Andi Peng; Aviv Netanyahu; Mark K Ho; Tianmin Shu; Andreea Bobu; Julie Shah; Pulkit Agrawal;
169	Applied Online Algorithms with Heterogeneous Predictors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We demonstrate how to more effectively utilize historical datasets and application domain knowledge by intentionally using predictors of different quantities.	Jessica Maghakian; Russell Lee; Mohammad Hajiesmaili; Jian Li; Ramesh Sitaraman; Zhenhua Liu;
170	Ewald-based Long-Range Message Passing for Molecular Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While this focus on locality is a useful inductive bias, it also impedes the learning of long-range interactions such as electrostatics and van der Waals forces. To address this drawback, we propose Ewald message passing: a nonlocal Fourier space scheme which limits interactions via a cutoff on frequency instead of distance, and is theoretically well-founded in the Ewald summation method.	Arthur Kosmala; Johannes Gasteiger; Nicholas Gao; Stephan Günnemann;
171	Future-conditioned Unsupervised Pretraining for Decision Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we aim to utilize generalized future conditioning to enable efficient unsupervised pretraining from reward-free and sub-optimal offline data.	Zhihui Xie; Zichuan Lin; Deheng Ye; Qiang Fu; Yang Wei; Shuai Li;
172	A Model-Based Method for Minimizing CVaR and Beyond Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop a variant of the stochastic prox-linear method for minimizing the Conditional Value-at-Risk (CVaR) objective.	Si Yi Meng; Robert M. Gower;
173	Extrapolative Controlled Sequence Generation Via Iterative Refinement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Instead, in this work, we propose Iterative Controlled Extrapolation (ICE) which iteratively makes local edits to a sequence to enable extrapolation.	Vishakh Padmakumar; Richard Yuanzhe Pang; He He; Ankur P Parikh;
174	General Sequential Episodic Memory Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a class of General Sequential Episodic Memory Models (GSEMM) that, in the adiabatic limit, exhibit a dynamic energy surface, leading to a series of meta-stable states capable of encoding memory sequences.	Arjun Karuvally; Terrence Sejnowski; Hava T Siegelmann;
175	Low Complexity Homeomorphic Projection to Ensure Neural-Network Solution Feasibility for Optimization Over (Non-)Convex Set Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose homeomorphic projection as a low-complexity scheme to guarantee NN solution feasibility for optimization over a general set homeomorphic to a unit ball, covering all compact convex sets and certain classes of nonconvex sets.	Enming Liang; Minghua Chen; Steven Low;
176	LEVER: Learning to Verify Language-to-Code Generation with Execution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose LEVER, a simple approach to improve language-to-code generation by learning to verify the generated programs with their execution results.	Ansong Ni; Srini Iyer; Dragomir Radev; Veselin Stoyanov; Wen-tau Yih; Sida Wang; Xi Victoria Lin;
177	End-to-end Training of Deep Boltzmann Machines By Unbiased Contrastive Divergence with Local Mode Initialization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we propose to use a coupling based on the Metropolis-Hastings (MH) and to initialize the state around a local mode of the target distribution.	Shohei Taniguchi; Masahiro Suzuki; Yusuke Iwasawa; Yutaka Matsuo;
178	mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present mPLUG-2, a new unified paradigm with modularized design for multi-modal pretraining, which can benefit from modality collaboration while addressing the problem of modality entanglement.	Haiyang Xu; Qinghao Ye; Ming Yan; Yaya Shi; Jiabo Ye; Yuanhong Xu; Chenliang Li; Bin Bi; Qi Qian; Wei Wang; Guohai Xu; Ji Zhang; Songfang Huang; Fei Huang; Jingren Zhou;
179	Kernel QuantTree Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Kernel QuantTree (KQT), a non-parametric change detection algorithm that monitors multivariate data through a histogram.	Diego Stucchi; Paolo Rizzo; Nicolò Folloni; Giacomo Boracchi;
180	Rethinking Backdoor Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a different approach to the backdoor attack problem.	Alaa Khaddaj; Guillaume Leclerc; Aleksandar Makelov; Kristian Georgiev; Hadi Salman; Andrew Ilyas; Aleksander Madry;
181	Adversarial Robustness of Amortized Bayesian Inference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a computationally efficient regularization scheme based on penalizing the Fisher information of the conditional density estimator, and show how it improves the adversarial robustness of amortized Bayesian inference.	Manuel Gloeckler; Michael Deistler; Jakob H. Macke;
182	Langevin Thompson Sampling with Logarithmic Communication: Bandits and Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, sequential decision making problems are often carried out in a batched manner, either due to the inherent nature of the problem or to serve the purpose of reducing communication and computation costs. In this work, we jointly study these problems in two popular settings, namely, stochastic multi-armed bandits (MABs) and infinite-horizon reinforcement learning (RL), where TS is used to learn the unknown reward distributions and transition dynamics, respectively.	Amin Karbasi; Nikki Lijing Kuang; Yian Ma; Siddharth Mitra;
183	Adversarially Robust PAC Learnability of Real-Valued Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Along the way, we introduce a novel agnostic sample compression scheme for real-valued functions, which may be of independent interest.	Idan Attias; Steve Hanneke;
184	Reprogramming Pretrained Language Models for Antibody Sequence Infilling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Specifically, we introduce ReprogBert in which a pretrained English language model is repurposed for protein sequence infilling – thus considers cross-language adaptation using less data.	Igor Melnyk; Vijil Chenthamarakshan; Pin-Yu Chen; Payel Das; Amit Dhurandhar; Inkit Padhi; Devleena Das;
185	Text-To-Concept (and Back) Via Cross-Model Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We observe that the mapping between an image’s representation in one model to its representation in another can be learned surprisingly well with just a linear layer, even across diverse models. Building on this observation, we propose text-to-concept, where features from a fixed pretrained model are aligned linearly to the CLIP space, so that text embeddings from CLIP’s text encoder become directly comparable to the aligned features.	Mazda Moayeri; Keivan Rezaei; Maziar Sanjabi; Soheil Feizi;
186	Multi-channel Autobidding with Budget and ROI Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Further, we show that the advertiser can achieve the global optimal conversion when she only optimizes over per-channel budgets. In light of this finding, under a bandit feedback setting that mimics real-world scenarios where advertisers have limited information on ad auctions in each channel and how channels procure ads, we present an efficient learning algorithm that produces per-channel budgets whose resulting conversion approximates that of the global optimal problem.	Yuan Deng; Negin Golrezaei; Patrick Jaillet; Jason Cheuk Nam Liang; Vahab Mirrokni;
187	Statistical Indistinguishability of Learning Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the similarity of outcomes of learning rules through the lens of the Total Variation (TV) distance of distributions.	Alkis Kalavasis; Amin Karbasi; Shay Moran; Grigoris Velegkas;
188	Online Learning with Feedback Graphs: The True Shape of Regret Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we define a new quantity $R^*$, called the problem complexity, and prove that the minimax regret is proportional to $R^*$ for any graph and time horizon $T$.	Tomáš Kocák; Alexandra Carpentier;
189	$H$-Consistency Bounds for Pairwise Misranking Loss Surrogates Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a detailed study of $H$-consistency bounds for score-based ranking.	Anqi Mao; Mehryar Mohri; Yutao Zhong;
190	From Perception to Programs: Regularize, Overparameterize, and Amortize Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We develop techniques for synthesizing neurosymbolic programs. Such programs mix discrete symbolic processing with continuous neural computation. We relax this mixed …	Hao Tang; Kevin Ellis;
191	Learning to Jump: Thinning and Thickening Latent Counts for Generative Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose learning to jump as a general recipe for generative modeling of various types of data.	Tianqi Chen; Mingyuan Zhou;
192	Identifying Interpretable Subspaces in Image Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Automatic Feature Explanation using Contrasting Concepts (FALCON), an interpretability framework to explain features of image representations.	Neha Kalibhat; Shweta Bhardwaj; C. Bayan Bruss; Hamed Firooz; Maziar Sanjabi; Soheil Feizi;
193	Neural Markov Jump Processes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we introduce an alternative, variational inference algorithm for Markov jump processes which relies on neural ordinary differential equations, and is trainable via back-propagation.	Patrick Seifner; Ramses J Sanchez;
194	PFGM++: Unlocking The Potential of Physics-Inspired Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a new family of physics-inspired generative models termed PFGM++ that unifies diffusion models and Poisson Flow Generative Models (PFGM).	Yilun Xu; Ziming Liu; Yonglong Tian; Shangyuan Tong; Max Tegmark; Tommi S. Jaakkola;
195	Parallel Neurosymbolic Integration with Concordia Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Concordia, a framework overcoming the limitations of prior art.	Jonathan Feldstein; Modestas Jurčius; Efthymia Tsamoura;
196	Conditional Tree Matching for Inference-Time Adaptation of Tree Prediction Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present CTreeOT, a convergent, differentiable algorithm for matching two trees when each tree is conditioned on some input.	Harshit Varma; Abhijeet Awasthi; Sunita Sarawagi;
197	Uncertainty Estimation for Molecules: Desiderata and Methods Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By our analysis, we conclude that none of the previous works satisfies all criteria. To fill this gap, we propose Localized Neural Kernel (LNK), a Gaussian Process (GP)-based extension to existing GNNs satisfying the desiderata.	Tom Wollschläger; Nicholas Gao; Bertrand Charpentier; Mohamed Amine Ketata; Stephan Günnemann;
198	Nearly-Optimal Hierarchical Clustering for Well-Clustered Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents two efficient hierarchical clustering (HC) algorithms with respect to Dasgupta’s cost function.	Steinar Laenen; Bogdan Adrian Manghiuc; He Sun;
199	Posterior Sampling for Deep Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Posterior Sampling for Reinforcement Learning is such a model-based algorithm that has attracted significant interest due to its performance in the tabular setting. This paper introduces Posterior Sampling for Deep Reinforcement Learning (PSDRL), the first truly scalable approximation of Posterior Sampling for Reinforcement Learning that retains its model-based essence.	Remo Sasso; Michelangelo Conserva; Paulo Rauber;
200	Beam Tree Recursive Cells Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Beam Tree Recursive Cell (BT-Cell) – a backpropagation-friendly framework to extend Recursive Neural Networks (RvNNs) with beam search for latent structure induction.	Jishnu Ray Chowdhury; Cornelia Caragea;
201	Revisiting Sampling for Combinatorial Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we revisit the idea of using sampling for combinatorial optimization, motivated by the significant recent advances of gradient-based discrete MCMC and new techniques for parallel neighborhood exploration on accelerators.	Haoran Sun; Katayoon Goshvadi; Azade Nova; Dale Schuurmans; Hanjun Dai;
202	Memory-Based Meta-Learning on Non-Stationary Distributions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The goal of this work is to investigate how far this interpretation can be realized by current sequence prediction models and training regimes.	Tim Genewein; Gregoire Deletang; Anian Ruoss; Li Kevin Wenliang; Elliot Catt; Vincent Dutordoir; Jordi Grau-Moya; Laurent Orseau; Marcus Hutter; Joel Veness;
203	Improved Online Learning Algorithms for CTR Prediction in Ad Auctions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate the online learning problem of revenue maximization in ad auctions, where the seller needs to learn the click-through rates (CTRs) of each ad candidate and charge the price of the winner through a pay-per-click manner.	Zhe Feng; Christopher Liaw; Zixin Zhou;
204	User-level Private Stochastic Convex Optimization with Optimal Rates Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of differentially private (DP) stochastic convex optimization (SCO) under the notion of user-level differential privacy.	Raef Bassily; Ziteng Sun;
205	The Value of Out-of-Distribution Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show a counter-intuitive phenomenon: the generalization error of a task can be a non-monotonic function of the amount of OOD data.	Ashwin De Silva; Rahul Ramesh; Carey Priebe; Pratik Chaudhari; Joshua T Vogelstein;
206	SMURF-THP: Score Matching-based UnceRtainty QuantiFication for Transformer Hawkes Process Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, the existing methods fail to provide uncertainty quantification for model predictions, e.g., confidence interval for the predicted event’s arrival time. To address these issues, we propose SMURF-THP, a score-based method for learning Transformer Hawkes process and quantifying prediction uncertainty.	Zichong Li; Yanbo Xu; Simiao Zuo; Haoming Jiang; Chao Zhang; Tuo Zhao; Hongyuan Zha;
207	Linear Causal Disentanglement Via Interventions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study observed variables that are a linear transformation of a linear latent causal model.	Chandler Squires; Anna Seigal; Salil S Bhate; Caroline Uhler;
208	Infusing Lattice Symmetry Priors in Attention Mechanisms for Sample-Efficient Abstract Geometric Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, our study motivates a modification to the standard attention mechanism, where attention weights are scaled using soft masks generated by a convolutional network.	Mattia Atzeni; Mrinmaya Sachan; Andreas Loukas;
209	Counterfactual Identifiability of Bijective Causal Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study counterfactual identifiability in causal models with bijective generation mechanisms (BGM), a class that generalizes several widely-used causal models in the literature.	Arash Nasr-Esfahany; Mohammad Alizadeh; Devavrat Shah;
210	Tensor Decompositions Meet Control Theory: Learning General Mixtures of Linear Dynamical Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we give a new approach to learning mixtures of linear dynamical systems that is based on tensor decompositions.	Ainesh Bakshi; Allen Liu; Ankur Moitra; Morris Yau;
211	Exploring The Benefits of Training Expert Language Models Over Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we report surprising findings that show an expert LM trained on just a single task can outperform an MT LM trained with 300+ different tasks on 11 different unseen datasets and on 13 datasets of the BIG-bench benchmark by an average of 3.20% and 1.29%, respectively.	Joel Jang; Seungone Kim; Seonghyeon Ye; Doyoung Kim; Lajanugen Logeswaran; Moontae Lee; Kyungjae Lee; Minjoon Seo;
212	E$(n)$ Equivariant Message Passing Simplicial Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents $\mathrm{E}(n)$ Equivariant Message Passing Simplicial Networks (EMPSNs), a novel approach to learning on geometric graphs and point clouds that is equivariant to rotations, translations, and reflections.	Floor Eijkelboom; Rob Hesselink; Erik J Bekkers;
213	Generating Private Synthetic Data with Genetic Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, even when possible, these approaches impose a fundamental limitation in which modifications to the minimization problem become additional sources of error. Therefore, we propose Private-GSD, a private genetic algorithm based on zeroth-order optimization heuristics that do not require modifying the original objective; thus, it avoids the aforementioned limitations of first-order optimization.	Terrance Liu; Jingwu Tang; Giuseppe Vietri; Steven Wu;
214	Low-Variance Gradient Estimation in Unrolled Computation Graphs with ES-Single Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an evolution strategies-based algorithm for estimating gradients in unrolled computation graphs, called ES-Single.	Paul Vicol;
215	Modeling Temporal Data As Continuous Functions with Stochastic Process Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we define suitable noise sources and introduce novel denoising and score-matching models.	Marin Biloš; Kashif Rasul; Anderson Schneider; Yuriy Nevmyvaka; Stephan Günnemann;
216	Actor-Critic Alignment for Offline-to-Online Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, they require online estimation of distribution divergence or density ratio. To avoid such complications, we propose deviating from existing actor-critic approaches that directly transfer the state-action value functions.	Zishun Yu; Xinhua Zhang;
217	Flexible Phase Dynamics for Bio-Plausible Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These Contrastive Learning (CL) algorithms are traditionally implemented with rigid, temporally non-local, and periodic learning dynamics, that could limit the range of physical systems capable of harnessing CL. In this study, we build on recent work exploring how CL might be implemented by biological or neuromorphic systems and show that this form of learning can be made temporally local, and can still function even if many of the dynamical requirements of standard training procedures are relaxed.	Ezekiel Williams; Colin Bredenberg; Guillaume Lajoie;
218	Optimal No-Regret Learning for One-Sided Lipschitz Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by applications in pricing and contract design, we study the maximization of one-sided Lipschitz functions, which only provide the (weaker) guarantee that they do not grow too quickly in one direction.	Paul Duetting; Guru Guruganesh; Jon Schneider; Joshua Ruizhi Wang;
219	Understanding The Distillation Process from Deep Generative Models to Tractable Probabilistic Circuits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it is still unclear what factors make this distillation work well. In this paper, we theoretically and empirically discover that the performance of a PC can exceed that of its teacher model.	Xuejie Liu; Anji Liu; Guy Van den Broeck; Yitao Liang;
220	Neural FIM for Learning Fisher Information Metrics from Point Cloud Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose neural FIM, a method for computing the Fisher information metric (FIM) from point cloud data – allowing for a continuous manifold model for the data.	Oluwadamilola Fasina; Guillaume Huguet; Alexander Tong; Yanlei Zhang; Guy Wolf; Maximilian Nickel; Ian Adelstein; Smita Krishnaswamy;
221	In or Out? Fixing ImageNet Out-of-Distribution Detection Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As a solution, we introduce with NINCO a novel test OOD dataset, each sample checked to be ID free, which with its fine-grained range of OOD classes allows for a detailed analysis of an OOD detector’s strengths and failure modes, particularly when paired with a number of synthetic “OOD unit-tests”. We provide code and data at https://github.com/j-cb/NINCO.	Julian Bitterwolf; Maximilian Müller; Matthias Hein;
222	Variational Mixture of HyperGenerators for Learning Distributions Over Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel deep generative model, named VaMoH.	Batuhan Koyuncu; Pablo Sanchez Martin; Ignacio Peis; Pablo M. Olmos; Isabel Valera;
223	Robust Subtask Learning for Compositional Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Typically, a high-level task is decomposed into a sequence of subtasks and a separate policy is trained to perform each subtask. In this paper, we focus on the problem of training subtask policies in a way that they can be used to perform any task; here, a task is given by a sequence of subtasks.	Kishor Jothimurugan; Steve Hsu; Osbert Bastani; Rajeev Alur;
224	Strategic Classification with Unknown User Manipulations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel batch-learning setting in which we use unlabeled data from previous rounds to estimate the manipulation structure.	Tosca Lechner; Ruth Urner; Shai Ben-David;
225	LSDS++ : Dual Sampling for Accelerated K-means++ Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new variant named LSDS++, which improves the sampling efficiency of LocalSearch++ via a strategy called dual sampling.	Chenglin Fan; Ping Li; Xiaoyun Li;
226	Data-OOB: Out-of-bag Estimate As A Simple and Efficient Data Value Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: As a result, it has been recognized as infeasible to apply to large datasets. To address this issue, we propose Data-OOB, a new data valuation method for a bagging model that utilizes the out-of-bag estimate.	Yongchan Kwon; James Zou;
227	From Adaptive Query Release to Machine Unlearning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We formalize the problem of machine unlearning as design of efficient unlearning algorithms corresponding to learning algorithms which perform a selection of adaptive queries from structured query classes.	Enayat Ullah; Raman Arora;
228	Label Differential Privacy and Private Training Data Release Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our goal is to enable learning of an accurate predictive model while protecting the privacy of each user’s label.	Robert Istvan Busa-Fekete; Andres Munoz Medina; Umar Syed; Sergei Vassilvitskii;
229	Learning Dense Correspondences Between Photos and Sketches Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: What are the computational ingredients needed to support this ability? Towards answering this question, we make two contributions: first, we introduce a new sketch-photo correspondence benchmark, PSC6k, containing 150K annotations of 6250 sketch-photo pairs across 125 object categories, augmenting the existing Sketchy dataset with fine-grained correspondence metadata. Second, we propose a self-supervised method for learning dense correspondences between sketch-photo pairs, building upon recent advances in correspondence learning for pairs of photos.	Xuanchen Lu; Xiaolong Wang; Judith E Fan;
230	Conformalization of Sparse Generalized Linear Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on a sparse linear model with only a subset of variables for prediction and use numerical continuation techniques to approximate the solution path efficiently.	Etash Kumar Guha; Eugene Ndiaye; Xiaoming Huo;
231	Iterative Approximate Cross-Validation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new paradigm to efficiently approximate CV when the ERM problem is solved via an iterative first-order algorithm, without running until convergence.	Yuetian Luo; Zhimei Ren; Rina Barber;
232	DecompDiff: Diffusion Models with Decomposed Priors for Structure-Based Drug Design Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, inspired by the convention in pharmaceutical practice, we decompose the ligand molecule into two parts, namely arms and scaffold, and propose a new diffusion model, DecompDiff, with decomposed priors over arms and scaffold.	Jiaqi Guan; Xiangxin Zhou; Yuwei Yang; Yu Bao; Jian Peng; Jianzhu Ma; Qiang Liu; Liang Wang; Quanquan Gu;
233	Correcting Discount-factor Mismatch in On-policy Policy Gradient Methods Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel distribution correction to account for the discounted stationary distribution that can be plugged into many existing gradient estimators.	Fengdi Che; Gautham Vasan; A. Rupam Mahmood;
234	Locally Regularized Neural Differential Equations: Some Black Boxes Were Meant to Remain Closed! Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this manuscript, we use internal cost heuristics of adaptive differential equation solvers at stochastic time-points to guide the training towards learning a dynamical system that is easier to integrate.	Avik Pal; Alan Edelman; Christopher Vincent Rackauckas;
235	SinDDM: A Single Image Denoising Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we introduce a framework for training a DDM on a single image.	Vladimir Kulikov; Shahar Yadin; Matan Kleiner; Tomer Michaeli;
236	DRew: Dynamically Rewired Message Passing with Delay Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we propose a framework, applicable to any MPNN architecture, that performs a layer-dependent rewiring to ensure gradual densification of the graph.	Benjamin Gutteridge; Xiaowen Dong; Michael M. Bronstein; Francesco Di Giovanni;
237	Masked Trajectory Models for Prediction, Representation, and Control Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Masked Trajectory Models (MTM) as a generic abstraction for sequential decision making.	Philipp Wu; Arjun Majumdar; Kevin Stone; Yixin Lin; Igor Mordatch; Pieter Abbeel; Aravind Rajeswaran;
238	Predictive Flows for Faster Ford-Fulkerson Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recent work has shown that leveraging learned predictions can improve the running time of algorithms for bipartite matching and similar combinatorial problems. In this work, we build on this idea to improve the performance of the widely used Ford-Fulkerson algorithm for computing maximum flows by seeding Ford-Fulkerson with predicted flows.	Sami Davies; Benjamin Moseley; Sergei Vassilvitskii; Yuyan Wang;
239	GLOBE-CE: A Translation Based Approach for Global Counterfactual Explanations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Meanwhile, practitioners are requesting more efficient and interactive explainability tools. We take this opportunity to propose Global & Efficient Counterfactual Explanations (GLOBE-CE), a flexible framework that tackles the reliability and scalability issues associated with current state-of-the-art, particularly on higher dimensional datasets and in the presence of continuous features.	Dan Ley; Saumitra Mishra; Daniele Magazzeni;
240	In Search of Insights, Not Magic Bullets: Towards Demystification of The Model Selection Dilemma in Heterogeneous Treatment Effect Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While some solutions have recently been investigated, systematic understanding of the strengths and weaknesses of different model selection criteria is still lacking. In this paper, instead of attempting to declare a global ‘winner’, we therefore empirically investigate success- and failure modes of different selection criteria.	Alicia Curth; Mihaela van der Schaar;
241	Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a novel dIffusion language modEl pre-training framework for text generation, which we call GENIE.	Zhenghao Lin; Yeyun Gong; Yelong Shen; Tong Wu; Zhihao Fan; Chen Lin; Nan Duan; Weizhu Chen;
242	Multisample Flow Matching: Straightening Flows with Minibatch Couplings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Multisample Flow Matching, a more general framework that uses non-trivial couplings between data and noise samples while satisfying the correct marginal constraints.	Aram-Alexandre Pooladian; Heli Ben-Hamu; Carles Domingo-Enrich; Brandon Amos; Yaron Lipman; Ricky T. Q. Chen;
243	Active Causal Structure Learning with Advice Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the problem of active causal structure learning with advice.	Davin Choo; Themistoklis Gouleakis; Arnab Bhattacharyya;
244	ReDi: Efficient Learning-Free Diffusion Inference Via Trajectory Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To accelerate the inference, we propose ReDi, a simple yet learning-free Retrieval-based Diffusion sampling framework.	Kexun Zhang; Xianjun Yang; William Yang Wang; Lei Li;
245	Looped Transformers As Programmable Computers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a framework for using transformer networks as universal computers by programming them with specific weights and placing them in a loop.	Angeliki Giannou; Shashank Rajput; Jy-yong Sohn; Kangwook Lee; Jason D. Lee; Dimitris Papailiopoulos;
246	Alternately Optimized Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a new optimization framework for semi-supervised learning on graphs from a multi-view learning perspective.	Haoyu Han; Xiaorui Liu; Haitao Mao; MohamadAli Torkamani; Feng Shi; Victor Lee; Jiliang Tang;
247	Graph Inductive Biases in Transformers Without Message Passing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: On the other hand, Graph Transformers without message-passing often perform poorly on smaller datasets, where inductive biases are more crucial. To bridge this gap, we propose the Graph Inductive bias Transformer (GRIT) — a new Graph Transformer that incorporates graph inductive biases without using message passing.	Liheng Ma; Chen Lin; Derek Lim; Adriana Romero-Soriano; Puneet K. Dokania; Mark Coates; Philip Torr; Ser-Nam Lim;
248	Complexity of Block Coordinate Descent with Proximal Regularization and Applications to Wasserstein CP-dictionary Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the block coordinate descent methods of Gauss-Seidel type with proximal regularization (BCD-PR), which is a classical method of minimizing general nonconvex objectives under constraints that has a wide range of practical applications.	Dohyun Kwon; Hanbaek Lyu;
249	Neural Wave Machines: Learning Spatiotemporally Structured Representations with Locally Coupled Oscillatory Recurrent Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: An intriguing hypothesis is that traveling waves serve to structure neural representations both in space and time, thereby acting as an inductive bias towards natural data. In this work, we investigate this hypothesis by introducing the Neural Wave Machine (NWM) — a locally coupled oscillatory recurrent neural network capable of exhibiting traveling waves in its hidden state.	T. Anderson Keller; Max Welling;
250	Principled Acceleration of Iterative Numerical Methods Using Machine Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, a systematic study of these approaches and how they differ from meta-learning is lacking. In this paper, we propose a framework to analyze such learning-based acceleration approaches, where one can immediately identify a departure from classical meta-learning.	Sohei Arisaka; Qianxiao Li;
251	Efficient and Equivariant Graph Networks for Predicting Quantum Hamiltonian Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a SE(3)-equivariant network, named QHNet, that achieves efficiency and equivariance.	Haiyang Yu; Zhao Xu; Xiaofeng Qian; Xiaoning Qian; Shuiwang Ji;
252	Measuring The Impact of Programming Language Distribution Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Current benchmarks for evaluating neural code models focus on only a small subset of programming languages, excluding many popular languages such as Go or Rust. To ameliorate this issue, we present the BabelCode framework for execution-based evaluation of any benchmark in any language.	Gabriel Orlanski; Kefan Xiao; Xavier Garcia; Jeffrey Hui; Joshua Howland; Jonathan Malmaud; Jacob Austin; Rishabh Singh; Michele Catasta;
253	LoSparse: Structured Compression of Large Language Models Based on Low-Rank and Sparse Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Transformer models have achieved remarkable results in various natural language tasks, but they are often prohibitively large, requiring massive memories and computational resources. To reduce the size and complexity of these models, we propose LoSparse (Low-Rank and Sparse approximation), a novel model compression technique that approximates a weight matrix by the sum of a low-rank matrix and a sparse matrix.	Yixiao Li; Yifan Yu; Qingru Zhang; Chen Liang; Pengcheng He; Weizhu Chen; Tuo Zhao;
254	Node Embedding from Neural Hamiltonian Orbits in Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we model the embedding update of a node feature as a Hamiltonian orbit over time.	Qiyu Kang; Kai Zhao; Yang Song; Sijie Wang; Wee Peng Tay;
255	Multi-Task Differential Privacy Under Distribution Skew Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We give a systematic analysis of the problem, by studying how to optimally allocate a user’s privacy budget among tasks. We propose a generic algorithm, based on an adaptive reweighting of the empirical loss, and show that in the presence of distribution skew, this gives a quantifiable improvement of excess empirical risk.	Walid Krichene; Prateek Jain; Shuang Song; Mukund Sundararajan; Abhradeep Guha Thakurta; Li Zhang;
256	A/B Testing in Network Data with Covariate-Adaptive Randomization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new adaptive procedure to balance both the network and the covariates.	Jialu Wang; Ping Li; Feifang Hu;
257	Towards Bridging The Gaps Between The Right to Explanation and The Right to Be Forgotten Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Intuitively, enforcing the right to be forgotten may trigger model updates which in turn invalidate previously provided explanations, thus violating the right to explanation. In this work, we investigate the technical implications arising due to the interference between the two aforementioned regulatory principles, and propose the first algorithmic framework to resolve the tension between them.	Satyapriya Krishna; Jiaqi Ma; Himabindu Lakkaraju;
258	General Covariance Data Augmentation for Neural PDE Solvers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To alleviate the problem, we propose a computationally cheap augmentation strategy based on general covariance and simple random coordinate transformations.	Fanaskov Vladimir; Tianchi Yu; Alexander Rudikov; Ivan Oseledets;
259	Deep Anomaly Detection Under Labeling Budget Constraints Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we determine a set of theoretical conditions under which anomaly scores generalize from labeled queries to unlabeled data.	Aodong Li; Chen Qiu; Marius Kloft; Padhraic Smyth; Stephan Mandt; Maja Rudolph;
260	On Kinetic Optimal Probability Paths for Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we investigate the space of Gaussian probability paths, which includes diffusion paths as an instance, and look for an optimal member in some useful sense.	Neta Shaul; Ricky T. Q. Chen; Maximilian Nickel; Matthew Le; Yaron Lipman;
261	Off-Policy Average Reward Actor-Critic with Deterministic Policy Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present both on-policy and off-policy deterministic policy gradient theorems for the average reward performance criterion.	Naman Saxena; Subhojyoti Khastagir; Shishir N Y; Shalabh Bhatnagar;
262	Coarse-to-Fine: A Hierarchical Diffusion Model for Molecule Generation in 3D Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Fragment-based molecule generation is a promising strategy, however, it is nontrivial to be adapted for 3D non-autoregressive generations because of the combinational optimization problems. In this paper, we utilize a coarse-to-fine strategy to tackle this problem, in which a Hierarchical Diffusion-based model (i.e. HierDiff) is proposed to preserve the validity of local segments without relying on autoregressive modeling.	Bo Qiang; Yuxuan Song; Minkai Xu; Jingjing Gong; Bowen Gao; Hao Zhou; Wei-Ying Ma; Yanyan Lan;
263	PASTA: Pessimistic Assortment Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, based on the principle of pessimism, we propose a novel algorithm called Pessimistic ASsortment opTimizAtion (PASTA for short), which can correctly identify the optimal assortment by only requiring the offline data to cover the optimal assortment under general settings.	Juncheng Dong; Weibin Mo; Zhengling Qi; Cong Shi; Ethan X Fang; Vahid Tarokh;
264	Near-optimal Conservative Exploration in Reinforcement Learning Under Episode-wise Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper investigates conservative exploration in reinforcement learning where the performance of the learning agent is guaranteed to be above a certain threshold throughout the learning process.	Donghao Li; Ruiquan Huang; Cong Shen; Jing Yang;
265	Double-Weighting for Covariate Shift Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the performance of such approaches can be poor under support mismatch or when the above ratios take large values. We propose a minimax risk classification (MRC) approach for covariate shift adaptation that avoids such limitations by weighting both training and testing samples.	Jose Ignacio Segovia; Santiago Mazuelas; Anqi Liu;
266	Near-Optimal Algorithms for Private Online Optimization in The Realizable Regime Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider online learning problems in the realizable setting, where there is a zero-loss solution, and propose new Differentially Private (DP) algorithms that obtain near-optimal regret bounds.	Hilal Asi; Vitaly Feldman; Tomer Koren; Kunal Talwar;
267	Leveraging Proxy of Training Data for Test-Time Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose two lightweight yet informative proxies of the training data and a TTA method fully exploiting them.	Juwon Kang; Nayeong Kim; Donghyeon Kwon; Jungseul Ok; Suha Kwak;
268	The Monge Gap: A Regularizer to Learn All Transport Maps Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: More generally, we question the relevance of using Brenier’s result, which only applies to densities, to constrain the architecture of candidate maps fitted on samples. Motivated by these limitations, we propose a radically different approach to estimating OT maps: Given a cost $c$ and a reference measure $\rho$, we introduce a regularizer, the Monge gap $\mathcal{M}^c_{\rho}(T)$ of a map $T$.	Théo Uscidda; Marco Cuturi;
269	Dirichlet Diffusion Score Model for Biological Sequence Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To develop generative SDE models for discrete data such as biological sequences, here we introduce a diffusion process defined in the probability simplex space with stationary distribution being the Dirichlet distribution.	Pavel Avdeyev; Chenlai Shi; Yuhao Tan; Kseniia Dudnyk; Jian Zhou;
270	Compositional Score Modeling for Simulation-Based Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new method based on conditional score modeling that enjoys the benefits of both approaches.	Tomas Geffner; George Papamakarios; Andriy Mnih;
271	Local Vertex Colouring Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we investigate the expressivity of GNNs from the perspective of graph search.	Shouheng Li; Dongwoo Kim; Qing Wang;
272	The Flan Collection: Designing Data and Methods for Effective Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the design decision of publicly available instruction tuning methods, by reproducing and breaking down the development of Flan 2022 (Chung et al., 2022).	Shayne Longpre; Le Hou; Tu Vu; Albert Webson; Hyung Won Chung; Yi Tay; Denny Zhou; Quoc V Le; Barret Zoph; Jason Wei; Adam Roberts;
273	Diffusion Models for Black-Box Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Denoising Diffusion Optimization Models (DDOM), a new inverse approach for offline black-box optimization based on diffusion models.	Siddarth Krishnamoorthy; Satvik Mehul Mashkaria; Aditya Grover;
274	NUNO: A General Framework for Learning Parametric PDEs with Non-Uniform Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, when faced with real-world physical data, which are often highly non-uniformly distributed, it is challenging to use mesh-based techniques such as the FFT. To address this, we introduce the Non-Uniform Neural Operator (NUNO), a comprehensive framework designed for efficient operator learning with non-uniform data.	Songming Liu; Zhongkai Hao; Chengyang Ying; Hang Su; Ze Cheng; Jun Zhu;
275	Unit Scaling: Out-of-the-Box Low-Precision Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present unit scaling, a paradigm for designing deep learning models that simplifies the use of low-precision number formats.	Charlie Blake; Douglas Orr; Carlo Luschi;
276	Polarity Is All You Need to Learn and Transfer Faster Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we investigate the role of weight polarity: development processes initialize NIs with advantageous polarity configurations; as NIs grow and learn, synapse magnitudes update, yet polarities are largely kept unchanged.	Qingyang Wang; Michael Alan Powell; Eric W Bridgeford; Ali Geisa; Joshua T Vogelstein;
277	Training-Free Neural Active Learning with Initialization-Robustness Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce our expected variance with Gaussian processes (EV-GP) criterion for neural active learning, which is theoretically guaranteed to select data points which lead to trained NNs with both (a) good predictive performances and (b) initialization robustness.	Apivich Hemachandra; Zhongxiang Dai; Jasraj Singh; See-Kiong Ng; Bryan Kian Hsiang Low;
278	Generative Decoding of Visual Stimuli Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by that fact, we introduce a novel neural network architecture for the problem of neural decoding.	Eleni Miliotou; Panagiotis Kyriakis; Jason D Hinman; Andrei Irimia; Paul Bogdan;
279	Forget Unlearning: Towards True Data-Deletion in Machine Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these, we propose a sound deletion guarantee and show that ensuring the privacy of existing records is necessary for the privacy of deleted records.	Rishav Chourasia; Neil Shah;
280	Learning Rate Schedules in The Presence of Distribution Shift Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For general convex loss functions, we propose new learning rate schedules that are robust to distribution shift, and give upper and lower bounds for the regret that only differ by constants.	Matthew Fahrbach; Adel Javanmard; Vahab Mirrokni; Pratik Worah;
281	Extending Conformal Prediction to Hidden Markov Models with Exact Validity Via De Finetti’s Theorem for Markov Chains Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We generalize conformal prediction to the Hidden Markov Model (HMM) framework where the assumption of exchangeability is not valid.	Buddhika Nettasinghe; Samrat Chatterjee; Ramakrishna Tipireddy; Mahantesh M Halappanavar;
282	Gradient Descent Converges Linearly for Logistic Regression on Separable Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that running gradient descent with variable learning rate guarantees loss $f(x) \leq 1.1 \cdot f(x^*)+\epsilon$ for the logistic regression objective, where the error $\epsilon$ decays exponentially with the number of iterations and polynomially with the magnitude of the entries of an arbitrary fixed solution $x$.	Kyriakos Axiotis; Maxim Sviridenko;
283	Approximate Stein Classes for Truncated Density Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose approximate Stein classes, which in turn leads to a relaxed Stein identity for truncated density estimation.	Daniel James Williams; Song Liu;
284	Pareto Manifold Learning: Tackling Multiple Tasks Via Ensembles of Single-task Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we conjecture that the Pareto Front admits a linear parameterization in parameter space, which leads us to propose Pareto Manifold Learning, an ensembling method in weight space.	Nikolaos Dimitriadis; Pascal Frossard; François Fleuret;
285	Mitigating Propagation Failures in Physics-informed Neural Networks Using Retain-Resample-Release (R3) Sampling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we provide a novel perspective of failure modes of PINNs by hypothesizing that training PINNs relies on successful propagation of solution from initial and/or boundary condition points to interior points.	Arka Daw; Jie Bu; Sifan Wang; Paris Perdikaris; Anuj Karpatne;
286	DualHSIC: HSIC-Bottleneck and Alignment for Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By appropriately leveraging inter-task relationships, we propose a novel CL method, named DualHSIC, to boost the performance of existing rehearsal-based methods in a simple yet effective way.	Zifeng Wang; Zheng Zhan; Yifan Gong; Yucai Shao; Stratis Ioannidis; Yanzhi Wang; Jennifer Dy;
287	Certifying Ensembles: A General Certification Theory with S-Lipschitzness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we generalise Lipschitz continuity by introducing S-Lipschitz classifiers, which we use to analyse the theoretical robustness of ensembles.	Aleksandar Petrov; Francisco Eiras; Amartya Sanyal; Philip Torr; Adel Bibi;
288	Regions of Reliability in The Evaluation of Multivariate Probabilistic Forecasts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we provide the first systematic finite-sample study of proper scoring rules for time series forecasting evaluation.	Étienne Marcotte; Valentina Zantedeschi; Alexandre Drouin; Nicolas Chapados;
289	AbODE: Ab Initio Antibody Design Using Conjoined ODEs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, this co-design of the amino acid sequence and the 3D structure subsumes and accentuates some central challenges from multiple tasks including protein folding (sequence to structure), inverse folding (structure to sequence), and docking (binding). We strive to surmount these challenges with a new generative model AbODE that extends graph PDEs to accommodate both contextual information and external interactions.	Yogesh Verma; Markus Heinonen; Vikas Garg;
290	TGRL: An Algorithm for Teacher Guided Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Teacher Guided Reinforcement Learning (TGRL), a principled approach to dynamically balance following the teacher’s guidance and leveraging RL.	Idan Shenfeld; Zhang-Wei Hong; Aviv Tamar; Pulkit Agrawal;
291	Learning-augmented Private Algorithms for Multiple Quantile Release Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: When applying differential privacy to sensitive data, we can often improve performance using external information such as other sensitive data, public data, or human priors. We propose to use the learning-augmented algorithms (or algorithms with predictions) framework—previously applied largely to improve time complexity or competitive ratios—as a powerful way of designing and analyzing privacy-preserving methods that can take advantage of such external information to improve utility.	Mikhail Khodak; Kareem Amin; Travis Dick; Sergei Vassilvitskii;
292	Adaptively Weighted Data Augmentation Consistency Regularization for Robust Optimization Under Concept Shift Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To exploit data features of label-sparse samples more efficiently, we propose an adaptively weighted online optimization algorithm — AdaWAC — to incorporate data augmentation consistency regularization in sample reweighting.	Yijun Dong; Yuege Xie; Rachel Ward;
293	Spatial Implicit Neural Representations for Global-Scale Species Mapping Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we use Spatial Implicit Neural Representations (SINRs) to jointly estimate the geographical ranges of 47k species.	Elijah Cole; Grant Van Horn; Christian Lange; Alexander Shepard; Patrick Leary; Pietro Perona; Scott Loarie; Oisin Mac Aodha;
294	Adaptive Coordination in Social Embodied Rearrangement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Prior zero-shot coordination (ZSC) approaches struggle to generalize in our complex and visually rich setting, and on further analysis, we find that they fail to generate diverse coordination behaviors at training time. To counter this, we propose Behavior Diversity Play (BDP), a novel ZSC approach that encourages diversity through a discriminability objective.	Andrew Szot; Unnat Jain; Dhruv Batra; Zsolt Kira; Ruta Desai; Akshara Rai;
295	OMS-DPM: Optimizing The Model Schedule for Diffusion Probabilistic Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we reveal an overlooked dimension—model schedule—for optimizing the trade-off between generation quality and speed.	Enshu Liu; Xuefei Ning; Zinan Lin; Huazhong Yang; Yu Wang;
296	Prefer to Classify: Improving Text Classifiers Via Auxiliary Preference Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a novel multi-task learning framework, called prefer-to-classify (P2C), which can enjoy the cooperative effect of learning both the given classification task and the auxiliary preferences.	Jaehyung Kim; Jinwoo Shin; Dongyeop Kang;
297	On Preemption and Learning in Stochastic Scheduling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study single-machine scheduling of jobs, each belonging to a job type that determines its duration distribution.	Nadav Merlis; Hugo Richard; Flore Sentenac; Corentin Odic; Mathieu Molina; Vianney Perchet;
298	A Modern Look at The Relationship Between Sharpness and Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: But does it really capture generalization in modern practical settings? We comprehensively explore this question in a detailed study of various definitions of adaptive sharpness in settings ranging from training from scratch on ImageNet and CIFAR-10 to fine-tuning CLIP on ImageNet and BERT on MNLI.	Maksym Andriushchenko; Francesco Croce; Maximilian Müller; Matthias Hein; Nicolas Flammarion;
299	Linear CNNs Discover The Statistical Structure of The Dataset Using Only The Most Dominant Frequencies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We here present a stepping stone towards a deeper understanding of convolutional neural networks (CNNs) in the form of a theory of learning in linear CNNs.	Hannah Pinson; Joeri Lenaerts; Vincent Ginis;
300	Properties of The Mallows Model Depending on The Number of Alternatives: A Warning for An Experimentalist Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We empirically and theoretically analyze how the properties of rankings sampled from the Mallows model change when increasing the number of alternatives.	Niclas Boehmer; Piotr Faliszewski; Sonja Kraiczy;
301	Improved Analysis of Score-based Generative Modeling: User-Friendly Bounds Under Minimal Smoothness Assumptions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We give an improved theoretical analysis of score-based generative modeling.	Hongrui Chen; Holden Lee; Jianfeng Lu;
302	Extending Kernel PCA Through Dualization: Sparsity, Robustness and Fast Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The goal of this paper is to revisit Kernel Principal Component Analysis (KPCA) through dualization of a difference of convex functions.	Francesco Tonin; Alex Lambert; Panagiotis Patrinos; Johan Suykens;
303	Horizon-free Learning for Markov Decision Processes and Games: Stochastically Bounded Rewards and Improved Bounds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We establish a novel generic algorithm that achieves no horizon dependence in terms of sample complexity for both Markov Decision Processes (MDP) and Games, via reduction to a well-conditioned auxiliary Markovian environment, in which only “important” state-action pairs are preserved.	Shengshi Li; Lin Yang;
304	Accelerated Primal-Dual Methods for Convex-Strongly-Concave Saddle Point Problems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate a primal-dual (PD) method for the saddle point problem (SPP) that uses a linear approximation of the primal function instead of the standard proximal step, resulting in a linearized PD (LPD) method.	Mohammad Khalafi; Digvijay Boob;
305	Are Random Decompositions All We Need in High Dimensional Bayesian Optimisation? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We find that data-driven learners of decompositions can be easily misled towards local decompositions that do not hold globally across the search space. Then, we formally show that a random tree-based decomposition sampler exhibits favourable theoretical guarantees that effectively trade off maximal information gain and functional mismatch between the actual black-box and its surrogate as provided by the decomposition.	Juliusz Krzysztof Ziomek; Haitham Bou Ammar;
306	Individually Fair Learning with One-Sided Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider an online learning problem with one-sided feedback, in which the learner is able to observe the true label only for positively predicted instances.	Yahav Bechavod; Aaron Roth;
307	User-defined Event Sampling and Uncertainty Quantification in Diffusion Models for Physical Dynamical Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we develop a probabilistic approximation scheme for the conditional score function which provably converges to the true distribution as the noise level decreases.	Marc Anton Finzi; Anudhyan Boral; Andrew Gordon Wilson; Fei Sha; Leonardo Zepeda-Nunez;
308	Exphormer: Sparse Transformers for Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Exphormer, a framework for building powerful and scalable graph transformers.	Hamed Shirzad; Ameya Velingker; Balaji Venkatachalam; Danica J. Sutherland; Ali Kemal Sinop;
309	Supervised Metric Learning to Rank for Retrieval Via Contextual Similarity Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Many metric learning loss functions focus on learning a correct ranking of training samples, but strongly overfit semantically inconsistent labels and require a large amount of data. To address these shortcomings, we propose a new metric learning method, called contextual loss, which optimizes contextual similarity in addition to cosine similarity.	Christopher Liao; Theodoros Tsiligkaridis; Brian Kulis;
310	Optimal Stochastic Non-smooth Non-convex Optimization Through Online-to-Non-convex Conversion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present new algorithms for optimizing non-smooth, non-convex stochastic objectives based on a novel analysis technique.	Ashok Cutkosky; Harsh Mehta; Francesco Orabona;
311	Federated Heavy Hitter Recovery Under Linear Sketching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose efficient algorithms based on local subsampling and invertible bloom look-up tables (IBLTs).	Adria Gascon; Peter Kairouz; Ziteng Sun; Ananda Theertha Suresh;
312	Coupled Variational Autoencoder Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the Coupled Variational Auto-Encoder (C-VAE), which formulates the VAE problem as one of Optimal Transport (OT) between the prior and data distributions.	Xiaoran Hao; Patrick Shafto;
313	Conditionally Strongly Log-Concave Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: There is a growing gap between the impressive results of deep image generative models and classical algorithms that offer theoretical guarantees. The former suffer from mode collapse or memorization issues, limiting their application to scientific data. The latter require restrictive assumptions such as log-concavity to escape the curse of dimensionality. We partially bridge this gap by introducing conditionally strongly log-concave (CSLC) models, which factorize the data distribution into a product of conditional probability distributions that are strongly log-concave.	Florentin Guth; Etienne Lempereur; Joan Bruna; Stéphane Mallat;
314	BNN-DP: Robustness Certification of Bayesian Neural Networks Via Dynamic Programming Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce BNN-DP, an efficient algorithmic framework for analysis of adversarial robustness of Bayesian Neural Networks (BNNs).	Steven Adams; Andrea Patane; Morteza Lahijanian; Luca Laurenti;
315	Nonlinear Causal Discovery with Latent Confounders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a nonlinear causal model involving hidden confounders.	David Kaltenpoth; Jilles Vreeken;
316	Toward Efficient Gradient-Based Value Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To resolve the adverse effect of poor conditioning of the mean squared Bellman error (MSBE) on gradient-based methods, we propose a low-complexity batch-free proximal method that approximately follows the Gauss-Newton direction and is asymptotically robust to parameterization.	Arsalan Sharifnassab; Richard S. Sutton;
317	RACE: Improve Multi-Agent Reinforcement Learning with Representation Asymmetry and Collaborative Evolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a hybrid framework, Representation Asymmetry and Collaboration Evolution (RACE), which combines evolutionary algorithms (EA) and multi-agent reinforcement learning (MARL) for efficient collaboration.	Pengyi Li; Jianye Hao; Hongyao Tang; Yan Zheng; Xian Fu;
318	Revisiting Data-Free Knowledge Distillation with Poisoned Teachers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we make the first effort to uncover the security risk of data-free KD w.r.t. untrusted pre-trained models.	Junyuan Hong; Yi Zeng; Shuyang Yu; Lingjuan Lyu; Ruoxi Jia; Jiayu Zhou;
319	Fast As CHITA: Neural Network Pruning with Combinatorial Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel optimization-based pruning framework that considers the combined effect of pruning (and updating) multiple weights subject to a sparsity constraint.	Riade Benbaki; Wenyu Chen; Xiang Meng; Hussein Hazimeh; Natalia Ponomareva; Zhe Zhao; Rahul Mazumder;
320	Provably Learning Diverse Features in Multi-View Data with Midpoint Mixup Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In recent years, Mixup has become a standard primitive used in the training of state-of-the-art image classification models due to its demonstrated benefits over empirical risk minimization with regard to generalization and robustness. In this work, we try to explain some of this success from a feature learning perspective.	Muthu Chidambaram; Xiang Wang; Chenwei Wu; Rong Ge;
321	Dual Propagation: Accelerating Contrastive Hebbian Learning with Dyadic Neurons Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by lifted neural networks and compartmental neuron models we propose a simple energy based compartmental neuron model, termed dual propagation, in which each neuron is a dyad with two intrinsic states.	Rasmus Høier; D. Staudt; Christopher Zach;
322	Identifying Useful Learnwares for Heterogeneous Label Spaces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we make an attempt to improve the effectiveness of RKME specification for heterogeneous label spaces, where the learnware market does not contain a model that has the same label space as the user’s task, by considering a class-specific model specification explicitly, along with a class-wise learnware identification method.	Lan-Zhe Guo; Zhi Zhou; Yu-Feng Li; Zhi-Hua Zhou;
323	Linear Time GPs for Inferring Latent Trajectories from Neural Spike Trains Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose cvHM, a general inference framework for latent GP models leveraging Hida-Matérn kernels and conjugate computation variational inference (CVI).	Matthew Dowling; Yuan Zhao; Il Memming Park;
324	Fair Neighbor Embedding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a framework of fair neighbor embedding, the Fair Neighbor Retrieval Visualizer, which formulates fair nonlinear dimensionality reduction as an information retrieval task whose performance and fairness are quantified by information retrieval criteria.	Jaakko Peltonen; Wen Xu; Timo Nummenmaa; Jyrki Nummenmaa;
325	Blackout Diffusion: Generative Diffusion Models in Discrete-State Spaces Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we develop a theoretical formulation for arbitrary discrete-state Markov processes in the forward diffusion process using exact (as opposed to variational) analysis.	Javier E. Santos; Zachary R Fox; Nicholas Lubbers; Yen Ting Lin;
326	Few-Sample Feature Selection Via Feature Manifold Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a new method for few-sample supervised feature selection (FS).	David Cohen; Tal Shnitzer; Yuval Kluger; Ronen Talmon;
327	Generalized Reductions: Making Any Hierarchical Clustering Fair and Balanced with Low Cost Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our work vastly improves the previous $O(n^{5/6}\,\mathrm{polylog}(n))$ fair approximation for cost to a near polylogarithmic $O(n^\delta\,\mathrm{polylog}(n))$ fair approximation for any constant $\delta\in(0,1)$.	Marina Knittel; Max Springer; John P Dickerson; MohammadTaghi Hajiaghayi;
328	Distribution Free Prediction Sets for Node Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show through experiments on standard benchmark datasets using popular GNN models that our approach provides tighter and better calibrated prediction sets than a naive application of conformal prediction.	Jase Clarkson;
329	Bandit Online Linear Optimization with Hints and Queries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study variants of the online linear optimization (OLO) problem with bandit feedback, where the algorithm has access to external information about the unknown cost vector.	Aditya Bhaskara; Ashok Cutkosky; Ravi Kumar; Manish Purohit;
330	Federated Online and Bandit Convex Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We aim to minimize the average regret on $M$ machines working in parallel over $T$ rounds with $R$ intermittent communications.	Kumar Kshitij Patel; Lingxiao Wang; Aadirupa Saha; Nathan Srebro;
331	Test-time Adaptation with Slot-Centric Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent slot-centric generative models attempt to decompose scenes into entities in a self-supervised manner by reconstructing pixels. Drawing upon these two lines of work, we propose Slot-TTA, a semi-supervised slot-centric scene decomposition model that at test time is adapted per scene through gradient descent on reconstruction or cross-view synthesis objectives.	Mihir Prabhudesai; Anirudh Goyal; Sujoy Paul; Sjoerd van Steenkiste; Mehdi S. M. Sajjadi; Gaurav Aggarwal; Thomas Kipf; Deepak Pathak; Katerina Fragkiadaki;
332	Representation-Driven Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a representation-driven framework for reinforcement learning.	Ofir Nabati; Guy Tennenholtz; Shie Mannor;
333	Improving L1-Certified Robustness Via Randomized Smoothing By Leveraging Box Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, current techniques are not able to utilize the fact that any adversarial example has to lie in the image space, that is $[0,1]^d$; otherwise, one can trivially detect it. To address this suboptimality, we derive new certification formulae which lead to significant improvements in the certified $\ell_1$-robustness without the need to adapt the classifiers or change the smoothing distributions.	Vaclav Voracek; Matthias Hein;
334	LIV: Language-Image Representations and Rewards for Robotic Control Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Language-Image Value learning (LIV), a unified objective for vision-language representation and reward learning from action-free videos with text annotations.	Yecheng Jason Ma; Vikash Kumar; Amy Zhang; Osbert Bastani; Dinesh Jayaraman;
335	MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present MultiDiffusion, a unified framework that enables versatile and controllable image generation, using a pre-trained text-to-image diffusion model, without any further training or finetuning.	Omer Bar-Tal; Lior Yariv; Yaron Lipman; Tali Dekel;
336	On The Relationship Between Explanation and Prediction: A Causal View Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: More specifically, we study the relationship between explanations (E) and predictions (Y) by measuring the treatment effect when intervening on their causal ancestors, i.e., on hyperparameters and inputs used to generate saliency-based Es or Ys.	Amir-Hossein Karimi; Krikamol Muandet; Simon Kornblith; Bernhard Schölkopf; Been Kim;
337	RGE: A Repulsive Graph Rectification for Node Classification Via Influence Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In the same vein, we observe that edge groups connecting to the same train node exhibit significant differences in their influences, hence no matter how negative each is, removing them at once may have a rather negative effect as a group. Based on this motivation, we propose a new edge-removing strategy, Repulsive edge Group Elimination (RGE), that preferentially removes edges with no interference in groups.	Jaeyun Song; SungYub Kim; Eunho Yang;
338	Provable Multi-instance Deep AUC Maximization with Stochastic Pooling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We address a neglected yet non-negligible computational challenge of multi-instance learning (MIL) in the context of deep AUC maximization (DAM), i.e., the bag size is too large to be loaded into GPU memory for backpropagation, which is required by the standard pooling methods of MIL. To tackle this challenge, we propose variance-reduced stochastic pooling methods in the spirit of stochastic optimization by formulating the loss function over the pooled prediction as a multi-level compositional function.	Dixian Zhu; Bokun Wang; Zhi Chen; Yaxing Wang; Milan Sonka; Xiaodong Wu; Tianbao Yang;
339	Generative Pretraining for Black-Box Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose BONET, a generative framework for pretraining a novel model-based optimizer using offline datasets.	Satvik Mehul Mashkaria; Siddarth Krishnamoorthy; Aditya Grover;
340	Representer Point Selection for Explaining Regularized High-dimensional Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel class of sample-based explanations we term high-dimensional representers, that can be used to explain the predictions of a regularized high-dimensional model in terms of importance weights for each of the training samples.	Che-Ping Tsai; Jiong Zhang; Hsiang-Fu Yu; Eli Chien; Cho-Jui Hsieh; Pradeep Kumar Ravikumar;
341	Efficient Displacement Convex Optimization with Particle Gradient Descent Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Particle gradient descent, which uses particles to represent a probability measure and performs gradient descent on particles in parallel, is widely used to optimize functions of probability measures. This paper considers particle gradient descent with a finite number of particles and establishes its theoretical guarantees to optimize functions that are displacement convex in measures.	Hadi Daneshmand; Jason D. Lee; Chi Jin;
342	Generalized-Smooth Nonconvex Optimization Is As Efficient As Smooth Nonconvex Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a notion of $\alpha$-symmetric generalized-smoothness that substantially extends the existing notions and covers many important functions such as high-order polynomials and exponential functions.	Ziyi Chen; Yi Zhou; Yingbin Liang; Zhaosong Lu;
343	Improved Techniques for Maximum Likelihood Estimation for Diffusion ODEs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose several improved techniques for maximum likelihood estimation for diffusion ODEs, including both training and evaluation perspectives.	Kaiwen Zheng; Cheng Lu; Jianfei Chen; Jun Zhu;
344	Compressing Tabular Data Via Latent Variable Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Data used for analytics and machine learning often take the form of tables with categorical entries. We introduce a family of lossless compression algorithms for such data that proceed in four steps: (i) Estimate latent variables associated with rows and columns; (ii) Partition the table into blocks according to the row/column latents; (iii) Apply a sequential (e.g. Lempel-Ziv) coder to each of the blocks; (iv) Append a compressed encoding of the latents.	Andrea Montanari; Eric Weiner;
345	A Conditional Normalizing Flow for Accelerated Multi-Coil MR Imaging Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We instead focus on sampling from the posterior distribution, which provides more comprehensive information for downstream inference tasks.	Jeffrey Wen; Rizwan Ahmad; Philip Schniter;
346	Learnability and Algorithm for Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper studies the challenging continual learning (CL) setting of Class Incremental Learning (CIL).	Gyuhak Kim; Changnan Xiao; Tatsuya Konishi; Bing Liu;
347	Online Local Differential Private Quantile Inference Via Self-normalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on binary inquiries, we developed an algorithm to estimate population quantiles under Local Differential Privacy (LDP).	Yi Liu; Qirui Hu; Lei Ding; Linglong Kong;
348	Primal and Dual Analysis of Entropic Fictitious Play for Finite-sum Problems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we provide a concise primal-dual analysis of EFP in the setting where the learning problem exhibits a finite-sum structure.	Atsushi Nitanda; Kazusato Oko; Denny Wu; Nobuhito Takenouchi; Taiji Suzuki;
349	Invariant Slot Attention: Object Discovery with Slot-Centric Reference Frames Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a simple yet highly effective method for incorporating spatial symmetries via slot-centric reference frames.	Ondrej Biza; Sjoerd van Steenkiste; Mehdi S. M. Sajjadi; Gamaleldin Fathy Elsayed; Aravindh Mahendran; Thomas Kipf;
350	On Provable Copyright Protection for Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: There is a growing concern that learned conditional generative models may output samples that are substantially similar to some copyrighted data $C$ that was in their training set. We give a formal definition of near access-freeness (NAF) and prove bounds on the probability that a model satisfying this definition outputs a sample similar to $C$, even if $C$ is included in its training set.	Nikhil Vyas; Sham M. Kakade; Boaz Barak;
351	Differential Privacy, Linguistic Fairness, and Training Data Influence: Impossibility and Possibility Theorems for Multilingual Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that multilingual compression and linguistic fairness are compatible with differential privacy, but that differential privacy is at odds with training data influence sparsity, an objective for transparency.	Phillip Rust; Anders Søgaard;
352	Sequential Strategic Screening Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In other words, we introduce the combination of strategic classification with screening processes.	Lee Cohen; Saeed Sharifi-Malvajerdi; Kevin Stangl; Ali Vakilian; Juba Ziani;
353	Investigating The Role of Model-Based Learning in Exploration and Transfer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate transfer learning in the context of model-based agents.	Jacob C Walker; Eszter Vértes; Yazhe Li; Gabriel Dulac-Arnold; Ankesh Anand; Theophane Weber; Jessica B Hamrick;
354	Unveiling The Latent Space Geometry of Push-Forward Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Building on recent developments in geometric measure theory, we prove a sufficient condition for optimality in the case where the dimension of the latent space is larger than the number of modes.	Thibaut Issenhuth; Ugo Tanielian; Jeremie Mary; David Picard;
355	Identification of The Adversary from A Single Adversarial Example Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, instead of enhancing the robustness, we take the investigator’s perspective and propose a new framework to trace the first compromised model copy in a forensic investigation manner.	Minhao Cheng; Rui Min; Haochen Sun; Pin-Yu Chen;
356	K-SHAP: Policy Clustering Algorithm for Anonymous Multi-Agent State-Action Pairs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Policy Clustering algorithm, called K-SHAP, that learns to group anonymous state-action pairs according to the agent policies.	Andrea Coletta; Svitlana Vyetrenko; Tucker Balch;
357	The Acquisition of Physical Knowledge in Generative Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We outline an approach that allows us to examine two distinct hypotheses of human development — stochastic optimization and complexity increase.	Luca M. Schulze Buschoff; Eric Schulz; Marcel Binz;
358	Towards Understanding Generalization of Macro-AUC in Multi-label Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To establish it, technically, we propose a new (and more general) McDiarmid-type concentration inequality, which may be of independent interest.	Guoqiang Wu; Chongxuan Li; Yilong Yin;
359	The Persistent Laplacian for Data Science: Evaluating Higher-Order Persistent Spectral Representations of Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we provide the first investigation into the efficacy of the persistence Laplacian as an embedding of data for downstream classification and regression tasks.	Thomas Davies; Zhengchao Wan; Ruben Sanchez-Garcia;
360	DUET: 2D Structured and Approximately Equivariant Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose 2D strUctured and EquivarianT representations (coined DUET), which are 2d representations organized in a matrix structure, and equivariant with respect to transformations acting on the input data.	Xavier Suau; Federico Danieli; T. Anderson Keller; Arno Blaas; Chen Huang; Jason Ramapuram; Dan Busbridge; Luca Zappella;
361	Forward-Backward Gaussian Variational Inference Via JKO in The Bures-Wasserstein Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we develop the (Stochastic) Forward-Backward Gaussian Variational Inference (FB-GVI) algorithm to solve Gaussian VI.	Michael Ziyang Diao; Krishna Balasubramanian; Sinho Chewi; Adil Salim;
362	Optimal Convergence Rates for Agnostic Nyström Kernel Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, there lacks a unified analysis for Nyström approximation, and the asymptotical minimax optimality for Nyström methods usually require a strict condition, assuming that the target regression lies exactly in the hypothesis space. In this paper, to tackle these problems, we provide a refined generalization analysis for Nyström approximation in the agnostic setting, where the target regression may be out of the hypothesis space.	Jian Li; Yong Liu; Weiping Wang;
363	SeMAIL: Eliminating Distractors in Visual Imitation Via Separated Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Following the convention of MBIL research, existing algorithms are highly deceptive by task-irrelevant information, especially moving distractors in videos. To tackle this problem, we propose a new algorithm – named Separated Model-based Adversarial Imitation Learning (SeMAIL) – decoupling the environment dynamics into two parts by task-relevant dependency, which is determined by agent actions, and training separately.	Shenghua Wan; Yucen Wang; Minghao Shao; Ruying Chen; De-Chuan Zhan;
364	Combinatorial Neural Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The score of an arm is an unknown function of the arm’s feature. Approximating this unknown score function with deep neural networks, we propose algorithms: Combinatorial Neural UCB ($\texttt{CN-UCB}$) and Combinatorial Neural Thompson Sampling ($\texttt{CN-TS}$).	Taehyun Hwang; Kyuwook Chai; Min-hwan Oh;
365	Variational Autoencoding Neural Operators Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we present Variational Autoencoding Neural Operators (VANO), a general strategy for making a large class of operator learning architectures act as variational autoencoders.	Jacob H Seidman; Georgios Kissas; George J. Pappas; Paris Perdikaris;
366	Who Needs to Know? Minimal Knowledge for Optimal Coordination Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that there is a well-defined dichotomy between strategically relevant and irrelevant information.	Niklas Lauffer; Ameesh Shah; Micah Carroll; Michael D Dennis; Stuart Russell;
367	Efficient Parametric Approximations of Neural Network Function Space Distance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a Linearized Activation Function TRick (LAFTR) and derive an efficient approximation to FSD for ReLU neural networks.	Nikita Dhawan; Sicong Huang; Juhan Bae; Roger Baker Grosse;
368	Predicting Ordinary Differential Equations with Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop a transformer-based sequence-to-sequence model that recovers scalar ordinary differential equations (ODEs) in symbolic form from irregularly sampled and noisy observations of a single solution trajectory.	Sören Becker; Michal Klein; Alexander Neitz; Giambattista Parascandolo; Niki Kilbertus;
369	A Unifying Framework to The Analysis of Interaction Methods Using Synergy Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work presents a unifying framework for game-theory-inspired attribution and $k^\text{th}$-order interaction methods.	Daniel Lundstrom; Meisam Razaviyayn;
370	Stein Variational Goal Generation for Adaptive Exploration in Multi-Goal Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this context, a curriculum over goals helps agents learn by adapting training tasks to their current capabilities. In this work, we propose Stein Variational Goal Generation (SVGG), which samples goals of intermediate difficulty for the agent, by leveraging a learned predictive model of its goal reaching capabilities.	Nicolas Castanet; Olivier Sigaud; Sylvain Lamprier;
371	Critical Points and Convergence Analysis of Generative Deep Linear Networks Trained with Bures-Wasserstein Loss Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We consider a deep matrix factorization model of covariance matrices trained with the Bures-Wasserstein distance.	Pierre Bréchet; Katerina Papagiannouli; Jing An; Guido Montufar;
372	Shortest Edit Path Crossover: A Theory-driven Solution to The Permutation Problem in Evolutionary Neural Architecture Search Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents the first theoretical analysis of the behaviors of mutation, crossover and RL in black-box NAS, and proposes a new crossover operator based on the shortest edit path (SEP) in graph space.	Xin Qiu; Risto Miikkulainen;
373	Fisher Information Embedding for Node and Graph Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel attention-based node embedding framework for graphs.	Dexiong Chen; Paolo Pellizzoni; Karsten Borgwardt;
374	Efficient Latency-Aware CNN Depth Compression Via Two-Stage Dynamic Programming Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel depth compression algorithm which targets general convolution operations. We propose a subset selection problem that replaces inefficient activation layers with identity functions and optimally merges consecutive convolution operations into shallow equivalent convolution operations for efficient end-to-end inference latency.	Jinuk Kim; Yeonwoo Jeong; Deokjae Lee; Hyun Oh Song;
375	Git-Theta: A Git Extension for Collaborative Development of Machine Learning Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Git-Theta’s design and features and include an example use-case of Git-Theta where a pre-trained model is continually adapted and modified.	Nikhil Kandpal; Brian Lester; Mohammed Muqeeth; Anisha Mascarenhas; Monty Evans; Vishal Baskaran; Tenghao Huang; Haokun Liu; Colin Raffel;
376	Finite-Sample Analysis of Learning High-Dimensional Single ReLU Neuron Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper considers the problem of learning single ReLU neuron with squared loss (a.k.a., ReLU regression) in the overparameterized regime, where the input dimension can exceed the number of samples. We analyze a Perceptron-type algorithm called GLM-tron [Kakade et al. 2011], and provide its dimension-free risk upper bounds for high-dimensional ReLU regression in both well-specified and misspecified settings.	Jingfeng Wu; Difan Zou; Zixiang Chen; Vladimir Braverman; Quanquan Gu; Sham M. Kakade;
377	Explaining Reinforcement Learning with Shapley Values Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a theoretical analysis of explaining reinforcement learning using Shapley values, following a principled approach from game theory for identifying the contribution of individual players to the outcome of a cooperative game.	Daniel Beechey; Thomas M. S. Smith; Özgür Şimşek;
378	Naive Imputation Implicitly Regularizes High-dimensional Linear Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the impact of imputation in a high-dimensional linear model with MCAR missing data.	Alexis Ayme; Claire Boyer; Aymeric Dieuleveut; Erwan Scornet;
379	Coin Sampling: Gradient-Based Bayesian Inference Without Learning Rates Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a suite of new particle-based methods for scalable Bayesian inference based on coin betting, which are entirely learning-rate free.	Louis Sharrock; Christopher Nemeth;
380	Distribution Free Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a Distribution Free Domain Generalization (DFDG) procedure for classification by conducting standardization to avoid the dominance of a few domains in the training process.	Peifeng Tong; Wu Su; He Li; Jialin Ding; Haoxiang Zhan; Song Xi Chen;
381	Simple Hardware-Efficient Long Convolutions for Sequence Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study whether a simple alternative can match SSMs in performance and efficiency: directly learning long convolutions over the sequence.	Daniel Y Fu; Elliot L Epstein; Eric Nguyen; Armin W Thomas; Michael Zhang; Tri Dao; Atri Rudra; Christopher Re;
382	Neural Signature Kernels As Infinite-width-depth-limits of Controlled ResNets Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by the paradigm of reservoir computing, we consider randomly initialized controlled ResNets defined as Euler-discretizations of neural controlled differential equations (Neural CDEs), a unified architecture which encompasses both RNNs and ResNets.	Nicola Muca Cirone; Maud Lemercier; Cristopher Salvi;
383	Feature Programming for Multivariate Time Series Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the concept of programmable feature engineering for time series modeling and propose a feature programming framework.	Alex Daniel Reneau; Jerry Yao-Chieh Hu; Ammar Gilani; Han Liu;
384	Theory on Forgetting and Generalization of Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In particular, there is a lack of understanding on what factors are important and how they affect catastrophic forgetting and generalization performance. To fill this gap, our theoretical analysis, under overparameterized linear models, provides the first-known explicit form of the expected forgetting and generalization error for a general CL setup with an arbitrary number of tasks.	Sen Lin; Peizhong Ju; Yingbin Liang; Ness Shroff;
385	PAL: Program-aided Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present Program-Aided Language models (PAL): a novel approach that uses the LLM to read natural language problems and generate programs as the intermediate reasoning steps, but offloads the solution step to a runtime such as a Python interpreter.	Luyu Gao; Aman Madaan; Shuyan Zhou; Uri Alon; Pengfei Liu; Yiming Yang; Jamie Callan; Graham Neubig;
386	Data-Efficient Contrastive Self-supervised Learning: Most Beneficial Examples for Supervised Learning Contribute The Least Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Nevertheless, quantifying the value of examples for SSL has remained an open question. In this work, we address this problem for the first time, by proving that examples that contribute the most to contrastive SSL are those that have the most similar augmentations to other examples, in expectation.	Siddharth Joshi; Baharan Mirzasoleiman;
387	A Critical View of Vision-Based Long-Term Dynamics Prediction Under Environment Misalignment Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, despite its success, the model’s capability can be compromised under conditions of environment misalignment. In this paper, we investigate two challenging conditions for environment misalignment: Cross-Domain and Cross-Context by proposing four datasets that are designed for these challenges: SimB-Border, SimB-Split, BlenB-Border, and BlenB-Split.	Hanchen Xie; Jiageng Zhu; Mahyar Khayatkhoei; Jiazhi Li; Mohamed E. Hussein; Wael AbdAlmageed;
388	Learning Lightweight Object Detectors Via Multi-Teacher Progressive Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a simple yet surprisingly effective sequential approach to knowledge distillation that progressively transfers the knowledge of a set of teacher detectors to a given lightweight student.	Shengcao Cao; Mengtian Li; James Hays; Deva Ramanan; Yu-Xiong Wang; Liangyan Gui;
389	COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper explores an efficient method for compressing vision transformers to enrich the toolset for obtaining compact attention-based vision models.	Jinqi Xiao; Miao Yin; Yu Gong; Xiao Zang; Jian Ren; Bo Yuan;
390	Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, an agent has access to two sets of trajectories: labelled trajectories containing state, action and reward triplets at every timestep, along with unlabelled trajectories that contain only state and reward information. For this setting, we develop and study a simple meta-algorithmic pipeline that learns an inverse dynamics model on the labelled data to obtain proxy-labels for the unlabelled data, followed by the use of any offline RL algorithm on the true and proxy-labelled trajectories.	Qinqing Zheng; Mikael Henaff; Brandon Amos; Aditya Grover;
391	Emergence of Adaptive Circadian Rhythms in Deep Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the emergence of circadian-like rhythms in deep reinforcement learning agents.	Aqeel Labash; Florian Stelzer; Daniel Majoral; Raul Vicente;
392	Robust Situational Reinforcement Learning in Face of Context Disturbances Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing methods on robust RL aim at learning robust policies against the deviations of the entire system dynamics. To tackle this problem, this paper proposes the framework of robust situational Markov decision process (RS-MDP) which captures the possible deviations of context transitions explicitly.	Jinpeng Zhang; Yufeng Zheng; Chuheng Zhang; Li Zhao; Lei Song; Yuan Zhou; Jiang Bian;
393	QAS-Bench: Rethinking Quantum Architecture Search and A Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, beyond a particular domain, we formulate the QAS problem into two basic (and relatively even ideal) tasks: i) arbitrary quantum circuit (QC) regeneration given a target QC; ii) approximating an arbitrary unitary (oracle). Based on these two tasks, we generate a public QAS benchmark including 900 random QCs and 400 random unitary matrices which is still missing in the literature.	Xudong Lu; Kaisen Pan; Ge Yan; Jiaming Shan; Wenjie Wu; Junchi Yan;
394	Half-Hop: A Graph Upsampling Approach for Slowing Down Message Passing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a simple yet general framework for improving learning in message passing neural networks.	Mehdi Azabou; Venkataramana Ganesh; Shantanu Thakoor; Chi-Heng Lin; Lakshmi Sathidevi; Ran Liu; Michal Valko; Petar Veličković; Eva L Dyer;
395	Neuro-Symbolic Continual Learning: Knowledge, Reasoning Shortcuts and Concept Rehearsal Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Neuro-Symbolic Continual Learning, where a model has to solve a sequence of neuro-symbolic tasks, that is, it has to map sub-symbolic inputs to high-level concepts and compute predictions by reasoning consistently with prior knowledge.	Emanuele Marconato; Gianpaolo Bontempo; Elisa Ficarra; Simone Calderara; Andrea Passerini; Stefano Teso;
396	Group Equivariant Fourier Neural Operators for Partial Differential Equations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we extend group convolutions to the frequency domain and design Fourier layers that are equivariant to rotations, translations, and reflections by leveraging the equivariance property of the Fourier transform.	Jacob Helwig; Xuan Zhang; Cong Fu; Jerry Kurtin; Stephan Wojtowytsch; Shuiwang Ji;
397	Evaluating Unsupervised Denoising Requires Unsupervised Metrics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose two novel metrics: the unsupervised mean squared error (MSE) and the unsupervised peak signal-to-noise ratio (PSNR), which are computed using only noisy data.	Adria Marcos Morales; Matan Leibovich; Sreyas Mohan; Joshua Lawrence Vincent; Piyush Haluai; Mai Tan; Peter Crozier; Carlos Fernandez-Granda;
398	Rethinking Weak Supervision in Helping Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the empirical evidence showing that semi-supervised labels improve the representations of contrastive learning, it remains unknown if noisy supervised information can be directly used in training instead of after manual denoising. Therefore, to explore the mechanical differences between semi-supervised and noisy-labeled information in helping contrastive learning, we establish a unified theoretical framework of contrastive learning under weak supervision.	Jingyi Cui; Weiran Huang; Yifei Wang; Yisen Wang;
399	Near-Optimal Quantum Coreset Construction Algorithms for Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We give quantum algorithms that find coresets for $k$-clustering in $\mathbb{R}^d$ with $\tilde{O}(\sqrt{nk}d^{3/2})$ query complexity.	Yecheng Xue; Xiaoyu Chen; Tongyang Li; Shaofeng H.-C. Jiang;
400	On Heterogeneous Treatment Effects in Heterogeneous Causal Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We establish the theoretical forms of HCEs and derive their properties at the individual level in both linear and nonlinear models.	Richard A Watson; Hengrui Cai; Xinming An; Samuel McLean; Rui Song;
401	Bilevel Optimization with Coupled Decision-Dependent Distributions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate the inclusion of decision-dependent distributions in bilevel optimization.	Songtao Lu;
402	Differentially Private Distributed Bayesian Linear Regression with MCMC Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel Bayesian inference framework for distributed differentially private linear regression.	Barış Alparslan; Sinan Yıldırım; Ilker Birbil;
403	Nearly Optimal Competitive Ratio for Online Allocation Problems with Two-sided Resource Constraints and Finite Requests Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the online allocation problem of maximizing the overall revenue subject to both lower and upper bound constraints.	Qixin Zhang; Wenbing Ye; Zaiyi Chen; Haoyuan Hu; Enhong Chen; Yu Yang;
404	InGram: Inductive Knowledge Graph Embedding Via Relation Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose an INductive knowledge GRAph eMbedding method, InGram, that can generate embeddings of new relations as well as new entities at inference time.	Jaejun Lee; Chanyoung Chung; Joyce Jiyoung Whang;
405	Topological Point Cloud Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present Topological Point Cloud Clustering (TPCC), a new method to cluster points in an arbitrary point cloud based on their contribution to global topological features.	Vincent Peter Grande; Michael T Schaub;
406	Provably Efficient Offline Reinforcement Learning with Perturbed Data Sources Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In practice, however, data often come from several heterogeneous but related sources. Motivated by this gap, this work aims at rigorously understanding offline RL with multiple datasets that are collected from randomly perturbed versions of the target task instead of from itself.	Chengshuai Shi; Wei Xiong; Cong Shen; Jing Yang;
407	Trapdoor Normalization with Irreversible Ownership Verification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a deep model watermark with an irreversible ownership verification scheme: Trapdoor Normalization (TdN), inspired by the trapdoor function in traditional cryptography.	Hanwen Liu; Zhenyu Weng; Yuesheng Zhu; Yadong Mu;
408	Margin-based Sampling in High Dimensions: When Being Active Is Less Efficient Than Staying Passive Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While existing works offer different explanations in the low-dimensional regime, this paper shows that the underlying mechanism is entirely different in high dimensions: we prove for logistic regression that PL outperforms margin-based AL even for noiseless data and when using the Bayes optimal decision boundary for sampling.	Alexandru Tifrea; Jacob Clarysse; Fanny Yang;
409	Random Matrix Analysis to Balance Between Supervised and Unsupervised Learning Under The Low Density Separation Assumption Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a theoretical framework to analyze semi-supervised classification under the low density separation assumption in a high-dimensional regime.	Vasilii Feofanov; Malik Tiomoko; Aladin Virmaux;
410	On The Initialization of Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analyze the variance of forward and backward propagation across GNN layers and show that the variance instability of GNN initializations comes from the combined effect of the activation function, hidden dimension, graph structure and message passing.	Jiahang Li; Yakun Song; Xiang Song; David Wipf;
411	When Does Privileged Information Explain Away Label Noise? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we investigate the role played by different properties of the PI in explaining away label noise.	Guillermo Ortiz-Jimenez; Mark Collier; Anant Nawalgaria; Alexander D’Amour; Jesse Berent; Rodolphe Jenatton; Effrosyni Kokiopoulou;
412	A Category-theoretical Meta-analysis of Definitions of Disentanglement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose that the concepts of the cartesian and monoidal products should serve as the core of disentanglement.	Yivan Zhang; Masashi Sugiyama;
413	Quantifying The Variability Collapse of Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel metric, named Variability Collapse Index (VCI), to quantify the variability collapse phenomenon in the NC paradigm.	Jing Xu; Haoxiong Liu;
414	Global Optimality for Euclidean CCCP Under Riemannian Convexity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study geodesically convex (g-convex) problems that can be written as a difference of Euclidean convex functions.	Melanie Weber; Suvrit Sra;
415	Reinforcement Learning Can Be More Efficient with Multiple Rewards Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, in this work, we study whether directly incorporating multiple alternate reward formulations of the same task in a single agent can lead to faster learning.	Christoph Dann; Yishay Mansour; Mehryar Mohri;
416	Adversarial Learning of Distributional Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we propose an adversarial learning framework for distributional reinforcement learning, which adopts the concept of influence measure from the statistics community.	Yang Sui; Yukun Huang; Hongtu Zhu; Fan Zhou;
417	Robust Explanation for Free or At The Cost of Faithfulness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, explanation methods are shown as vulnerable to adversarial perturbations, implying security concerns in high-stakes domains. In this paper, we investigate when robust explanations are necessary and what they cost.	Zeren Tan; Yang Tian;
418	Jump-Start Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks: a guide-policy, and an exploration-policy.	Ikechukwu Uchendu; Ted Xiao; Yao Lu; Banghua Zhu; Mengyuan Yan; Joséphine Simon; Matthew Bennice; Chuyuan Fu; Cong Ma; Jiantao Jiao; Sergey Levine; Karol Hausman;
419	Recovery Bounds on Class-Based Optimal Transport: A Sum-of-Norms Regularization Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For this purpose, we propose a convex OT program with a sum-of-norms regularization term, which provably recovers the underlying class structure under geometric assumptions.	Arman Rahbar; Ashkan Panahi; Morteza Haghir Chehreghani; Devdatt Dubhashi; Hamid Krim;
420	Multi-Task Off-Policy Learning from Bandit Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To solve the problem, we propose a hierarchical off-policy optimization algorithm HierOPO.	Joey Hong; Branislav Kveton; Manzil Zaheer; Sumeet Katariya; Mohammad Ghavamzadeh;
421	Learning Expressive Priors for Generalization and Uncertainty Estimation in Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel prior learning method for advancing generalization and uncertainty estimation in deep neural networks.	Dominik Schnaus; Jongseok Lee; Daniel Cremers; Rudolph Triebel;
422	Towards Stable and Efficient Adversarial Training Against $l_1$ Bounded Adversarial Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We address the problem of stably and efficiently training a deep neural network robust to adversarial perturbations bounded by an $l_1$ norm.	Yulun Jiang; Chen Liu; Zhichao Huang; Mathieu Salzmann; Sabine Süsstrunk;
423	Universal Morphology Control Via Contextual Modulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing methods utilize graph neural networks or transformers to handle heterogeneous state and action spaces across different morphologies, but pay little attention to the dependency of a robot’s control policy on its morphology context. In this paper, we propose a hierarchical architecture to better model this dependency via contextual modulation, which includes two key submodules: (1) Instead of enforcing hard parameter sharing across robots, we use hypernetworks to generate morphology-dependent control parameters; (2) We propose a fixed attention mechanism that solely depends on the morphology to modulate the interactions between different limbs in a robot.	Zheng Xiong; Jacob Beck; Shimon Whiteson;
424	CocktailSGD: Fine-tuning Foundation Models Over 500Mbps Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose CocktailSGD, a novel communication-efficient training framework that combines three distinct compression techniques — random sparsification, top-K sparsification, and quantization — to achieve much greater compression than each individual technique alone.	Jue Wang; Yucheng Lu; Binhang Yuan; Beidi Chen; Percy Liang; Christopher De Sa; Christopher Re; Ce Zhang;
425	Is Overfitting Necessary for Implicit Video Representation? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we propose a new paradigm in efficient INR for videos based on the idea of strong lottery ticket (SLT) hypothesis (Zhou et al., 2019), which demonstrates the possibility of finding an accurate subnetwork mask, called supermask, for a randomly initialized classification network without weight training.	Hee Min Choi; Hyoa Kang; Dokwan Oh;
426	Learning The Right Layers: A Data-Driven Layer-Aggregation Strategy for Semi-Supervised Learning on Multilayer Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we assume a semi-supervised learning setting, where the class of a small percentage of nodes is initially provided, and we propose a parameter-free Laplacian-regularized model that learns an optimal nonlinear combination of the different layers from the available input labels.	Sara Venturini; Andrea Cristofari; Francesco Rinaldi; Francesco Tudisco;
427	Trading-Off Payments and Accuracy in Online Classification with Paid Stochastic Experts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce an online learning algorithm whose total cost after $T$ rounds exceeds that of a predictor which knows the productivity of all experts in advance by at most $\mathcal{O}\big(K^2(\ln T)\sqrt{T}\big)$ where $K$ is the number of experts.	Dirk van der Hoeven; Ciara Pike-Burke; Hao Qiu; Nicolò Cesa-Bianchi;
428	Provable Benefit of Mixup for Finding Optimal Decision Boundaries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate how pair-wise data augmentation techniques like Mixup affect the sample complexity of finding optimal decision boundaries in a binary linear classification problem.	Junsoo Oh; Chulhee Yun;
429	Covariate Balancing Using The Integral Probability Metric for Causal Inference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we consider using the integral probability metric (IPM), which is a metric between two probability measures, for covariate balancing.	Insung Kong; Yuha Park; Joonhyuk Jung; Kwonsang Lee; Yongdai Kim;
430	Fair and Accurate Decision Making Through Group-Aware Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In some cases, these AI systems can be unfair by exhibiting bias or discrimination against certain social groups, which can have severe consequences in real life. Inspired by one of the most well-known human learning skills called grouping, we address this issue by proposing a novel machine learning (ML) framework where the ML model learns to group a diverse set of problems into distinct subgroups to solve each subgroup using its specific sub-model.	Ramtin Hosseini; Li Zhang; Bhanu Garg; Pengtao Xie;
431	Robust and Scalable Bayesian Online Changepoint Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes an online, provably robust, and scalable Bayesian approach for changepoint detection.	Matias Altamirano; Francois-Xavier Briol; Jeremias Knoblauch;
432	Uncertainty Estimation By Fisher Information-based Evidential Deep Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, for samples with high data uncertainty that are nonetheless annotated with a one-hot label, the evidence-learning process for the mislabeled classes is over-penalized and remains hindered. To address this problem, we propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL).	Danruo Deng; Guangyong Chen; Yang Yu; Furui Liu; Pheng-Ann Heng;
433	Network Effects in Performative Prediction Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper studies the multi-agent performative prediction (Multi-PP) games over multiplex networks.	Xiaolu Wang; Chung-Yiu Yau; Hoi To Wai;
434	MyoDex: A Generalizable Prior for Dexterous Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we take inspiration from how human dexterity builds on a diversity of prior experiences, instead of being acquired through a single task.	Vittorio Caggiano; Sudeep Dasari; Vikash Kumar;
435	The Case for 4-bit Precision: K-bit Inference Scaling Laws Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For example, a 30B 8-bit model and a 60B 4-bit model have the same number of bits but may have very different zero-shot accuracies. In this work, we study this trade-off by developing inference scaling laws of zero-shot performance in Large Language Models (LLMs) to determine the bit-precision and model size that maximizes zero-shot performance.	Tim Dettmers; Luke Zettlemoyer;
436	Explore and Exploit The Diverse Knowledge in Model Zoo for Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper argues that the knowledge contained in weaker models is valuable and presents a method for leveraging the diversity within the model zoo to improve out-of-distribution generalization capabilities.	Yimeng Chen; Tianyang Hu; Fengwei Zhou; Zhenguo Li; Zhi-Ming Ma;
437	Dividing and Conquering A BlackBox to A Mixture of Interpretable Models: Route, Interpret, Repeat Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper aims to blur the distinction between a post hoc explanation of a Blackbox and constructing interpretable models.	Shantanu Ghosh; Ke Yu; Forough Arabshahi; Kayhan Batmanghelich;
438	Adaptive Barrier Smoothing for First-Order Policy Gradient with Contact Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a result, optimization becomes challenging due to chaotic and non-smooth loss landscapes. To tackle this issue, we propose a novel approach called Adaptive Barrier Smoothing (ABS), which introduces a class of softened complementarity systems that correspond to barrier-smoothed objectives.	Shenao Zhang; Wanxin Jin; Zhaoran Wang;
439	Online Restless Bandits with Unobserved States Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose TSEETC, a learning algorithm based on Thompson Sampling with Episodic Explore-Then-Commit.	Bowen Jiang; Bo Jiang; Jian Li; Tao Lin; Xinbing Wang; Chenghu Zhou;
440	Analyzing Diffusion As Serial Reproduction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By identifying a correspondence between diffusion models and a well-known paradigm in cognitive science known as serial reproduction, whereby human agents iteratively observe and reproduce stimuli from memory, we show how the aforementioned properties of diffusion models can be explained as a natural consequence of this correspondence.	Raja Marjieh; Ilia Sucholutsky; Thomas A Langlois; Nori Jacoby; Thomas L. Griffiths;
441	Probably Anytime-Safe Stochastic Combinatorial Semi-Bandits Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by concerns about making online decisions that incur an undue amount of risk at each time step, in this paper, we formulate the probably anytime-safe stochastic combinatorial semi-bandits problem.	Yunlong Hou; Vincent Tan; Zixin Zhong;
442	Trompt: Towards A Better Deep Neural Network for Tabular Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Trompt (which stands for Tabular Prompt), a novel architecture inspired by prompt learning of language models.	Kuan-Yu Chen; Ping-Han Chiang; Hsin-Rung Chou; Ting-Wei Chen; Tien-Hao Chang;
443	A Simple Zero-shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Prompt engineering typically requires hand-crafting a set of prompts for individual downstream tasks. In this work, we aim to automate this prompt engineering and improve zero-shot accuracy through prompt ensembling.	James Urquhart Allingham; Jie Ren; Michael W Dusenberry; Xiuye Gu; Yin Cui; Dustin Tran; Jeremiah Zhe Liu; Balaji Lakshminarayanan;
444	Multi-Objective GFlowNets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Multi-Objective GFlowNets (MOGFNs), a novel method for generating diverse Pareto optimal solutions, based on GFlowNets.	Moksh Jain; Sharath Chandra Raparthy; Alex Hernández-García; Jarrid Rector-Brooks; Yoshua Bengio; Santiago Miret; Emmanuel Bengio;
445	Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We empirically find that random convex combinations of the learned queries are still good for the corresponding models. We then propose to learn a convex combination with dynamic coefficients based on the high-level semantics of the image.	Yiming Cui; Linjie Yang; Haichao Yu;
446	Cramming: Training A Language Model on A Single GPU in One Day Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate the downstream performance achievable with a transformer-based language model trained completely from scratch with masked language modeling for a single day on a single consumer GPU.	Jonas Geiping; Tom Goldstein;
447	On Computing Optimal Tree Ensembles Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent algorithmic advances make it possible to compute decision trees that are optimal for various measures such as their size or depth. We are not aware of such research for tree ensembles and aim to contribute to this area.	Christian Komusiewicz; Pascal Kunz; Frank Sommer; Manuel Sorge;
448	Kernel Logistic Regression Approximation of An Understandable ReLU Neural Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes an understandable neural network whose score function is modeled as an additive sum of univariate spline functions.	Marie Guyomard; Susana Barbosa; Lionel Fillatre;
449	On Coresets for Clustering in Small Dimensional Euclidean Spaces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the problem of constructing small coresets for $k$-Median in Euclidean spaces.	Lingxiao Huang; Ruiyuan Huang; Zengfeng Huang; Xuan Wu;
450	A Game-Theoretic Framework for Managing Risk in Multi-Agent Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new GT-based Risk-Averse Equilibrium (RAE) that always produces a solution that minimises the potential variance in reward accounting for the strategy of other agents.	Oliver Slumbers; David Henry Mguni; Stefano B Blumberg; Stephen Marcus McAleer; Yaodong Yang; Jun Wang;
451	Learning Compiler Pass Orders Using Coreset and Normalized Value Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, instead of predicting passes sequentially, we directly learn a policy on the pass sequence space, which outperforms the default -Oz flag by an average of 4.5% over a large collection (4683) of unseen code repositories from diverse domains across 14 datasets.	Youwei Liang; Kevin Stone; Ali Shameli; Chris Cummins; Mostafa Elhoushi; Jiadong Guo; Benoit Steiner; Xiaomeng Yang; Pengtao Xie; Hugh James Leather; Yuandong Tian;
452	SGD with Large Step Sizes Learns Sparse Features Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We showcase important features of the dynamics of the Stochastic Gradient Descent (SGD) in the training of neural networks. We present empirical observations that commonly used large step sizes (i) may lead the iterates to jump from one side of a valley to the other causing loss stabilization, and (ii) this stabilization induces a hidden stochastic dynamics that biases it implicitly toward simple predictors.	Maksym Andriushchenko; Aditya Vardhan Varre; Loucas Pillaud-Vivien; Nicolas Flammarion;
453	Optimally-weighted Estimators of The Maximum Mean Discrepancy for Likelihood-Free Inference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel estimator for the MMD with significantly improved sample complexity.	Ayush Bharti; Masha Naslidnyk; Oscar Key; Samuel Kaski; Francois-Xavier Briol;
454	Input Uncertainty Propagation Through Trained Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we address the problem of input uncertainty propagation through trained neural networks.	Paul Monchot; Loic Coquelin; Sébastien Julien Petit; Sébastien Marmin; Erwan Le Pennec; Nicolas Fischer;
455	GeCoNeRF: Few-shot Neural Radiance Fields Via Geometric Consistency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel framework to regularize Neural Radiance Field (NeRF) in a few-shot setting with a geometry-aware consistency regularization.	Min-Seop Kwak; Jiuhn Song; Seungryong Kim;
456	Learning Prescriptive ReLU Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a piecewise linear neural network model that can balance strong prescriptive performance and interpretability, which we refer to as the prescriptive ReLU network, or P-ReLU.	Wei Sun; Asterios Tsiourvas;
457	Achieving Linear Speedup in Non-IID Federated Bilevel Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, several important properties in federated learning such as the partial client participation and the linear speedup for convergence (i.e., the convergence rate and complexity are improved linearly with respect to the number of sampled clients) in the presence of non-i.i.d. datasets, still remain open. In this paper, we fill these gaps by proposing a new federated bilevel algorithm named FedMBO with a novel client sampling scheme in the federated hypergradient estimation.	Minhui Huang; Dewei Zhang; Kaiyi Ji;
458	Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Yet, the alignment between LLMs’ knowledge and the environment can be wrong and limit functional competence due to lack of grounding. In this paper, we study an approach (named GLAM) to achieve this alignment through functional grounding: we consider an agent using an LLM as a policy that is progressively updated as the agent interacts with the environment, leveraging online Reinforcement Learning to improve its performance to solve goals.	Thomas Carta; Clément Romac; Thomas Wolf; Sylvain Lamprier; Olivier Sigaud; Pierre-Yves Oudeyer;
459	In Search for A Generalizable Method for Source Free Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we apply existing SFDA techniques to a challenging set of naturally-occurring distribution shifts in bioacoustics, which are very different from the ones commonly studied in computer vision.	Malik Boudiaf; Tom Denton; Bart van Merrienboer; Vincent Dumoulin; Eleni Triantafillou;
460	Few-bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a systematic approach to compute optimal quantization of the retained gradients of the pointwise nonlinear functions with only a few bits per each element.	Georgii Sergeevich Novikov; Daniel Bershatsky; Julia Gusak; Alex Shonenkov; Denis Valerievich Dimitrov; Ivan Oseledets;
461	Learning Controllable Degradation for Real-World Super-Resolution Via Constrained Flows Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To remedy the issue, we propose to generate realistic SR datasets for unseen degradation levels by exploring the latent space of real LR images and thereby producing more diverse yet realistic LR images with complex real-world artifacts.	Seobin Park; Dongjin Kim; Sungyong Baik; Tae Hyun Kim;
462	Synthetic Data, Real Errors: How (Not) to Publish and Use Synthetic Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we explore how the generative process affects the downstream ML task.	Boris van Breugel; Zhaozhi Qian; Mihaela van der Schaar;
463	Hypothesis Transfer Learning with Surrogate Classification Losses: Generalization Bounds Through Algorithmic Stability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The analysis of such a method from a theoretical perspective faces multiple challenges, particularly in classification tasks. This paper deals with this problem by studying the learning theory of HTL through algorithmic stability, an attractive theoretical framework for machine learning algorithms analysis.	Anass Aghbalou; Guillaume Staerman;
464	Deep Perturbation Learning: Enhancing The Network Performance Via Image Perturbations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unlike existing works, in this paper we introduce a novel framework, Deep Perturbation Learning (DPL), which offers new insights into image perturbations and uses them to enhance the performance of networks rather than decrease it.	Zifan Song; Xiao Gong; Guosheng Hu; Cai Rong Zhao;
465	A Framework for Adapting Offline Algorithms to Solve Combinatorial Multi-Armed Bandit Problems with Bandit Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide a general framework for adapting discrete offline approximation algorithms into sublinear $\alpha$-regret methods that only require bandit feedback, achieving $\mathcal{O}\left(T^\frac{2}{3}\log(T)^\frac{1}{3}\right)$ expected cumulative $\alpha$-regret dependence on the horizon $T$.	Guanyu Nie; Yididiya Y Nadew; Yanhui Zhu; Vaneet Aggarwal; Christopher John Quinn;
466	Do Machine Learning Models Learn Statistical Rules Inferred from Data? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We thereby seek to infer statistical rules from the data and quantify the extent to which a model has learned them. We propose a framework SQRL that integrates logic-based methods with statistical inference to derive these rules from a model’s training data without supervision.	Aaditya Naik; Yinjun Wu; Mayur Naik; Eric Wong;
467	Thompson Sampling with Less Exploration Is Fast and Optimal Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose $\epsilon$-Exploring Thompson Sampling ($\epsilon$-TS), a modified version of the Thompson Sampling (TS) algorithm for multi-armed bandits.	Tianyuan Jin; Xianglin Yang; Xiaokui Xiao; Pan Xu;
468	Generalized Disparate Impact for Configurable Fairness Solutions in ML Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We make two contributions in the field of AI fairness over continuous protected attributes.	Luca Giuliani; Eleonora Misino; Michele Lombardi;
469	On The Effectiveness of Offline RL for Dialogue Response Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This motivates use of sequence-level objectives for dialogue response generation. In this paper, we study the efficacy of various offline reinforcement learning (RL) methods to maximize such objectives.	Paloma Sodhi; Felix Wu; Ethan R. Elenberg; Kilian Q Weinberger; Ryan McDonald;
470	Computational Asymmetries in Robust Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: First, we prove that while attacking ReLU classifiers is $\mathit{NP}$-hard, ensuring their robustness at training time is $\Sigma_2^P$-hard (even on a single example). This asymmetry provides a rationale for the fact that robust classification approaches are frequently fooled in the literature. Second, we show that inference-time robustness certificates are not affected by this asymmetry, by introducing a proof-of-concept approach named Counter-Attack (CA). Indeed, CA displays a reversed asymmetry: running the defense is $\mathit{NP}$-hard, while attacking it is $\Sigma_2^P$-hard.	Samuele Marro; Michele Lombardi;
471	Nearly-tight Bounds for Deep Kernel Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we prove novel and nearly-tight generalization bounds based on the uniform covering number and the Rademacher chaos complexity for deep (multiple) kernel machines.	Yifan Zhang; Min-Ling Zhang;
472	DoCoFL: Downlink Compression for Cross-Device Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Accordingly, we propose DoCoFL — a new framework for downlink compression in the cross-device setting.	Ron Dorfman; Shay Vargaftik; Yaniv Ben-Itzhak; Kfir Yehuda Levy;
473	KDEformer: Accelerating Transformers Via Kernel Density Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our proposed KDEformer can approximate the attention in sub-quadratic time with provable spectral norm bounds, while all prior results merely provide entry-wise error bounds.	Amir Zandieh; Insu Han; Majid Daliri; Amin Karbasi;
474	Probabilistic Attention-to-Influence Neural Models for Event Sequences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While neural sequence models are able to capture complex and potentially long-range historical dependencies, they often lack the interpretability of simpler models for event sequence dynamics. We provide a novel neural framework in such a setting – a probabilistic attention-to-influence neural model – which not only captures complex instance-wise interactions between events but also learns influencers for each event type of interest.	Xiao Shou; Debarun Bhattacharjya; Tian Gao; Dharmashankar Subramanian; Oktie Hassanzadeh; Kristin Bennett;
475	Conformal Prediction Sets for Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a conformal procedure to equip GNNs with prediction sets that come with distribution-free guarantees: the output set contains the true label with arbitrarily high probability. By leveraging the network homophily we construct sets with comparable or better efficiency (average size) and significantly improved singleton hit ratio (correct sets of size one).	Soroush H. Zargarbashi; Simone Antonelli; Aleksandar Bojchevski;
476	Flash: Concept Drift Adaptation in Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel adaptive optimizer called Flash that simultaneously addresses both statistical heterogeneity and the concept drift issues.	Kunjal Panchal; Sunav Choudhary; Subrata Mitra; Koyel Mukherjee; Somdeb Sarkhel; Saayan Mitra; Hui Guan;
477	Input Perturbation Reduces Exposure Bias in Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we observe that a long sampling chain also leads to an error accumulation phenomenon, which is similar to the exposure bias problem in autoregressive text generation.	Mang Ning; Enver Sangineto; Angelo Porrello; Simone Calderara; Rita Cucchiara;
478	Efficient Training of Language Models Using Few-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study an efficient approach to train language models using few-shot learners.	Sashank J. Reddi; Sobhan Miryoosefi; Stefani Karp; Shankar Krishnan; Satyen Kale; Seungyeon Kim; Sanjiv Kumar;
479	NeuralStagger: Accelerating Physics-constrained Neural PDE Solver with Spatial-temporal Decomposition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a general acceleration methodology called NeuralStagger by spatially and temporally decomposing the original learning tasks into several coarser-resolution subtasks.	Xinquan Huang; Wenlei Shi; Qi Meng; Yue Wang; Xiaotian Gao; Jia Zhang; Tie-Yan Liu;
480	Out-of-Distribution Generalization of Federated Learning Via Implicit Invariant Relationships Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, invariant relationships are often learned explicitly from data, representations, and distributions, which violates the federated principles of privacy preservation and limited communication. In this paper, we propose FedIIR, which implicitly learns invariant relationships from parameters for out-of-distribution generalization, adhering to the above principles.	Yaming Guo; Kai Guo; Xiaofeng Cao; Tieru Wu; Yi Chang;
481	Differentially Private Sharpness-Aware Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate sharpness, a key factor in achieving better generalization, in private learning.	Jinseong Park; Hoki Kim; Yujin Choi; Jaewook Lee;
482	Monotonic Location Attention for Length Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce novel variants of location attention building on top of Dubois et al. (2020) to address the new diagnostic tasks.	Jishnu Ray Chowdhury; Cornelia Caragea;
483	Effective Structured Prompting By Meta-Learning and Representative Verbalizer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Combining meta-learning the prompt pool and RepVerb, we propose MetaPrompter for effective structured prompting.	Weisen Jiang; Yu Zhang; James Kwok;
484	End-to-end Differentiable Clustering with Associative Memories Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We uncover a novel connection between the AM dynamics and the inherent discrete assignment necessary in clustering to propose a novel unconstrained continuous relaxation of the discrete clustering problem, enabling end-to-end differentiable clustering with AM, dubbed ClAM.	Bishwajit Saha; Dmitry Krotov; Mohammed J Zaki; Parikshit Ram;
485	Near-Optimal Cryptographic Hardness of Agnostically Learning Halfspaces and ReLU Regression Under Gaussian Marginals Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the task of agnostically learning halfspaces under the Gaussian distribution.	Ilias Diakonikolas; Daniel Kane; Lisheng Ren;
486	PaLM-E: An Embodied Multimodal Language Model Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose embodied language models to directly incorporate real-world continuous sensor modalities into language models and thereby establish the link between words and percepts.	Danny Driess; Fei Xia; Mehdi S. M. Sajjadi; Corey Lynch; Aakanksha Chowdhery; Brian Ichter; Ayzaan Wahid; Jonathan Tompson; Quan Vuong; Tianhe Yu; Wenlong Huang; Yevgen Chebotar; Pierre Sermanet; Daniel Duckworth; Sergey Levine; Vincent Vanhoucke; Karol Hausman; Marc Toussaint; Klaus Greff; Andy Zeng; Igor Mordatch; Pete Florence;
487	Fighting Fire with Fire: Contrastive Debiasing Without Bias-free Data Via Generative Bias-transformation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Contrastive Debiasing via Generative Bias-transformation (CDvG) which is capable of operating without explicitly exploiting bias labels and bias-free samples.	Yeonsung Jung; Hajin Shim; June Yong Yang; Eunho Yang;
488	Image Generation with Shortest Path Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we hypothesize that the optimal procedure minimizes the length of the path taken when corrupting an image towards a given final state.	Ayan Das; Stathi Fotiadis; Anil Batra; Farhang Nabiei; FengTing Liao; Sattar Vakili; Da-shan Shiu; Alberto Bernacchia;
489	Deterministic Equivalent and Error Universality of Deep Random Features Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This manuscript considers the problem of learning a random Gaussian network function using a fully connected network with frozen intermediate layers and trainable readout layer.	Dominik Schröder; Hugo Cui; Daniil Dmitriev; Bruno Loureiro;
490	DP-Fast MH: Private, Fast, and Accurate Metropolis-Hastings for Large-Scale Bayesian Inference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study Metropolis-Hastings (MH), one of the most fundamental MCMC methods, for large-scale Bayesian inference under differential privacy.	Wanrong Zhang; Ruqi Zhang;
491	A Fast, Well-Founded Approximation to The Empirical Neural Tangent Kernel Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most existing applications have therefore used one of a handful of approximations yielding $N \times N$ kernel matrices, saving orders of magnitude of computation, but with limited to no justification. We prove that one such approximation, which we call sum of logits, converges to the true eNTK at initialization.	Mohamad Amin Mohamadi; Wonho Bae; Danica J. Sutherland;
492	DiscoBAX – Discovery of Optimal Intervention Sets in Genomic Experiment Design Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose DiscoBAX – a sample-efficient method for maximizing the rate of significant discoveries per experiment while simultaneously probing for a wide range of diverse mechanisms during a genomic experiment campaign.	Clare Lyle; Arash Mehrjou; Pascal Notin; Andrew Jesson; Stefan Bauer; Yarin Gal; Patrick Schwab;
493	On The Within-Group Fairness of Screening Classifiers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we argue that screening policies that use calibrated classifiers may suffer from an understudied type of within-group unfairness—they may unfairly treat qualified members within demographic groups of interest.	Nastaran Okati; Stratis Tsirtsis; Manuel Gomez Rodriguez;
494	UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes the Unified and Progressive Pruning (UPop) as a universal vision-language Transformer compression framework, which incorporates 1) unifiedly searching multimodal subnets in a continuous optimization space from the original model, which enables automatic assignment of pruning ratios among compressible modalities and structures; 2) progressively searching and retraining the subnet, which maintains convergence between the search and retrain to attain higher compression ratios.	Dachuan Shi; Chaofan Tao; Ying Jin; Zhendong Yang; Chun Yuan; Jiaqi Wang;
495	Learning Representations Without Compositional Assumptions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, this assumption is not always valid for real-world tabular datasets with complex dependencies between feature sets, resulting in localized information that is harder to learn. To overcome this limitation, we propose a data-driven approach that learns feature set dependencies by representing feature sets as graph nodes and their relationships as learnable edges.	Tennison Liu; Jeroen Berrevoets; Zhaozhi Qian; Mihaela van der Schaar;
496	Optimization for Amortized Inverse Problems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an efficient amortized optimization scheme for inverse problems with a deep generative prior.	Tianci Liu; Tong Yang; Quan Zhang; Qi Lei;
497	Topologically Faithful Image Segmentation Via Induced Matching of Persistence Barcodes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the concept of induced matchings from persistent homology to achieve a spatially correct matching between persistence barcodes in a segmentation setting.	Nico Daniel Stucki; Johannes C. Paetzold; Suprosanna Shit; Bjoern Menze; Ulrich Bauer;
498	Fast Algorithms for Distributed K-Clustering with Outliers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study $k$-clustering problems with outliers in the distributed setting.	Junyu Huang; Qilong Feng; Ziyun Huang; Jinhui Xu; Jianxin Wang;
499	GNOT: A General Neural Operator Transformer for Operator Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, there are several challenges for learning operators in practical applications like the irregular mesh, multiple input functions, and complexity of the PDEs’ solution. To address these challenges, we propose a general neural operator transformer (GNOT), a scalable and effective transformer-based framework for learning operators.	Zhongkai Hao; Zhengyi Wang; Hang Su; Chengyang Ying; Yinpeng Dong; Songming Liu; Ze Cheng; Jian Song; Jun Zhu;
500	A Kernelized Stein Discrepancy for Biological Sequences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we propose the “KSD-B”, a novel divergence measure for distributions over biological sequences that is based on the kernelized Stein discrepancy (KSD).	Alan Nawzad Amin; Eli N Weinstein; Debora Susan Marks;
501	Minimax Estimation of Discontinuous Optimal Transport Maps: The Semi-discrete Case Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the problem of estimating the optimal transport map between two probability distributions, $P$ and $Q$ in $\mathbb{R}^d$, on the basis of i.i.d. samples.	Aram-Alexandre Pooladian; Vincent Divol; Jonathan Niles-Weed;
502	Best Arm Identification in Multi-Agent Multi-Armed Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate the problem of best arm identification in Multi-Agent Multi-Armed Bandits (MAMABs) where the rewards are defined through a factor graph.	Filippo Vannella; Alexandre Proutiere; Jaeseong Jeong;
503	Streaming Submodular Maximization with Differential Privacy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the problem of privately maximizing a submodular function in the streaming setting.	Anamay Chaturvedi; Huy Nguyen; Thy Dinh Nguyen;
504	What Do CNNs Learn in The First Layer and Why? A Linear Systems Perspective Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: It has previously been reported that the representation that is learned in the first layer of deep Convolutional Neural Networks (CNNs) is highly consistent across initializations and architectures. In this work, we quantify this consistency by considering the first layer as a filter bank and measuring its energy distribution.	Rhea Chowers; Yair Weiss;
505	Retrosynthetic Planning with Dual Value Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose a novel online training algorithm, called Planning with Dual Value Networks (PDVN), which alternates between the planning phase and updating phase.	Guoqing Liu; Di Xue; Shufang Xie; Yingce Xia; Austin Tripp; Krzysztof Maziarz; Marwin Segler; Tao Qin; Zongzhang Zhang; Tie-Yan Liu;
506	Paging with Succinct Predictions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study learning-augmented paging from the new perspective of requiring the least possible amount of predicted information.	Antonios Antoniadis; Joan Boyar; Marek Elias; Lene M. Favrholdt; Ruben Hoeksma; Kim S. Larsen; Adam Polak; Bertrand Simon;
507	Superhuman Fairness Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We instead re-cast fair machine learning as an imitation learning task by introducing superhuman fairness, which seeks to simultaneously outperform human decisions on multiple predictive performance and fairness measures.	Omid Memarrast; Linh Vu; Brian D Ziebart;
508	Constrained Phi-Equilibria Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce and computationally characterize constrained Phi-equilibria—a more general notion than constrained CEs—in normal-form games.	Martino Bernasconi; Matteo Castiglioni; Alberto Marchesi; Francesco Trovò; Nicola Gatti;
509	Expectation-Complete Graph Representations with Homomorphisms Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We investigate novel random graph embeddings that can be computed in expected polynomial time and that are able to distinguish all non-isomorphic graphs in expectation.	Pascal Welke; Maximilian Thiessen; Fabian Jogl; Thomas Gärtner;
510	Masked Bayesian Neural Networks : Theoretical Guarantee and Its Posterior Inference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new node-sparse BNN model which has good theoretical properties and is computationally feasible.	Insung Kong; Dongyoon Yang; Jongjin Lee; Ilsang Ohn; Gyuseung Baek; Yongdai Kim;
511	Discover-Then-Rank Unlabeled Support Vectors in The Dual Space for Multi-Class Active Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to approach active learning (AL) from a novel perspective of discovering and then ranking potential support vectors by leveraging the key properties of the dual space of a sparse kernel max-margin predictor.	Dayou Yu; Weishi Shi; Qi Yu;
512	Communication-Constrained Bandits Under Additive Gaussian Noise Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study a distributed stochastic multi-armed bandit where a client supplies the learner with communication-constrained feedback based on the rewards for the corresponding arm pulls.	Prathamesh Mayekar; Jonathan Scarlett; Vincent Tan;
513	Collaborative Causal Inference with Fair Incentives Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a reward scheme designed using the unique statistical properties that are required by causal inference to guarantee certain desirable incentive criteria (e.g., fairness, benefit) for the parties based on their contributions. To achieve this, we propose a data valuation function to value parties’ data for CCI based on the distributional closeness of its resulting treatment effect estimate to that utilizing the aggregated data from all parties.	Rui Qiao; Xinyi Xu; Bryan Kian Hsiang Low;
514	Performative Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the framework of performative reinforcement learning where the policy chosen by the learner affects the underlying reward and transition dynamics of the environment.	Debmalya Mandal; Stelios Triantafyllou; Goran Radanovic;
515	Approximation and Estimation Ability of Transformers for Sequence-to-Sequence Functions with Infinite Dimensional Input Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the approximation and estimation ability of Transformers as sequence-to-sequence functions with infinite dimensional inputs.	Shokichi Takakura; Taiji Suzuki;
516	GuardHFL: Privacy Guardian for Heterogeneous Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We put forth GuardHFL, the first-of-its-kind efficient and privacy-preserving HFL framework.	Hanxiao Chen; Meng Hao; Hongwei Li; Kangjie Chen; Guowen Xu; Tianwei Zhang; Xilin Zhang;
517	Overcoming Simplicity Bias in Deep Networks Using A Feature Sieve Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a direct, interventional method for addressing simplicity bias in DNNs, which we call the feature sieve.	Rishabh Tiwari; Pradeep Shenoy;
518	Exploring The Limits of Model-Targeted Indiscriminate Data Poisoning Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce the notion of model poisoning reachability as a technical tool to explore the intrinsic limits of data poisoning attacks towards target parameters (i.e., model-targeted attacks).	Yiwei Lu; Gautam Kamath; Yaoliang Yu;
519	The Regret of Exploration and The Control of Bad Episodes in Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The first contribution of this paper is a new performance measure for RL algorithms, which we call the regret of exploration; it measures the asymptotic cost of exploration and is more discriminating than the regret. The second contribution is a new performance test (PT) to end episodes in RL optimistic algorithms.	Victor Boone; Bruno Gaujal;
520	The Wisdom of Hindsight Makes Language Models Better Instruction Followers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we consider an alternative approach: converting feedback to instruction by relabeling the original one and training the model for better alignment in a supervised manner.	Tianjun Zhang; Fangchen Liu; Justin Wong; Pieter Abbeel; Joseph E. Gonzalez;
521	STEP: Learning N:M Structured Sparsity Masks from Scratch with Precondition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We conjecture that learning N:M masks with Adam should take the critical regime of variance estimation into account. In light of this, we propose STEP, an Adam-aware recipe that learns N:M masks with two phases: first, STEP calculates a reliable variance estimate (precondition phase) and subsequently, the variance remains fixed and is used as a precondition to learn N:M masks (mask-learning phase).	Yucheng Lu; Shivani Agrawal; Suvinay Subramanian; Oleg Rybakov; Christopher De Sa; Amir Yazdanbakhsh;
522	Speeding Up Bellman Ford Via Minimum Violation Permutations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Its running time is governed by the order the algorithm examines vertices for iterative updates on the value of their shortest path. In this work we study this problem through the lens of ‘Algorithms with predictions,’ and show how to leverage auxiliary information from similar instances to improve the running time.	Silvio Lattanzi; Ola Svensson; Sergei Vassilvitskii;
523	Reflected Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To incorporate data constraints in a principled manner, we present Reflected Diffusion Models, which instead reverse a reflected stochastic differential equation evolving on the support of the data.	Aaron Lou; Stefano Ermon;
524	Compositional Exemplars for In-context Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we systematically formulate in-context example selection as a subset selection problem, and optimize it in an end-to-end fashion.	Jiacheng Ye; Zhiyong Wu; Jiangtao Feng; Tao Yu; Lingpeng Kong;
525	How Much Does Initialization Affect Generalization? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show on the contrary that, independently of architecture, SGD can itself be the cause of poor generalization if one does not ensure good initialization.	Sameera Ramasinghe; Lachlan Ewen MacDonald; Moshiur Farazi; Hemanth Saratchandran; Simon Lucey;
526	Context Consistency Regularization for Label Sparsity in Time Series Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, consistency regularization techniques have been used to generate artificial labels from unlabeled augmented instances. To fully exploit the sequential characteristic of time series in consistency regularization, we propose a novel method of data augmentation called context-attached augmentation, which adds preceding and succeeding instances to a target instance to form its augmented instance.	Yooju Shin; Susik Yoon; Hwanjun Song; Dongmin Park; Byunghyun Kim; Jae-Gil Lee; Byung Suk Lee;
527	Constrained Monotonic Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Unfortunately, this construction does not work with popular non-saturated activation functions as it can only approximate convex functions. We show this shortcoming can be fixed by constructing two additional activation functions from a typical unsaturated monotonic activation function and employing each of them on the part of neurons.	Davor Runje; Sharath M Shankaranarayana;
528	Attributing Image Generative Models Using Latent Fingerprints Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper investigates the use of latent semantic dimensions as fingerprints, from where we can analyze the effects of design variables, including the choice of fingerprinting dimensions, strength, and capacity, on the accuracy-quality tradeoff.	Guangyu Nie; Changhoon Kim; Yezhou Yang; Yi Ren;
529	Principled Offline RL in The Presence of Rich Exogenous Information Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For example, a robot navigating in busy streets needs to ignore irrelevant information, such as other people walking in the background, textures of objects, or birds in the sky. In this paper, we focus on the setting with visually detailed exogenous information and introduce new offline RL benchmarks that offer the ability to study this problem.	Riashat Islam; Manan Tomar; Alex Lamb; Yonathan Efroni; Hongyu Zang; Aniket Rajiv Didolkar; Dipendra Misra; Xin Li; Harm van Seijen; Remi Tachet des Combes; John Langford;
530	Robust Non-Linear Feedback Coding Via Power-Constrained Deep Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we develop a new family of non-linear feedback codes that greatly enhance robustness to channel noise.	Junghoon Kim; Taejoon Kim; David Love; Christopher Brinton;
531	Discrete Continuous Optimization Framework for Simultaneous Clustering and Training in Mixture Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study a new framework of learning mixture models via automatic clustering called PRESTO, wherein we optimize a joint objective function on the model parameters and the partitioning, with each model tailored to perform well on its specific cluster.	Parth Vipul Sangani; Arjun Shashank Kashettiwar; Pritish Chakraborty; Bhuvan Reddy Gangula; Durga S; Ganesh Ramakrishnan; Rishabh K Iyer; Abir De;
532	Generating Language Corrections for Teaching Physical Control Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We design and build CORGI, a model trained to generate language corrections for physical control tasks, such as learning to ride a bike.	Megha Srivastava; Noah Goodman; Dorsa Sadigh;
533	Revisiting The Linear-Programming Framework for Offline RL with General Function Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we revisit the LP framework for offline RL, and provide a new reformulation that advances the existing results in several aspects, relaxing certain assumptions and achieving optimal statistical rates in terms of sample size.	Asuman E. Ozdaglar; Sarath Pattathil; Jiawei Zhang; Kaiqing Zhang;
534	Sampling-Based Accuracy Testing of Posterior Estimators for General Inference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Tests of Accuracy with Random Points (TARP) coverage testing as a method to estimate coverage probabilities of generative posterior estimators.	Pablo Lemos; Adam Coogan; Yashar Hezaveh; Laurence Perreault-Levasseur;
535	Monotonicity and Double Descent in Uncertainty Estimation with Gaussian Processes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we prove that by tuning hyperparameters to maximize marginal likelihood (the empirical Bayes procedure), performance, as measured by the marginal likelihood, improves monotonically with the input dimension. On the other hand, cross-validation metrics exhibit qualitatively different behavior that is characteristic of double descent.	Liam Hodgkinson; Chris van der Heide; Fred Roosta; Michael W. Mahoney;
536	Statistical Foundations of Prior-Data Fitted Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This article establishes a theoretical foundation for PFNs and illuminates the statistical mechanisms governing their behavior.	Thomas Nagler;
537	QASA: Advanced Question Answering on Scientific Articles Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on our intensive think-aloud study that revealed the three types of questions: surface, testing, and deep questions, we first propose the QASA benchmark that consists of 1798 novel question answering pairs that require full-stack reasoning on scientific articles in AI and ML fields. Then we propose the QASA approach that tackles the full-stack reasoning with large language models via associative selection, evidential rationale-generation, and systematic composition.	Yoonjoo Lee; Kyungjae Lee; Sunghyun Park; Dasol Hwang; Jaehyeon Kim; Hong-in Lee; Moontae Lee;
538	Anti-Exploration By Random Network Distillation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite the success of Random Network Distillation (RND) in various domains, it was shown as not discriminative enough to be used as an uncertainty estimator for penalizing out-of-distribution actions in offline reinforcement learning. In this paper, we revisit these results and show that, with a naive choice of conditioning for the RND prior, it becomes infeasible for the actor to effectively minimize the anti-exploration bonus and discriminativity is not an issue.	Alexander Nikulin; Vladislav Kurenkov; Denis Tarasov; Sergey Kolesnikov;
539	Truncating Trajectories in Monte Carlo Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Indeed, the rewards taken in early simulation steps weigh exponentially more than future rewards. Taking a cue from this intuition, in this paper, we design an a-priori budget allocation strategy that leads to the collection of trajectories of different lengths, i.e., truncated.	Riccardo Poiani; Alberto Maria Metelli; Marcello Restelli;
540	Fast, Differentiable and Sparse Top-k: A Convex Analysis Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose new differentiable and sparse top-$k$ operators.	Michael Eli Sander; Joan Puigcerver; Josip Djolonga; Gabriel Peyré; Mathieu Blondel;
541	Certified Robust Neural Networks: Generalization and Corruption Resistance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Surprisingly, overfitting is a major concern in adversarial training despite being mostly absent in standard training. We provide here theoretical evidence for this peculiar “robust overfitting” phenomenon.	Mohammed Amine Bennouna; Ryan Lucas; Bart Van Parys;
542	NNSplitter: An Active Defense Solution for DNN Model Via Automated Weight Obfuscation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an active model IP protection scheme, namely NNSplitter, which actively protects the model by splitting it into two parts: the obfuscated model that performs poorly due to weight obfuscation, and the model secrets consisting of the indexes and original values of the obfuscated weights, which can only be accessed by authorized users with the support of the trusted execution environment.	Tong Zhou; Yukui Luo; Shaolei Ren; Xiaolin Xu;
543	FP-Diffusion: Improving Score-based Diffusion Models By Enforcing The Underlying Score Fokker-Planck Equation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we derive a corresponding equation called the score FPE that characterizes the noise-conditional scores of the perturbed data densities (i.e., their gradients).	Chieh-Hsin Lai; Yuhta Takida; Naoki Murata; Toshimitsu Uesaka; Yuki Mitsufuji; Stefano Ermon;
544	Data Poisoning Attacks Against Multimodal Encoders Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In contrast to previous work, which poisons only the visual modality, we take the first step toward studying poisoning attacks against multimodal models in both visual and linguistic modalities.	Ziqing Yang; Xinlei He; Zheng Li; Michael Backes; Mathias Humbert; Pascal Berrang; Yang Zhang;
545	Kernel Sufficient Dimension Reduction and Variable Selection for Compositional Data Via Amalgamation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work proposes a compositional SDR that can handle zeros naturally while incorporating the nonlinear nature and spurious negative correlations among components rigorously.	Junyoung Park; Jeongyoun Ahn; Cheolwoo Park;
546	Team Belief DAG: Generalizing The Sequence Form to Team Games for Fast Computation of Correlated Team Max-Min Equilibria Via Regret Minimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we provide new complexity results on the computation of optimal strategies for teams, and propose a new representation, coined team belief DAG (TB-DAG), that describes team strategies as a convex set.	Brian Hu Zhang; Gabriele Farina; Tuomas Sandholm;
547	A Theory of Representation Learning Gives A Deep Generalisation of Kernel Methods Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, standard theoretical approaches (formally NNGPs) involving infinite width limits eliminate representation learning. We therefore develop a new infinite width limit, the Bayesian representation learning limit, that exhibits representation learning mirroring that in finite-width models, yet at the same time, retains some of the simplicity of standard infinite-width limits.	Adam X. Yang; Maxime Robeyns; Edward Milsom; Ben Anson; Nandi Schoots; Laurence Aitchison;
548	MANSA: Learning Fast and Slow in Multi-Agent Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel plug & play IL framework named Multi-Agent Network Selection Algorithm (MANSA) which selectively employs CL only at states that require coordination.	David Henry Mguni; Haojun Chen; Taher Jafferjee; Jianhong Wang; Longfei Yue; Xidong Feng; Stephen Marcus McAleer; Feifei Tong; Jun Wang; Yaodong Yang;
549	Causal Discovery with Latent Confounders Based on Higher-Order Cumulants Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In light of the power of the closed-form solution to OICA corresponding to the One-Latent-Component structure, we formulate a way to estimate the mixing matrix using the higher-order cumulants, and further propose the testable One-Latent-Component condition to identify the latent variables and determine causal orders.	Ruichu Cai; Zhiyi Huang; Wei Chen; Zhifeng Hao; Kun Zhang;
550	Unsupervised Skill Discovery for Learning Shared Structures Across Changing Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a new unsupervised skill discovery algorithm that discovers a set of skills that can represent shared structures across changing environments.	Sang-Hyun Lee; Seung-Woo Seo;
551	Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Synthetic prompting, a method that leverages a few handcrafted examples to prompt the model to generate more examples by itself, and selects effective demonstrations to elicit better reasoning.	Zhihong Shao; Yeyun Gong; Yelong Shen; Minlie Huang; Nan Duan; Weizhu Chen;
552	Contextual Conservative Interleaving Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the contextual conservative interleaving bandit problem, which has a performance constraint that requires the chosen actions to be not much worse than given baseline actions in each round.	Kei Takemura;
553	Multi-Objective Population Based Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: PBT is a single-objective algorithm, but many real-world hyperparameter optimization problems involve two or more conflicting objectives. In this work, we therefore introduce a multi-objective version of PBT, MO-PBT.	Arkadiy Dushatskiy; Alexander Chebykin; Tanja Alderliesten; Peter Bosman;
554	Active Learning Based Structural Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework, Active Learning based Structural Inference (ALaSI), to infer the existence of directed connections from observed agents’ states over a time period in a dynamical system.	Aoran Wang; Jun Pang;
555	Multi-Fidelity Covariance Estimation in The Log-Euclidean Geometry Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a multi-fidelity estimator of covariance matrices that employs the log-Euclidean geometry of the symmetric positive-definite manifold.	Aimee Maurais; Terrence Alsup; Benjamin Peherstorfer; Youssef Marzouk;
556	MolDiff: Addressing The Atom-Bond Inconsistency Problem in 3D Molecule Diffusion Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We define this problem as the atom-bond inconsistency problem and claim it is the main reason for current approaches to generating unrealistic 3D molecules. To overcome this problem, we propose a new diffusion model called MolDiff which can generate atoms and bonds simultaneously while still maintaining their consistency by explicitly modeling the dependence between their relationships.	Xingang Peng; Jiaqi Guan; Qiang Liu; Jianzhu Ma;
557	Deep Temporal Sets with Evidential Reinforced Attentions for Unique Behavioral Pattern Discovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Real-life applications, such as digital behavioral biomarker identification, often require the discovery of complex spatiotemporal patterns in multimodal data, which is largely under-explored. To fill this gap, we propose a novel model that integrates uniquely designed Deep Temporal Sets (DTS) with Evidential Reinforced Attentions (ERA).	Dingrong Wang; Deep Shankar Pandey; Krishna Prasad Neupane; Zhiwei Yu; Ervine Zheng; Zhi Zheng; Qi Yu;
558	Vector Quantized Wasserstein Auto-Encoder Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study learning deep discrete representations from the generative viewpoint.	Long Tung Vuong; Trung Le; He Zhao; Chuanxia Zheng; Mehrtash Harandi; Jianfei Cai; Dinh Phung;
559	Simple Embodied Language Learning As A Byproduct of Meta-Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we design an office navigation environment, where the agent’s goal is to find a particular office, and office locations differ in different buildings (i.e., tasks).	Evan Zheran Liu; Sahaana Suri; Tong Mu; Allan Zhou; Chelsea Finn;
560	Spred: Solving L1 Penalty with SGD Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to minimize a generic differentiable objective with $L_1$ constraint using a simple reparametrization and straightforward stochastic gradient descent.	Liu Ziyin; Zihao Wang;
561	Text-To-4D Dynamic Scene Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present MAV3D (Make-A-Video3D), a method for generating three-dimensional dynamic scenes from text descriptions.	Uriel Singer; Shelly Sheynin; Adam Polyak; Oron Ashual; Iurii Makarov; Filippos Kokkinos; Naman Goyal; Andrea Vedaldi; Devi Parikh; Justin Johnson; Yaniv Taigman;
562	Effective and Efficient Structural Inference with Reservoir Computing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present an effective and efficient structural inference approach by integrating a Reservoir Computing (RC) network into a Variational Auto-encoder-based (VAE-based) structural inference framework.	Aoran Wang; Tsz Pan Tong; Jun Pang;
563	Fractional Denoising for 3D Molecular Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The underlying reason is that molecular distributions assumed by existing denoising methods fail to capture the anisotropic characteristic of molecules. To tackle these challenges, we propose a novel hybrid noise strategy, including noises on both dihedral angle and coordinate.	Shikun Feng; Yuyan Ni; Yanyan Lan; Zhi-Ming Ma; Weiying Ma;
564	Beyond Lipschitz Smoothness: A Tighter Analysis for Nonconvex Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose to use two separate assumptions for positive and negative curvatures, so that we can study the different implications of the two.	Zhengmian Hu; Xidong Wu; Heng Huang;
565	A Reinforcement Learning Framework for Dynamic Mediation Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Proposing a reinforcement learning (RL) framework, we are the first to evaluate dynamic mediation effects in settings with infinite horizons.	Lin Ge; Jitao Wang; Chengchun Shi; Zhenke Wu; Rui Song;
566	On The Functional Similarity of Robust and Non-Robust Neural Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we investigate the functional similarity of robust and non-robust representations for image classification with the help of model stitching.	András Balogh; Márk Jelasity;
567	Compressed Decentralized Proximal Stochastic Gradient Method for Nonconvex Composite Problems with Heterogeneous Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We first propose a decentralized proximal stochastic gradient tracking method (DProxSGT) for nonconvex stochastic composite problems, with data heterogeneously distributed on multiple workers in a decentralized connected network. To save communication cost, we then extend DProxSGT to a compressed method by compressing the communicated information.	Yonggui Yan; Jie Chen; Pin-Yu Chen; Xiaodong Cui; Songtao Lu; Yangyang Xu;
568	Multi-Agent Best Arm Identification with Private Communications Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For each privacy definition, we propose an algorithm based on a two-level successive elimination scheme.	Alexandre Rio; Merwan Barlier; Igor Colin; Marta Soare;
569	Evidential Interactive Learning for Medical Image Captioning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an evidential interactive learning framework that leverages evidence-based uncertainty estimation and interactive machine learning to improve image captioning with limited labeled data.	Ervine Zheng; Qi Yu;
570	Generalization Bounds Using Data-Dependent Fractal Dimensions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While these works have successfully introduced new mathematical tools to apprehend generalization, they heavily rely on a Lipschitz continuity assumption, which in general does not hold for neural networks and might make the bounds vacuous. In this work, we address this issue and prove fractal geometry-based generalization bounds without requiring any Lipschitz assumption.	Benjamin Dupuis; George Deligiannidis; Umut Simsekli;
571	Quantitative Universal Approximation Bounds for Deep Belief Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that deep belief networks with binary hidden units can approximate any multivariate probability density under very mild integrability requirements on the parental density of the visible nodes.	Julian Sieber; Johann Gehringer;
572	Learn to Accumulate Evidence from All Training Samples: Theory and Practice Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This constraint often leads to inferior predictive performance compared to standard softmax models, making it challenging to extend them to many large-scale datasets. To unveil the real cause of this undesired behavior, we theoretically investigate evidential models and identify a fundamental limitation that explains the inferior performance: existing evidential activation functions create zero evidence regions, which prevent the model from learning from training samples falling into such regions.	Deep Shankar Pandey; Qi Yu;
573	Multi-agent Online Scheduling: MMS Allocations for Indivisible Items Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the problem of fairly allocating a sequence of indivisible items that arrive online in an arbitrary order to a group of $n$ agents with additive normalized valuation functions. We consider the allocation of goods and chores separately and propose algorithms for approximating maximin share (MMS) allocations for both settings.	Shengwei Zhou; Rufan Bai; Xiaowei Wu;
574	High Fidelity Image Counterfactuals with Probabilistic Causal Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a general causal generative modelling framework for accurate estimation of high fidelity image counterfactuals with deep structural causal models.	Fabio De Sousa Ribeiro; Tian Xia; Miguel Monteiro; Nick Pawlowski; Ben Glocker;
575	Two Losses Are Better Than One: Faster Optimization Using A Cheaper Proxy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an algorithm for minimizing an objective with hard-to-compute gradients by using a related, easier-to-access function as a proxy.	Blake Woodworth; Konstantin Mishchenko; Francis Bach;
576	Prototype-Sample Relation Distillation: Towards Replay-Free Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a holistic approach to jointly learn the representation and class prototypes while maintaining the relevance of old class prototypes and their embedded similarities.	Nader Asadi; MohammadReza Davari; Sudhir Mudur; Rahaf Aljundi; Eugene Belilovsky;
577	Sequential Changepoint Detection Via Backward Confidence Sequences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a simple reduction from sequential estimation to sequential changepoint detection (SCD).	Shubhanshu Shekhar; Aaditya Ramdas;
578	On Over-Squashing in Message Passing Neural Networks: The Impact of Width, Depth, and Topology Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite recent methods introduced to mitigate this issue, an understanding of the causes for over-squashing and of possible solutions is lacking. In this theoretical work, we prove that: (i) Neural network width can mitigate over-squashing, but at the cost of making the whole network more sensitive; (ii) Conversely, depth cannot help mitigate over-squashing: increasing the number of layers leads to over-squashing being dominated by vanishing gradients; (iii) The graph topology plays the greatest role, since over-squashing occurs between nodes at high commute time.	Francesco Di Giovanni; Lorenzo Giusti; Federico Barbero; Giulia Luise; Pietro Lio; Michael M. Bronstein;
579	Causal Bounds in Quasi-Markovian Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the problem of computing bounds for causal queries on quasi-Markovian graphs with unobserved confounders and discrete valued observed variables, where identifiability does not hold.	Madhumitha Shridharan; Garud Iyengar;
580	Neural Collapse in Deep Linear Networks: From Balanced to Imbalanced Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Recent papers have theoretically shown that NC emerges in the global minimizers of training problems with the simplified “unconstrained feature model”. In this context, we take a step further and prove the NC occurrences in deep linear networks for the popular mean squared error (MSE) and cross entropy (CE) losses, showing that global solutions exhibit NC properties across the linear layers.	Hien Dang; Tho Tran Huu; Stanley Osher; Hung Tran-The; Nhat Ho; Tan Minh Nguyen;
581	Continual Task Allocation in Meta-Policy Network Via Sparse Prompting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: How to train a generalizable meta-policy by continually learning a sequence of tasks? It is a natural human skill yet challenging to achieve by current reinforcement learning: the agent is expected to quickly adapt to new tasks (plasticity) while retaining the common knowledge from previous tasks (stability). We address it by Continual Task Allocation via Sparse Prompting (CoTASP), which learns over-complete dictionaries to produce sparse masks as prompts extracting a sub-network for each task from a meta-policy network.	Yijun Yang; Tianyi Zhou; Jing Jiang; Guodong Long; Yuhui Shi;
582	Algorithmic Stability of Heavy-Tailed SGD with General Loss Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While these bounds do not require additional topological assumptions given that SGD can be modeled using a heavy-tailed stochastic differential equation (SDE), they can only apply to simple quadratic problems. In this paper, we build on this line of research and develop generalization bounds for a more general class of objective functions, which includes non-convex functions as well.	Anant Raj; Lingjiong Zhu; Mert Gurbuzbalaban; Umut Simsekli;
583	Understanding Gradient Regularization in Deep Learning: Efficient Finite-Difference Computation and Implicit Bias Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we first reveal that a specific finite-difference computation, composed of both gradient ascent and descent steps, reduces the computational cost of GR. Next, we show that the finite-difference computation also works better in the sense of generalization performance.	Ryo Karakida; Tomoumi Takase; Tomohiro Hayase; Kazuki Osawa;
584	TR0N: Translator Networks for 0-Shot Plug-and-Play Conditional Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose TR0N, a highly general framework to turn pre-trained unconditional generative models, such as GANs and VAEs, into conditional models.	Zhaoyan Liu; Noël Vouitsis; Satya Krishna Gorti; Jimmy Ba; Gabriel Loaiza-Ganem;
585	SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a CLIP-based model named SegCLIP for the topic of open-vocabulary segmentation in an annotation-free manner.	Huaishao Luo; Junwei Bao; Youzheng Wu; Xiaodong He; Tianrui Li;
586	Causal Modeling of Policy Interventions From Treatment–Outcome Sequences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Therefore, the current methods are not applicable if the treatment policy is unknown or a counterfactual analysis is needed. To handle these limitations, we model the treatments and outcomes jointly in continuous time, by combining Gaussian processes and point processes.	Çağlar Hızlı; S. T. John; Anne Tuulikki Juuti; Tuure Tapani Saarinen; Kirsi Hannele Pietiläinen; Pekka Marttinen;
587	Cross-Entropy Loss Functions: Theoretical Analysis and Applications Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a theoretical analysis of a broad family of loss functions, comp-sum losses, that includes cross-entropy (or logistic loss), generalized cross-entropy, the mean absolute error and other cross-entropy-like loss functions.	Anqi Mao; Mehryar Mohri; Yutao Zhong;
588	Learning Deductive Reasoning from Synthetic Corpus Based on Formal Logic Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study a synthetic corpus based approach for language models (LMs) to acquire logical deductive reasoning ability. We release the code, data, and models.	Terufumi Morishita; Gaku Morio; Atsuki Yamaguchi; Yasuhiro Sogawa;
589	Robust Consensus in Ranking Data Analysis: Definitions, Properties and Computational Issues Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce notions of robustness, together with dedicated statistical methods, for $\textit{Consensus Ranking}$, the flagship problem in ranking data analysis, aiming at summarizing a probability distribution on $\mathfrak{S}_n$ by a $\textit{median}$ ranking.	Morgane Goibert; Clément Calauzènes; Ekhine Irurozki; Stephan CLEMENCON;
590	Q-learning Decision Transformer: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the Q-learning Decision Transformer (QDT) to address the shortcomings of DT by leveraging the benefits of Dynamic Programming (Q-learning).	Taku Yamagata; Ahmed Khalil; Raul Santos-Rodriguez;
591	Extrapolated Random Tree for Regression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel tree-based algorithm named Extrapolated Random Tree for Regression (ERTR) that adapts to arbitrary smoothness of the regression function while maintaining the interpretability of the tree.	Yuchao Cai; Yuheng Ma; Yiwei Dong; Hanfang Yang;
592	Contrastive Energy Prediction for Exact Energy-Guided Diffusion Sampling in Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The main challenge for this setting is that the intermediate guidance during the diffusion sampling procedure, which is jointly defined by the sampling distribution and the energy function, is unknown and is hard to estimate. To address this challenge, we propose an exact formulation of the intermediate guidance as well as a novel training objective named contrastive energy prediction (CEP) to learn the exact guidance.	Cheng Lu; Huayu Chen; Jianfei Chen; Hang Su; Chongxuan Li; Jun Zhu;
593	ReLOAD: Reinforcement Learning with Optimistic Ascent-Descent for Last-Iterate Convergence in Constrained MDPs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In practice, it is often observed that the policy alternates between satisfying the constraints and maximizing the reward, rarely accomplishing both objectives simultaneously. Here, we address this problem by introducing Reinforcement Learning with Optimistic Ascent-Descent (ReLOAD), a principled CRL method with guaranteed last-iterate convergence.	Ted Moskovitz; Brendan O’Donoghue; Vivek Veeriah; Sebastian Flennerhag; Satinder Singh; Tom Zahavy;
594	Improving Adversarial Robustness By Putting More Regularizations on Less Robust Samples Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new adversarial training algorithm that is theoretically well motivated and empirically superior to other existing algorithms.	Dongyoon Yang; Insung Kong; Yongdai Kim;
595	Federated Linear Contextual Bandits with User-level Differential Privacy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper studies federated linear contextual bandits under the notion of user-level differential privacy (DP).	Ruiquan Huang; Huanyu Zhang; Luca Melis; Milan Shen; Meisam Hejazinia; Jing Yang;
596	Simple Diffusion: End-to-end Diffusion for High Resolution Images Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper aims to improve denoising diffusion for high resolution images while keeping the model as simple as possible.	Emiel Hoogeboom; Jonathan Heek; Tim Salimans;
597	Geometric Clifford Algebra Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Geometric Clifford Algebra Networks (GCANs) for modeling dynamical systems.	David Ruhe; Jayesh K Gupta; Steven De Keninck; Max Welling; Johannes Brandstetter;
598	Nonparametric Generative Modeling with Conditional Sliced-Wasserstein Flows Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Sliced-Wasserstein Flow (SWF) is a promising approach to nonparametric generative modeling but has not been widely adopted due to its suboptimal generative quality and lack of conditional modeling capabilities. In this work, we make two major contributions to bridging this gap.	Chao Du; Tianbo Li; Tianyu Pang; Shuicheng YAN; Min Lin;
599	Accelerated Stochastic Optimization Methods Under Quasar-convexity Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We demonstrate that our algorithms have fast convergence and outperform existing algorithms on several examples, including the classical problem of learning linear dynamical systems.	Qiang Fu; Dongchu Xu; Ashia Camage Wilson;
600	Conformal Prediction for Federated Uncertainty Quantification Under Label Shift Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop a new federated conformal prediction method based on quantile regression and take into account privacy constraints.	Vincent Plassier; Mehdi Makni; Aleksandr Rubashevskii; Eric Moulines; Maxim Panov;
601	The Edge of Orthogonality: A Simple View of What Makes BYOL Tick Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim at exploring the simplest possible mathematical arguments towards explaining the underlying mechanisms behind self-predictive unsupervised learning.	Pierre Harvey Richemond; Allison Tam; Yunhao Tang; Florian Strub; Bilal Piot; Felix Hill;
602	Nested Elimination: A Simple Algorithm for Best-Item Identification From Choice-Based Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an elimination-based algorithm, namely Nested Elimination (NE), which is inspired by the nested structure implied by the information-theoretic lower bound.	Junwen Yang; Yifan Feng;
603	Harmonic Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we demonstrate effective means of representing harmonic functions in neural networks and extend such results also to quantum neural networks to demonstrate the generality of our approach. We benchmark our approaches against (quantum) physics-informed neural networks, where we show favourable performance.	Atiyo Ghosh; Antonio Andrea Gentile; Mario Dagrada; Chul Lee; Seong-Hyok Sean Kim; Hyukgeun Cha; Yunjun Choi; Dongho Kim; JEONG-IL KYE; Vincent Emanuel Elfving;
604	N$\text{A}^\text{2}$Q: Neural Attention Additive Model for Interpretable Multi-Agent Q-Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study an interpretable value decomposition framework via the family of generalized additive models.	Zichuan Liu; Yuanyang Zhu; Chunlin Chen;
605	An Investigation Into Pre-Training Object-Centric Representations for Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the effectiveness of OCR pre-training for image-based reinforcement learning via empirical experiments. For systematic evaluation, we introduce a simple object-centric visual RL benchmark and conduct experiments to answer questions such as: Does OCR pre-training improve performance on object-centric tasks?	Jaesik Yoon; Yi-Fu Wu; Heechul Bae; Sungjin Ahn;
606	GNN&GBDT-Guided Fast Optimizing Framework for Large-scale Integer Programming Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the framework cannot effectively use the embedding spatial information in GNN and still highly relies on large-scale solvers in LNS, resulting in the scale of IP being limited by the ability of the current solver and performance bottlenecks. To handle these issues, this paper presents a GNN&GBDT-guided fast optimizing framework for large-scale IPs that only uses a small-scale optimizer to solve large-scale IPs efficiently.	Huigen Ye; Hua Xu; Hongyan Wang; Chengming Wang; Yu Jiang;
607	State and Parameter Learning with PARIS Particle Gibbs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our first contribution is to design a novel additive smoothing algorithm, the Parisian particle Gibbs (PPG) sampler, which can be viewed as a PaRIS (Olsson & Westerborn, 2017) algorithm driven by conditional SMC moves, resulting in bias-reduced estimates of the targeted quantities.	Gabriel Cardoso; Yazid Janati El Idrissi; Sylvain Le Corff; Eric Moulines; Jimmy Olsson;
608	Tuning Computer Vision Models With Task Rewards Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In natural language processing, this is often addressed using reinforcement learning techniques that align models with a task reward. We adopt this approach and show its surprising effectiveness to improve generic models pretrained to imitate example outputs across multiple computer vision tasks, such as object detection, panoptic segmentation, colorization and image captioning.	André Susano Pinto; Alexander Kolesnikov; Yuge Shi; Lucas Beyer; Xiaohua Zhai;
609	Leveraging Label Non-Uniformity for Node Classification in Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the key notion of label non-uniformity, which is derived from the Wasserstein distance between the softmax distribution of the logits and the uniform distribution.	Feng Ji; See Hian Lee; Hanyang Meng; Kai Zhao; Jielong Yang; Wee Peng Tay;
610	Meta-learning Parameterized Skills Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a novel parameterized skill-learning algorithm that aims to learn transferable parameterized skills and synthesize them into a new action space that supports efficient learning in long-horizon tasks.	Haotian Fu; Shangqun Yu; Saket Tiwari; Michael Littman; George Konidaris;
611	A Scalable Frank-Wolfe-Based Algorithm for The Max-Cut SDP Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the problem of solving large-scale instances of the Max-Cut semidefinite program (SDP), i.e., optimizing a linear function over $n\times n$ positive semidefinite (PSD) matrices with unit diagonal.	Chi Bach Pham; Wynita Griggs; James Saunderson;
612	Last Switch Dependent Bandits with Monotone Payoff Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we take a step towards understanding the approximability of planning LSD bandits, namely, the (NP-hard) problem of computing an optimal arm-pulling strategy under complete knowledge of the model.	Ayoub Foussoul; Vineet Goyal; Orestis Papadigenopoulos; Assaf Zeevi;
613	Multi-class Graph Clustering Via Approximated Effective $p$-Resistance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper develops an approximation to the (effective) $p$-resistance and applies it to multi-class clustering.	Shota Saito; Mark Herbster;
614	Under-Counted Tensor Completion with Neural Incorporation of Attributes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, a low-rank Poisson tensor model with an expressive unknown nonlinear side information extractor is proposed for under-counted multi-aspect data.	Shahana Ibrahim; Xiao Fu; Rebecca Hutchinson; Eugene Seo;
615	Weighted Sampling Without Replacement for Deep Top-$k$ Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose using the Weighted Sampling Without Replacement (WSWR) method as a learning objective for top-$k$ loss.	Dieqiao Feng; Yuanqi Du; Carla P Gomes; Bart Selman;
616	Nearly-Linear Time and Streaming Algorithms for Outlier-Robust PCA Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our main contribution is to develop a nearly linear time algorithm for robust PCA with near-optimal error guarantees.	Ilias Diakonikolas; Daniel Kane; Ankit Pensia; Thanasis Pittas;
617	Online Mechanism Design for Information Acquisition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For the full feedback problem, we propose an algorithm that guarantees $\tilde{O}(\sqrt{T})$ regret and violation, while for the bandit feedback setting we present an algorithm that attains $\tilde{O}(T^{\alpha})$ regret and $\tilde{O}(T^{1-\alpha/2})$ violation for any $\alpha \in [1/2, 1]$.	Federico Cacciamani; Matteo Castiglioni; Nicola Gatti;
618	Directed Chain Generative Adversarial Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel time series generator, named directed chain GANs (DC-GANs), which inserts a time series dataset (called a neighborhood process of the directed chain or input) into the drift and diffusion coefficients of the directed chain SDEs with distributional constraints.	Ming Min; Ruimeng Hu; Tomoyuki Ichiba;
619	Propensity Matters: Measuring and Enhancing Balancing for Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on BMSE, we propose IPS-V2 and DR-V2 as the estimators of unbiased loss, and theoretically show that IPS-V2 and DR-V2 have greater propensity balancing and smaller variance without sacrificing additional bias.	Haoxuan Li; Yanghao Xiao; Chunyuan Zheng; Peng Wu; Peng Cui;
620	Layered State Discovery for Incremental Autonomous Exploration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a novel layered decomposition of the set of incrementally $L$-controllable states that is based on the iterative application of a state-expansion operator.	Liyu Chen; Andrea Tirinzoni; Alessandro Lazaric; Matteo Pirotta;
621	MonoFlow: Rethinking Divergence GANs Via The Perspective of Wasserstein Gradient Flows Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we leverage Wasserstein gradient flows, which characterize the evolution of particles in the sample space, to gain theoretical insights and algorithmic inspiration for GANs.	Mingxuan Yi; Zhanxing Zhu; Song Liu;
622	CataBEEM: Integrating Latent Interaction Categories in Node-wise Community Detection Models for Network Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a category-and-block edge exchangeable model (CataBEEM) to study interaction networks with joint latent interaction-level category and node-level community structures.	Yuhua Zhang; Walter H. Dempsey;
623	Chemically Transferable Generative Backmapping of Coarse-Grained Proteins Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work addresses both issues to build a fast, transferable, and reliable generative backmapping tool for CG protein representations.	Soojung Yang; Rafael Gomez-Bombarelli;
624	Towards Coherent Image Inpainting Using Denoising Diffusion Implicit Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose CoPaint, which can coherently inpaint the whole image without introducing mismatches.	Guanhua Zhang; Jiabao Ji; Yang Zhang; Mo Yu; Tommi S. Jaakkola; Shiyu Chang;
625	MAGANet: Achieving Combinatorial Generalization By Modeling A Group Action Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, recent studies discovered that the disentangled representation is insufficient for combinatorial generalization and is not even correlated with it. In this regard, we propose a novel framework for data generation that can robustly generalize under these distribution shift situations.	Geonho Hwang; Jaewoong Choi; Hyunsoo Cho; Myungjoo Kang;
626	Go Beyond Imagination: Maximizing Episodic Reachability with World Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a new intrinsic reward design called GoBI – Go Beyond Imagination, which combines the traditional lifelong novelty motivation with an episodic intrinsic reward that is designed to maximize the stepwise reachability expansion.	Yao Fu; Run Peng; Honglak Lee;
627	QuantumDARTS: Differentiable Quantum Architecture Search for Variational Quantum Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of quantum architecture search (QAS) for VQA to automatically design parameterized quantum circuits (PQC).	Wenjie Wu; Ge Yan; Xudong Lu; Kaisen Pan; Junchi Yan;
628	Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: But using predictive error as intrinsic motivation is fragile in stochastic environments, as the agent may become trapped by high-entropy areas of the state-action space, such as a noisy TV. In this work, we study a natural solution derived from structural causal models of the world: Our key idea is to learn representations of the future that capture precisely the unpredictable aspects of each outcome—which we use as additional input for predictions, such that intrinsic rewards only reflect the predictable aspects of world dynamics.	Daniel Jarrett; Corentin Tallec; Florent Altché; Thomas Mesnard; Remi Munos; Michal Valko;
629	Sequential Multi-Dimensional Self-Supervised Learning for Clinical Time Series Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These existing methods cannot be readily extended to model time series that exhibit multimodality, with structured features and high-dimensional data being recorded at each timestep in the sequence. In this work, we address this gap and propose a new SSL method — Sequential Multi-Dimensional SSL — where an SSL loss is applied both at the level of the entire sequence and at the level of the individual high-dimensional data points in the sequence in order to better capture information at both scales.	Aniruddh Raghu; Payal Chandak; Ridwan Alam; John Guttag; Collin Stultz;
630	Diverse and Faithful Knowledge-Grounded Dialogue Generation Via Sequential Posterior Inference Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present an end-to-end learning framework, termed Sequential Posterior Inference (SPI), capable of selecting knowledge and generating dialogues by approximately sampling from the posterior distribution.	Yan Xu; Deqian Kong; Dehong Xu; Ziwei Ji; Bo Pang; Pascale Fung; Ying Nian Wu;
631	X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation Using CLIP and StableDiffusion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we revisit Copy-Paste at scale with the power of newly emerged zero-shot recognition models (e.g., CLIP) and text2image models (e.g., StableDiffusion).	Hanqing Zhao; Dianmo Sheng; Jianmin Bao; Dongdong Chen; Dong Chen; Fang Wen; Lu Yuan; Ce Liu; Wenbo Zhou; Qi Chu; Weiming Zhang; Nenghai Yu;
632	One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a unified diffusion framework (dubbed UniDiffuser) to fit all distributions relevant to a set of multi-modal data in one model.	Fan Bao; Shen Nie; Kaiwen Xue; Chongxuan Li; Shi Pu; Yaole Wang; Gang Yue; Yue Cao; Hang Su; Jun Zhu;
633	ClimaX: A Foundation Model for Weather and Climate Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We develop and demonstrate ClimaX, a flexible and generalizable deep learning model for weather and climate science that can be trained using heterogeneous datasets spanning different variables, spatio-temporal coverage, and physical groundings.	Tung Nguyen; Johannes Brandstetter; Ashish Kapoor; Jayesh K Gupta; Aditya Grover;
634	Efficient Algorithms for Exact Graph Matching on Correlated Stochastic Block Models with Constant Correlation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose an efficient algorithm for matching graphs with community structure, based on the comparison between partition trees rooted from each vertex, by extending the idea of Mao et al. (2021) to graphs with communities.	Joonhyuk Yang; Dongpil Shin; Hye Won Chung;
635	Prototype-oriented Unsupervised Anomaly Detection for Multivariate Time Series Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing UAD methods try to learn a fixed set of mappings for each MTS, entailing expensive computation and limited model adaptation. To address this pivotal issue, we propose a prototype-oriented UAD (PUAD) method under a probabilistic framework.	Yuxin Li; Wenchao Chen; Bo Chen; Dongsheng Wang; Long Tian; Mingyuan Zhou;
636	Delay-agnostic Asynchronous Coordinate Update Algorithm Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a delay-agnostic asynchronous coordinate update algorithm (DEGAS) for computing operator fixed points, with applications to asynchronous optimization.	Xuyang Wu; Changxin Liu; Sindri Magnússon; Mikael Johansson;
637	FedCR: Personalized Federated Learning Based on Across-Client Common Representation with Conditional Mutual Information Regularization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In personalized federated learning (PFL), multiple clients train customized models to fulfill their personal objectives, which, however, are prone to overfitting to local data due to the heterogeneity and scarcity of local data. To address this, we propose from the information-theoretic perspective a personalized federated learning framework based on the common representation learned across clients, named FedCR.	Hao Zhang; Chenglin Li; Wenrui Dai; Junni Zou; Hongkai Xiong;
638	Lifelong Language Pretraining with Distribution-Specialized Experts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, we propose Lifelong-MoE, an extensible MoE (Mixture-of-Experts) architecture that dynamically adds model capacity via adding experts with regularized pretraining.	Wuyang Chen; Yanqi Zhou; Nan Du; Yanping Huang; James Laudon; Zhifeng Chen; Claire Cui;
639	Revisiting Over-smoothing and Over-squashing Using Ollivier-Ricci Curvature Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Based on our theory, we propose the Batch Ollivier-Ricci Flow, a novel rewiring algorithm capable of simultaneously addressing both over-smoothing and over-squashing.	Khang Nguyen; Nong Minh Hieu; Vinh Duc NGUYEN; Nhat Ho; Stanley Osher; Tan Minh Nguyen;
640	AudioLDM: Text-to-Audio Generation with Latent Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we propose AudioLDM, a TTA system that is built on a latent space to learn continuous audio representations from contrastive language-audio pretraining (CLAP) embeddings.	Haohe Liu; Zehua Chen; Yi Yuan; Xinhao Mei; Xubo Liu; Danilo Mandic; Wenwu Wang; Mark D Plumbley;
641	Accounting For Informative Sampling When Learning to Forecast Treatment Outcomes Over Time Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we formalize informative sampling as a covariate shift problem and show that it can prohibit accurate estimation of treatment outcomes if not properly accounted for. To overcome this challenge, we present a general framework for learning treatment outcomes in the presence of informative sampling using inverse intensity-weighting, and propose a novel method, TESAR-CDE, that instantiates this framework using Neural CDEs.	Toon Vanderschueren; Alicia Curth; Wouter Verbeke; Mihaela van der Schaar;
642	A Picture of The Space of Typical Learnable Tasks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop information geometric techniques to understand the representations learned by deep networks when they are trained on different tasks using supervised, meta-, semi-supervised and contrastive learning.	Rahul Ramesh; Jialin Mao; Itay Griniasty; Rubing Yang; Han Kheng Teoh; Mark Transtrum; James Sethna; Pratik Chaudhari;
643	Towards Quantum Machine Learning for Constrained Combinatorial Optimization: A Quantum QAP Solver Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel quantum neural network (QNN) for learning CO problems in a supervised manner to achieve better and faster results.	Xinyu Ye; Ge Yan; Junchi Yan;
644	Transformers As Algorithms: Generalization and Stability in In-context Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we formalize in-context learning as an algorithm learning problem where a transformer model implicitly constructs a hypothesis function at inference-time.	Yingcong Li; Muhammed Emrullah Ildiz; Dimitris Papailiopoulos; Samet Oymak;
645	Estimation Beyond Data Reweighting: Kernel Method of Moments Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the use of $\varphi$-divergences effectively limits the candidate distributions to reweightings of the data samples. We lift this long-standing limitation and provide a method of moments that goes beyond data reweighting.	Heiner Kremer; Yassine Nemmour; Bernhard Schölkopf; Jia-Jie Zhu;
646	Importance Weighted Expectation-Maximization for Protein Sequence Design Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose IsEM-Pro, an approach to generate protein sequences towards a given fitness criterion.	Zhenqiao Song; Lei Li;
647	Dissecting The Effects of SGD Noise in Distinct Regimes of Deep Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Understanding when the noise in stochastic gradient descent (SGD) affects generalization of deep neural networks remains a challenge, complicated by the fact that networks can operate in distinct training regimes. Here we study how the magnitude of this noise $T$ affects performance as the size of the training set $P$ and the scale of initialization $\alpha$ are varied.	Antonio Sclocchi; Mario Geiger; Matthieu Wyart;
648	Neural Status Registers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Neither architecture can extrapolate to much larger numbers than those seen in the training set. We propose a novel differentiable architecture, the Neural Status Register (NSR) to solve this problem.	Lukas Faber; Roger Wattenhofer;
649	Convex Geometry of ReLU-layers, Injectivity on The Ball and Local Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The paper uses a frame-theoretic setting to study the injectivity of a ReLU-layer on the closed ball of $\mathbb{R}^n$ and its non-negative part.	Daniel Haider; Martin Ehler; Peter Balazs;
650	Improving Adversarial Robustness Through The Contrastive-Guided Diffusion Process Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Thus, we propose the Contrastive-Guided Diffusion Process (Contrastive-DP), which incorporates the contrastive loss to guide the diffusion model in data generation.	Yidong Ouyang; Liyan Xie; Guang Cheng;
651	High-Probability Bounds for Stochastic Optimization and Variational Inequalities: The Case of Unbounded Variance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose several algorithms with high-probability convergence results under less restrictive assumptions.	Abdurakhmon Sadiev; Marina Danilova; Eduard Gorbunov; Samuel Horváth; Gauthier Gidel; Pavel Dvurechensky; Alexander Gasnikov; Peter Richtárik;
652	Linkless Link Prediction Via Relational Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, to combine the advantages of GNNs and MLPs, we start with exploring direct knowledge distillation (KD) methods for link prediction, i.e., predicted logit-based matching and node representation-based matching.	Zhichun Guo; William Shiao; Shichang Zhang; Yozen Liu; Nitesh Chawla; Neil Shah; Tong Zhao;
653	FedAvg Converges to Zero Training Loss Linearly for Overparameterized Multi-Layer Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we show that data heterogeneity can be dealt with from a different perspective.	Bingqing Song; Prashant Khanduri; Xinwei Zhang; Jinfeng Yi; Mingyi Hong;
654	Subset Selection Based On Multiple Rankings in The Presence of Bias: Effectiveness of Fairness Constraints for Multiwinner Voting Score Functions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We consider the problem of subset selection where one is given multiple rankings of items and the goal is to select the highest quality subset.	Niclas Boehmer; L. Elisa Celis; Lingxiao Huang; Anay Mehrotra; Nisheeth K Vishnoi;
655	LeadFL: Client Self-Defense Against Model Poisoning in Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a client self-defense, LeadFL, that is combined with existing server-side defenses to thwart backdoor and targeted attacks.	Chaoyi Zhu; Stefanie Roos; Lydia Y. Chen;
656	Phase-aware Adversarial Defense for Improving Adversarial Robustness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, inspired by cognitive science, we investigate the interference of adversarial noise from the perspective of image phase, and find that ordinarily-trained models lack enough robustness against phase-level perturbations.	Dawei Zhou; Nannan Wang; Heng Yang; Xinbo Gao; Tongliang Liu;
657	Expertise Trees Resolve Knowledge Limitations in Collective Decision-Making Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In practice, this may lead to sub-optimal or discriminatory decisions against minority cases. In this work, we model such changes in depth and breadth of knowledge as a partitioning of the problem space into regions of differing expertise.	Axel Abels; Tom Lenaerts; Vito Trianni; Ann Nowe;
658	Efficient Learning of Mesh-Based Physical Simulation with Bi-Stride Multi-Scale Graph Neural Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by the bipartite graph determination, we propose a novel pooling strategy, bi-stride, to tackle the aforementioned limitations.	Yadi Cao; Menglei Chai; Minchen Li; Chenfanfu Jiang;
659	Quantum 3D Graph Learning with Applications to Molecule Embedding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, for the first time to the best of our knowledge, we propose a quantum 3D embedding ansatz that learns the latent representation of 3D structures from the Hilbert space composed of the Bloch sphere of each qubit.	Ge Yan; Huaijin Wu; Junchi Yan;
660	Bag of Tricks for Training Data Extraction from Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate and benchmark tricks for improving training data extraction using a publicly available dataset.	Weichen Yu; Tianyu Pang; Qian Liu; Chao Du; Bingyi Kang; Yan Huang; Min Lin; Shuicheng YAN;
661	Comparison of Meta-learners for Estimating Multi-valued Treatment Heterogeneous Effects Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce and discuss meta-learners that perform well as the number of treatments increases.	Naoufal Acharki; Ramiro Lugo; Antoine Bertoncello; Josselin Garnier;
662	Revisiting Gradient Clipping: Stochastic Bias and Tight Convergence Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we give convergence guarantees that show precise dependence on arbitrary clipping thresholds $c$ and show that our guarantees are tight with both deterministic and stochastic gradients.	Anastasia Koloskova; Hadrien Hendrikx; Sebastian U Stich;
663	Convergence of Proximal Point and Extragradient-Based Methods Beyond Monotonicity: The Case of Negative Comonotonicity Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by non-monotone machine learning applications, we follow the line of works (Diakonikolas et al., 2021; Lee & Kim, 2021; Pethick et al., 2022; Bohm, 2022) aiming at going beyond monotonicity by considering the weaker negative comonotonicity assumption. In this work, we provide tight complexity analyses for the Proximal Point (PP), Extragradient (EG), and Optimistic Gradient (OG) methods in this setup, closing several questions on their working guarantees beyond monotonicity.	Eduard Gorbunov; Adrien Taylor; Samuel Horváth; Gauthier Gidel;
664	One-Step Estimator for Permuted Sparse Recovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper considers the unlabeled sparse recovery under multiple measurements, i.e., ${\mathbf{Y}} = {\mathbf{\Pi}}^{\natural} {\mathbf{X}} {\mathbf{B}}^{\natural} + {\mathbf{W}}$, where ${\mathbf{Y}} \in \mathbb{R}^{n\times m}, {\mathbf{\Pi}}^{\natural}\in \mathbb{R}^{n\times n}, {\mathbf{X}} \in \mathbb{R}^{n\times p}, {\mathbf{B}} ^{\natural}\in \mathbb{R}^{p\times m}, {\mathbf{W}}\in \mathbb{R}^{n\times m}$ represents the observations, missing (or incomplete) correspondence information, sensing matrix, sparse signals, and additive sensing noise, respectively.	Hang Zhang; Ping Li;
665	Fast Online Node Labeling for Very Large Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose an improvement based on the online relaxation technique introduced by a series of works (Rakhlin et al., 2012; Rakhlin & Sridharan, 2015; 2017).	Baojian Zhou; Yifan Sun; Reza Babanezhad Harikandeh;
666	What Can Be Learnt With Wide Convolutional Neural Networks? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study infinitely-wide deep CNNs in the kernel regime.	Francesco Cagnetta; Alessandro Favero; Matthieu Wyart;
667	Language Instructed Reinforcement Learning for Human-AI Coordination Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This problem is challenging, especially in domains that lack high quality human behavioral data, because multi-agent reinforcement learning (RL) often converges to different equilibria from the ones that humans prefer. We propose a novel framework, instructRL, that enables humans to specify what kind of strategies they expect from their AI partners through natural language instructions.	Hengyuan Hu; Dorsa Sadigh;
668	Structured Cooperative Learning with Graphical Model Priors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Structured Cooperative Learning (SCooL), in which a cooperation graph across devices is generated by a graphical model prior to automatically coordinate mutual learning between devices.	Shuangtong Li; Tianyi Zhou; Xinmei Tian; Dacheng Tao;
669	A Mathematical Model for Curriculum Learning for Parities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a CL model for learning the class of k-parities on d bits of a binary string with a neural network trained by stochastic gradient descent (SGD).	Elisabetta Cornacchia; Elchanan Mossel;
670	A Model-free Closeness-of-influence Test for Features in Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Ideally, it is desired to understand how a set of collected features combine together and influence the response value, but this problem is notoriously difficult, due to the high-dimensionality of data and limited number of labeled data points, among many others. In this work, we take a new perspective on this problem, and we study the question of assessing the difference of influence that the two given features have on the response value.	Mohammad Mehrabi; Ryan A. Rossi;
671	Conditional Graph Information Bottleneck for Molecular Relational Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel relational learning framework, called CGIB, that predicts the interaction behavior between a pair of graphs by detecting core subgraphs therein.	Namkyeong Lee; Dongmin Hyun; Gyoung S. Na; Sungwon Kim; Junseok Lee; Chanyoung Park;
672	Guiding Pretraining in Reinforcement Learning with Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We describe a method that uses background knowledge from text corpora to shape exploration.	Yuqing Du; Olivia Watkins; Zihan Wang; Cédric Colas; Trevor Darrell; Pieter Abbeel; Abhishek Gupta; Jacob Andreas;
673	A Neural PDE Solver with Temporal Stencil Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, this study shows that significant information is often lost in the low-resolution down-sampled features. To address such issues, we propose a new approach, namely Temporal Stencil Modeling (TSM), which combines the strengths of advanced time-series sequence modeling (with the HiPPO features) and state-of-the-art neural PDE solvers (with learnable stencil modeling).	Zhiqing Sun; Yiming Yang; Shinjae Yoo;
674	Learning Perturbations to Explain Time Series Predictions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we aim to explain predictions by learning not only masks, but also associated perturbations.	Joseph Enguehard;
675	Learning to Suggest Breaks: Sustainable Optimization of Long-Term User Engagement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study the role of breaks in recommendation, and propose a framework for learning optimal breaking policies that promote and sustain long-term engagement.	Eden Saig; Nir Rosenfeld;
676	Training Normalizing Flows from Dependent Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a likelihood objective of normalizing flows incorporating dependencies between the data points, for which we derive a flexible and efficient learning algorithm suitable for different dependency structures.	Matthias Kirchler; Christoph Lippert; Marius Kloft;
677	The Power of Uniform Sampling for K-Median Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the power of uniform sampling for $k$-Median in various metric spaces.	Lingxiao Huang; Shaofeng H.-C. Jiang; Jianing Lou;
678	Towards Explaining Distribution Shifts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We derive our interpretable mappings from a relaxation of the optimal transport problem, where the candidate mappings are restricted to a set of interpretable mappings.	Sean Kulinski; David I. Inouye;
679	On Second-Order Scoring Rules for Epistemic Uncertainty Quantification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, recent work has revealed serious theoretical shortcomings for second-order predictors based on loss minimisation. In this paper, we generalise these findings and prove a more fundamental result: There seems to be no loss function that provides an incentive for a second-order learner to faithfully represent its epistemic uncertainty in the same manner as proper scoring rules do for standard (first-order) learners.	Viktor Bengs; Eyke Hüllermeier; Willem Waegeman;
680	Generated Graph Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose the first framework to systematically investigate a set of sophisticated models and their performance in four classification scenarios.	Yihan Ma; Zhikun Zhang; Ning Yu; Xinlei He; Michael Backes; Yun Shen; Yang Zhang;
681	The Catalog Problem: Clustering and Ordering Variable-Sized Sets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite progress in both neural clustering and set-to-sequence methods, no joint, fully differentiable model exists to date. We develop such a modular architecture, referred to further as Neural Ordered Clusters (NOC), enhance it with a specific mechanism for learning cluster-level cardinality constraints, and provide a robust comparison of its performance in relation to alternative models.	Mateusz Maria Jurewicz; Graham W. Taylor; Leon Derczynski;
682	Efficient Transformed Gaussian Processes for Non-Stationary Dependent Multi-class Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work introduces the Efficient Transformed Gaussian Process (ETGP), a new way of creating $C$ stochastic processes characterized by: 1) the $C$ processes are non-stationary, 2) the $C$ processes are dependent by construction without needing a mixing matrix, 3) training and making predictions is very efficient since the number of Gaussian Process (GP) operations (e.g. inverting the inducing point’s covariance matrix) does not depend on the number of processes. This makes the ETGP particularly suited for multi-class problems with a very large number of classes, which are the problems studied in this work.	Juan Maroñas; Daniel Hernández-Lobato;
683	MetaModulation: Learning Variational Feature Hierarchies for Few-Shot Learning with Fewer Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Meta-learning algorithms are able to learn a new task using previously learned knowledge, but they often require a large number of meta-training tasks which may not be readily available. To address this issue, we propose a method for few-shot learning with fewer tasks, which we call MetaModulation.	Wenfang Sun; Yingjun Du; Xiantong Zhen; Fan Wang; Ling Wang; Cees G. M. Snoek;
684	Adaptive Smoothing Gradient Learning for Spiking Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we propose a methodology such that training a prototype neural network will evolve into training an SNN gradually by fusing the learnable relaxation degree into the network with random spike noise.	Ziming Wang; Runhao Jiang; Shuang Lian; Rui Yan; Huajin Tang;
685	An Instrumental Variable Approach to Confounded Off-Policy Evaluation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper develops an instrumental variable (IV)-based method for consistent OPE in confounded sequential decision making.	Yang Xu; Jin Zhu; Chengchun Shi; Shikai Luo; Rui Song;
686	Scalable Multi-Agent Reinforcement Learning Through Intelligent Information Aggregation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose InforMARL, a novel architecture for multi-agent reinforcement learning (MARL) which uses local information intelligently to compute paths for all the agents in a decentralized manner.	Siddharth Nayak; Kenneth Choi; Wenqi Ding; Sydney Dolan; Karthik Gopalakrishnan; Hamsa Balakrishnan;
687	Out-of-Domain Robustness Via Targeted Augmentations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we study principles for designing data augmentations for out-of-domain (OOD) generalization.	Irena Gao; Shiori Sagawa; Pang Wei Koh; Tatsunori Hashimoto; Percy Liang;
688	Constrained Causal Bayesian Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose different surrogate models that make it possible to integrate observational and interventional data while capturing correlation among effects with increasing levels of sophistication.	Virginia Aglietti; Alan Malek; Ira Ktena; Silvia Chiappa;
689	Dynamic Constrained Submodular Optimization with Polylogarithmic Update Time Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we develop a simpler algorithm for the problem that maintains a $(\frac{1}{2}-\epsilon)$-approximate solution for submodular maximization under cardinality constraint $k$ using a polylogarithmic amortized update time.	Kiarash Banihashem; Leyla Biabani; Samira Goudarzi; MohammadTaghi Hajiaghayi; Peyman Jabbarzade; Morteza Monemizadeh;
690	Deep Clustering with Incomplete Noisy Pairwise Annotations: A Geometric Regularization Approach Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work first takes a deep look into a recently emerged logistic loss function of DCC, and characterizes its theoretical properties.	Tri Nguyen; Shahana Ibrahim; Xiao Fu;
691	Self-supervised Learning of Split Invariant Equivariant Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We aim at bridging the gap between the two in order to learn more diverse representations that are suitable for a wide range of tasks.	Quentin Garrido; Laurent Najman; Yann LeCun;
692	Distance Weighted Supervised Learning for Offline Interaction Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To bridge the gap between IL and RL, we introduce Distance Weighted Supervised Learning or DWSL, a supervised method for learning goal-conditioned policies from offline data.	Joey Hejna; Jensen Gao; Dorsa Sadigh;
693	A Theoretical Analysis of The Learning Dynamics Under Class Imbalance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our main contribution is the analysis of the convergence of full-batch (GD) and stochastic gradient descent (SGD), and of variants that renormalize the contribution of each per-class gradient.	Emanuele Francazi; Marco Baity-Jesi; Aurelien Lucchi;
694	PCA-based Multi-Task Learning: A Random Matrix Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The article proposes and theoretically analyses a computationally efficient multi-task learning (MTL) extension of popular principal component analysis (PCA)-based supervised learning schemes.	Malik Tiomoko; Romain Couillet; Frederic Pascal;
695	Low-Switching Policy Gradient with Exploration Via Online Sensitivity Sampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet, recent advances for this problem have only been successful in tabular and linear settings, whose benign structures cannot be generalized to non-linearly parameterized policies. In this paper, we address this problem by leveraging recent advances in value-based algorithms, including bounded eluder-dimension and online sensitivity sampling, to design a low-switching sample-efficient policy optimization algorithm, LPO, with general non-linear function approximation.	Yunfan Li; Yiran Wang; Yu Cheng; Lin Yang;
696	Improving Adversarial Robustness of Deep Equilibrium Models with Explicit Regulations Along The Neural Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Informed by the correlation between the entropy of dynamical systems and their stability properties, we propose reducing prediction entropy by progressively updating inputs along the neural dynamics.	Zonghan Yang; Peng Li; Tianyu Pang; Yang Liu;
697	Protecting Language Generation Models Via Invisible Watermarking Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these methods can be nullified by obvious countermeasures such as “synonym randomization”. To address this issue, we propose GINSW, a novel method to protect text generation models from being stolen through distillation.	Xuandong Zhao; Yu-Xiang Wang; Lei Li;
698	Towards Practical Preferential Bayesian Optimization with Skew Gaussian Processes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study preferential Bayesian optimization (BO) where reliable feedback is limited to pairwise comparison called duels.	Shion Takeno; Masahiro Nomura; Masayuki Karasuyama;
699	End-to-End Learning for Stochastic Optimization: A Bayesian Perspective Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We develop a principled approach to end-to-end learning in stochastic optimization.	Yves Rychener; Daniel Kuhn; Tobias Sutter;
700	Label Distributionally Robust Losses for Multi-class Classification: Consistency, Robustness and Adaptivity Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our contributions include: (1) we study both consistency and robustness by establishing top-$k$ ($\forall k\geq 1$) consistency of LDR losses for multi-class classification, and a negative result that a top-$1$ consistent and symmetric robust loss cannot achieve top-$k$ consistency simultaneously for all $k\geq 2$; (2) we propose a new adaptive LDR loss that automatically adapts the individualized temperature parameter to the noise degree of class label of each instance; (3) we demonstrate stable and competitive performance for the proposed adaptive LDR loss on 7 benchmark datasets under 6 noisy label and 1 clean settings against 13 loss functions, and on one real-world noisy dataset.	Dixian Zhu; Yiming Ying; Tianbao Yang;
701	Revisiting Domain Randomization Via Relaxed State-Adversarial Policy Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, the existing methods often assume that the distribution of domain parameters belongs to a specific family of probability functions, such as normal distributions, which may not be correct. To overcome these limitations, we propose a new approach to DR by rethinking it from the perspective of adversarial state perturbation, without the need for reconfiguring the simulator or relying on prior knowledge about the environment.	Yun-Hsuan Lien; Ping-Chun Hsieh; Yu-Shuen Wang;
702	GFlowOut: Dropout with Generative Flow Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These methods face two important challenges: (a) the posterior distribution over masks can be highly multi-modal which can be difficult to approximate with standard variational inference and (b) it is not trivial to fully utilize sample-dependent information and correlation among dropout masks to improve posterior estimation. In this work, we propose GFlowOut to address these issues.	Dianbo Liu; Moksh Jain; Bonaventure F. P. Dossou; Qianli Shen; Salem Lahlou; Anirudh Goyal; Nikolay Malkin; Chris Chinenye Emezue; Dinghuai Zhang; Nadhir Hassen; Xu Ji; Kenji Kawaguchi; Yoshua Bengio;
703	Emergence of Sparse Representations from Noise Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Surprisingly, we discover that noisy training introduces three implicit loss terms that result in sparsely firing neurons specializing to high variance features of the dataset.	Trenton Bricken; Rylan Schaeffer; Bruno Olshausen; Gabriel Kreiman;
704	Non-stationary Reinforcement Learning Under General Function Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, theoretical understanding of non-stationary MDPs with general function approximation is still limited. In this paper, we make the first such attempt.	Songtao Feng; Ming Yin; Ruiquan Huang; Yu-Xiang Wang; Jing Yang; Yingbin Liang;
705	MetaDiffuser: Diffusion Model As Conditional Planner for Offline Meta-RL Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these works mostly lack the generalization ability across tasks with reward or dynamics change. To tackle this challenge, in this paper we propose a task-oriented conditioned diffusion planner for offline meta-RL (MetaDiffuser), which considers the generalization problem as a conditional trajectory generation task with contextual representation.	Fei Ni; Jianye Hao; Yao Mu; Yifu Yuan; Yan Zheng; Bin Wang; Zhixuan Liang;
706	SemSup-XC: Semantic Supervision for Zero and Few-shot Extreme Classification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we develop SemSup-XC, a model that achieves state-of-the-art zero-shot and few-shot performance on three XC datasets derived from legal, e-commerce, and Wikipedia data.	Pranjal Aggarwal; Ameet Deshpande; Karthik R Narasimhan;
707	Improved Active Multi-Task Representation Learning Via Lasso Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we show the strict dominance of the L1-regularized-relevance-based ($\nu^1$-based) strategy by giving a lower bound for the $\nu^2$-based strategy.	Yiping Wang; Yifang Chen; Kevin Jamieson; Simon Shaolei Du;
708	Convergence of First-Order Methods for Constrained Nonconvex Optimization with Dependent Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We focus on analyzing the classical stochastic projected gradient methods under a general dependent data sampling scheme for constrained smooth nonconvex optimization.	Ahmet Alacaoglu; Hanbaek Lyu;
709	SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We analyze the performance of existing model-parallel algorithms in these conditions and find configurations where training larger models becomes less communication-intensive. Based on these findings, we propose SWARM Parallelism (Stochastically Wired Adaptively Rebalanced Model Parallelism), a model-parallel training algorithm designed for poorly connected, heterogeneous and unreliable devices.	Max Ryabinin; Tim Dettmers; Michael Diskin; Alexander Borzunov;
710	Differentially Private Episodic Reinforcement Learning with Heavy-tailed Rewards Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we study the problem of (finite horizon tabular) Markov decision processes (MDPs) with heavy-tailed rewards under the constraint of differential privacy (DP).	Yulian Wu; Xingyu Zhou; Sayak Ray Chowdhury; Di Wang;
711	Rotation and Translation Invariant Representation Learning with Implicit Neural Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose Invariant Representation Learning with Implicit Neural Representation (IRL-INR), which uses an implicit neural representation (INR) with a hypernetwork to obtain semantic representations disentangled from the orientation of the image.	Sehyun Kwon; Joo Young Choi; Ernest K. Ryu;
712	Optimizing Hyperparameters with Conformal Quantile Regression Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose to leverage conformalized quantile regression which makes minimal assumptions about the observation noise and, as a result, models the target function in a more realistic and robust fashion which translates to quicker HPO convergence on empirical benchmarks.	David Salinas; Jacek Golebiowski; Aaron Klein; Matthias Seeger; Cedric Archambeau;
713	Recovering Top-Two Answers and Confusion Probability in Multi-Choice Crowdsourcing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The most confusing answer provides useful information about the task by revealing the most plausible answer other than the ground truth and how plausible it is. To theoretically analyze such scenarios, we propose a model in which there are the top two plausible answers for each task, distinguished from the rest of the choices.	Hyeonsu Jeong; Hye Won Chung;
714	Hyperbolic Diffusion Embedding and Distance for Hierarchical Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a new method for hierarchical data embedding and distance.	Ya-Wei Eileen Lin; Ronald R. Coifman; Gal Mishne; Ronen Talmon;
715	Graph Neural Networks with Learnable and Optimal Polynomial Bases Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose two spectral GNN models that provide positive answers to the questions posed above.	Yuhe Guo; Zhewei Wei;
716	Emergent Agentic Transformer from Chain of Hindsight Experience Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In reinforcement learning (RL), despite many efforts on transformer-based policies, a key limitation is that current transformer-based policies cannot learn by directly combining information from multiple sub-optimal trials. In this work, we address this issue using the recently proposed chain of hindsight to relabel experience, where we train a transformer on a sequence of trajectory experiences sorted in ascending order of their total rewards.	Hao Liu; Pieter Abbeel;
717	FedHPO-Bench: A Benchmark Suite for Federated Hyperparameter Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To facilitate the research of FedHPO, we propose and implement a benchmark suite FedHPO-Bench that incorporates comprehensive FedHPO problems, enables flexible customization of the function evaluations, and eases continuing extensions.	Zhen Wang; Weirui Kuang; Ce Zhang; Bolin Ding; Yaliang Li;
718	Bandits with Knapsacks: Advice on Time-Varying Demands Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, with online predictions on $Q$, we propose an online algorithm that judiciously incorporates the predictions and achieves regret bounds that depend on the accuracy of the predictions.	Lixing Lyu; Wang Chi Cheung;
719	Adaptive Compositional Continual Meta-Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an Adaptive Compositional Continual Meta-Learning (ACML) algorithm, which employs a compositional premise to associate a task with a subset of mixture components, allowing meta-knowledge sharing among heterogeneous tasks.	Bin Wu; Jinyuan Fang; Xiangxiang Zeng; Shangsong Liang; Qiang Zhang;
720	Integrating Prior Knowledge in Contrastive Learning with Kernel Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we open the door to new perspectives for CL by integrating prior knowledge, given either by generative models – viewed as prior representations – or weak attributes in the positive and negative sampling.	Benoit Dufumier; Carlo Alberto Barbano; Robin Louiset; Edouard Duchesnay; Pietro Gori;
721	The Saddle-Point Method in Differential Privacy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We characterize the differential privacy guarantees of privacy mechanisms in the large-composition regime, i.e., when a privacy mechanism is sequentially applied a large number of times to sensitive data.	Wael Alghamdi; Juan Felipe Gomez; Shahab Asoodeh; Flavio Calmon; Oliver Kosut; Lalitha Sankar;
722	From Temporal to Contemporaneous Iterative Causal Discovery in The Presence of Latent Confounders Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a constraint-based algorithm for learning causal structures from observational time-series data, in the presence of latent confounders.	Raanan Yehezkel Rohekar; Shami Nisimov; Yaniv Gurwicz; Gal Novik;
723	Offline Meta Reinforcement Learning with In-Distribution Online Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our theory finds that out-of-distribution adaptation episodes may lead to unreliable policy evaluation and that online adaptation with in-distribution episodes can ensure adaptation performance guarantee. Based on these theoretical insights, we propose a novel adaptation framework, called In-Distribution online Adaptation with uncertainty Quantification (IDAQ), which generates in-distribution context using a given uncertainty quantification and performs effective task belief inference to address new tasks.	Jianhao Wang; Jin Zhang; Haozhe Jiang; Junyu Zhang; Liwei Wang; Chongjie Zhang;
724	Accelerated Infeasibility Detection of Constrained Optimization and Fixed-Point Iterations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we characterize the optimal accelerated rate of infeasibility detection.	Jisun Park; Ernest K. Ryu;
725	Restoration Based Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we establish the interpretation of DDMs in terms of image restoration (IR).	Jaemoo Choi; Yesom Park; Myungjoo Kang;
726	Delay-Adapted Policy Optimization and Improved Regret for Adversarial MDP with Delayed Bandit Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study PO in adversarial MDPs with a challenge that arises in almost every real-world application — delayed bandit feedback.	Tal Lancewicki; Aviv Rosenberg; Dmitry Sotnikov;
727	One-Shot Compression of Large Edge-Exchangeable Graphs Using Bits-Back Coding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a one-shot method for compressing large labeled graphs called Random Edge Coding.	Daniel Severo; James Townsend; Ashish J Khisti; Alireza Makhzani;
728	GOAT: A Global Transformer on Large-scale Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Meanwhile, existing GNN architectures are limited in their ability to perform equally well on both homophilous and heterophilous graphs as their inductive biases are generally tailored to only one setting. To address these issues, we propose GOAT, a scalable global graph transformer.	Kezhi Kong; Jiuhai Chen; John Kirchenbauer; Renkun Ni; C. Bayan Bruss; Tom Goldstein;
729	Effectively Using Public Data in Privacy Preserving Machine Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we further explore the potential of using public data in DP models, showing that utility gains can in fact be significantly higher than what was shown in prior works. Specifically, we introduce DOPE-SGD, a modified DP-SGD algorithm that leverages public data during its training.	Milad Nasr; Saeed Mahloujifar; Xinyu Tang; Prateek Mittal; Amir Houmansadr;
730	Learning to Initiate and Reason in Event-Driven Cascading Processes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new supervised learning setup called Cascade.	Yuval Atzmon; Eli Meirom; Shie Mannor; Gal Chechik;
731	Anchor Sampling for Federated Learning with Partial Client Participation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Training with large batches on individual clients is proposed to address data heterogeneity in general, but their effectiveness under partial client participation is not clear. Motivated by these challenges, we propose to develop a novel federated learning framework, referred to as FedAMD, for partial client participation.	Feijie Wu; Song Guo; Zhihao Qu; Shiqi He; Ziming Liu; Jing Gao;
732	Scaling Laws for Generative Mixed-Modal Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We report new mixed-modal scaling laws that unify the contributions of individual modalities and the interactions between them.	Armen Aghajanyan; Lili Yu; Alexis Conneau; Wei-Ning Hsu; Karen Hambardzumyan; Susan Zhang; Stephen Roller; Naman Goyal; Omer Levy; Luke Zettlemoyer;
733	Learning for Edge-Weighted Online Bipartite Matching with Robustness Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel RL-based approach to edge-weighted online bipartite matching with robustness guarantees (LOMAR), achieving both good average-case and worst-case performance.	Pengfei Li; Jianyi Yang; Shaolei Ren;
734	Improved Algorithms for Multi-period Multi-class Packing Problems with Bandit Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a bandit policy that is a closed-form function of said estimated parameters.	Wonyoung Kim; Garud Iyengar; Assaf Zeevi;
735	Temporally Consistent Transformers for Video Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we curate 3 challenging video datasets with long-range dependencies by rendering walks through 3D scenes of procedural mazes, Minecraft worlds, and indoor scans.	Wilson Yan; Danijar Hafner; Stephen James; Pieter Abbeel;
736	Towards Better Graph Representation Learning with Parameterized Decomposition & Filtering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we develop a novel and general framework which unifies many existing GNN models from the view of parameterized decomposition and filtering, and show how it helps to enhance the flexibility of GNNs while alleviating the smoothness and amplification issues of existing models.	Mingqi Yang; Wenjie Feng; Yanming Shen; Bryan Hooi;
737	Large Language Models Can Be Easily Distracted By Irrelevant Context Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate the distractibility of large language models, i.e., how the model prediction can be distracted by irrelevant context.	Freda Shi; Xinyun Chen; Kanishka Misra; Nathan Scales; David Dohan; Ed H. Chi; Nathanael Schärli; Denny Zhou;
738	GraphCleaner: Detecting Mislabelled Samples in Popular Graph Learning Benchmarks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite increasing efforts towards improving the quality of generic data types, such as images and texts, the problem of mislabel detection in graph data remains underexplored. To bridge the gap, we explore mislabelling issues in popular real-world graph datasets and propose GraphCleaner, a post-hoc method to detect and correct these mislabelled nodes in graph datasets.	Yuwen Li; Miao Xiong; Bryan Hooi;
739	Data Structures for Density Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study statistical/computational tradeoffs for the following density estimation problem: given $k$ distributions $v_1, \ldots, v_k$ over a discrete domain of size $n$, and sampling access to a distribution $p$, identify $v_i$ that is close to $p$.	Anders Aamand; Alexandr Andoni; Justin Y Chen; Piotr Indyk; Shyam Narayanan; Sandeep Silwal;
740	Long-Tailed Recognition By Mutual Information Maximization Between Latent Features and Ground-Truth Labels Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The goal of this paper is to provide the background and further improve the performance.	Min-Kook Suh; Seung-Woo Seo;
741	Identifiability of Label Noise Transition Matrix Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The goal of this paper is to characterize the identifiability of the label noise transition matrix.	Yang Liu; Hao Cheng; Kun Zhang;
742	Proper Scoring Rules for Survival Analysis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Although there are fundamental theories on strictly proper scoring rules for uncertainty quantification, little is known about those for survival analysis. In this paper, we investigate extensions of four major strictly proper scoring rules for survival analysis and we prove that these extensions are proper under certain conditions, which arise from the discretization of the estimation of probability distributions.	Hiroki Yanagisawa;
743	LazyGNN: Large-Scale Graph Neural Networks Via Lazy Propagation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose to capture long-distance dependency in graphs by shallower models instead of deeper models, which leads to a much more efficient model, LazyGNN, for graph representation learning.	Rui Xue; Haoyu Han; MohamadAli Torkamani; Jian Pei; Xiaorui Liu;
744	Efficient List-Decodable Regression Using Batches Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We demonstrate the use of batches in studying list-decodable linear regression, in which only $\alpha\in (0,1]$ fraction of batches contain genuine samples from a common distribution and the rest can contain arbitrary or even adversarial samples.	Abhimanyu Das; Ayush Jain; Weihao Kong; Rajat Sen;
745	Discovering Object-Centric Generalized Value Functions From Pixels Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Automatically learning such representations in an object-centric manner geared towards control and fast adaptation remains an open research problem. In this paper, we introduce a method that tries to discover meaningful features from objects, translating them to temporally coherent ‘question’ functions and leveraging the subsequent learned general value functions for control.	Somjit Nath; Gopeshh Raaj Subbaraj; Khimya Khetarpal; Samira Ebrahimi Kahou;
746	Parallel $Q$-Learning: Scaling Off-policy Reinforcement Learning Under Massively Parallel Simulation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a novel Parallel Q-Learning (PQL) scheme that outperforms PPO in terms of wall-clock time and maintains superior sample efficiency.	Zechu Li; Tao Chen; Zhang-Wei Hong; Anurag Ajay; Pulkit Agrawal;
747	Matrix Estimation for Individual Fairness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In recent years, multiple notions of algorithmic fairness have arisen. One such notion is individual fairness (IF), which requires that individuals who are similar receive similar treatment. In parallel, matrix estimation (ME) has emerged as a natural paradigm for handling noisy data with missing values. In this work, we connect the two concepts.	Cindy Zhang; Sarah Huiyi Cen; Devavrat Shah;
748	Instrumental Variable Estimation of Average Partial Causal Effects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of estimating the average partial causal effect (APCE) of a continuous treatment in an IV setting.	Yuta Kawakami; Manabu Kuroki; Jin Tian;
749	One-sided Matrix Completion from Two Observations Per Row Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a natural algorithm that involves imputing the missing values of the matrix $X^TX$ and show that even with only two observations per row in $X$, we can provably recover $X^TX$ as long as we have at least $\Omega(r^2 d \log d)$ rows, where $r$ is the rank and $d$ is the number of columns.	Steven Cao; Percy Liang; Gregory Valiant;
750	Monge, Bregman and Occam: Interpretable Optimal Transport in High-Dimensions with Feature-Sparse Maps Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new research direction that leverages a specific translation invariant cost $c(x, y):=h(x-y)$ inspired by the elastic net.	Marco Cuturi; Michal Klein; Pierre Ablin;
751	Hierarchical Neural Coding for Controllable CAD Model Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a novel generative model for Computer Aided Design (CAD) that 1) represents high-level design concepts of a CAD model as a three-level hierarchical tree of neural codes, from global part arrangement down to local curve geometry; and 2) controls the generation or completion of CAD models by specifying the target design using a code tree.	Xiang Xu; Pradeep Kumar Jayaraman; Joseph George Lambourne; Karl D.D. Willis; Yasutaka Furukawa;
752	Change Is Hard: A Closer Look at Subpopulation Shift Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we provide a fine-grained analysis of subpopulation shift.	Yuzhe Yang; Haoran Zhang; Dina Katabi; Marzyeh Ghassemi;
753	Ske2Grid: Skeleton-to-Grid Representation Learning for Action Recognition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents Ske2Grid, a new representation learning framework for improved skeleton-based action recognition.	Dongqi Cai; Yangyuxuan Kang; Anbang Yao; Yurong Chen;
754	Stabilizing GANs’ Training with Brownian Motion Controller Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we examine the stability of GANs from the perspective of control theory and propose a universal higher-order noise-based controller called Brownian Motion Controller (BMC).	Tianjiao Luo; Ziyu Zhu; Jianfei Chen; Jun Zhu;
755	Featured Graph Coarsening with Similarity Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a novel optimization-based framework for graph coarsening that takes both the graph matrix and the node features as the input and jointly learns the coarsened graph matrix and the coarsened feature matrix while ensuring desired properties.	Manoj Kumar; Anurag Sharma; Shashwat Saxena; Sandeep Kumar;
756	Biases in Evaluation of Molecular Optimization Methods and Bias Reduction Strategies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We discuss bias reduction methods for each of the biases, and empirically investigate their effectiveness.	Hiroshi Kajino; Kohei Miyaguchi; Takayuki Osogami;
757	Policy Regularization with Dataset Constraint for Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we find that regularizing the policy towards the nearest state-action pair can be more effective and thus propose Policy Regularization with Dataset Constraint (PRDC).	Yuhang Ran; Yi-Chen Li; Fuxiang Zhang; Zongzhang Zhang; Yang Yu;
758	LESSON: Learning to Integrate Exploration Strategies for Reinforcement Learning Via An Option Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a unified framework for exploration in reinforcement learning (RL) is proposed based on an option-critic architecture.	Woojun Kim; Jeonghye Kim; Youngchul Sung;
759	R-U-SURE? Uncertainty-Aware Code Suggestions By Maximizing Utility Across Random User Intents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Randomized Utility-driven Synthesis of Uncertain REgions (R-U-SURE), an approach for building uncertainty-aware suggestions based on a decision-theoretic model of goal-conditioned utility, using random samples from a generative model as a proxy for the unobserved possible intents of the end user.	Daniel D. Johnson; Daniel Tarlow; Christian Walder;
760	SeedGNN: Graph Neural Network for Supervised Seeded Graph Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast, this paper proposes a new supervised approach that can learn from a training set how to match unseen graphs with only a few seeds.	Liren Yu; Jiaming Xu; Xiaojun Lin;
761	Learning Regions of Interest for Bayesian Optimization with Adaptive Level-Set Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a framework, called BALLET, which adaptively filters for a high-confidence region of interest (ROI) as a superlevel-set of a nonparametric probabilistic model such as a Gaussian process (GP).	Fengxue Zhang; Jialin Song; James C Bowden; Alexander Ladd; Yisong Yue; Thomas Desautels; Yuxin Chen;
762	Lower Bounds for Learning in Revealing POMDPs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper studies the fundamental limits of reinforcement learning (RL) in the challenging partially observable setting.	Fan Chen; Huan Wang; Caiming Xiong; Song Mei; Yu Bai;
763	Width and Depth Limits Commute in Residual Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that taking the width and depth to infinity in a deep neural network with skip connections, when branches are scaled by $1/\sqrt{depth}$, results in the same covariance structure no matter how that limit is taken.	Soufiane Hayou; Greg Yang;
764	Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making Using Language Guided World Modelling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose using few-shot large language models (LLMs) to hypothesize an AWM that will be verified through world experience, to improve the sample efficiency of RL agents.	Kolby Nottingham; Prithviraj Ammanabrolu; Alane Suhr; Yejin Choi; Hannaneh Hajishirzi; Sameer Singh; Roy Fox;
765	Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing works often assume a strictly positive lower bound of the cost function or an upper bound of the expected length for the optimal policy. In this paper, we propose a new algorithm to eliminate these restrictive assumptions.	Qiwei Di; Jiafan He; Dongruo Zhou; Quanquan Gu;
766	Hiding Data Helps: On The Benefits of Masking for Sparse Coding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We show in this work that, in the presence of noise, minimizing the standard dictionary learning objective can fail to recover the elements of the ground-truth dictionary in the over-realized regime, regardless of the magnitude of the signal in the data-generating process.	Muthu Chidambaram; Chenwei Wu; Yu Cheng; Rong Ge;
767	Learning to Design Analog Circuits to Meet Threshold Specifications Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a method for generating from simulation data a dataset on which a system can be trained via supervised learning to design circuits to meet threshold specifications.	Dmitrii Krylov; Pooya Khajeh; Junhan Ouyang; Thomas Reeves; Tongkai Liu; Hiba Ajmal; Hamidreza Aghasi; Roy Fox;
768	Explainability As Statistical Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we take a new route and cast interpretability as a statistical inference problem. We propose new datasets with ground truth selection which allow for the evaluation of the feature importance map and show experimentally that multiple imputation provides more reasonable interpretations.	Hugo Henri Joseph Senetaire; Damien Garreau; Jes Frellsen; Pierre-Alexandre Mattei;
769	Streaming Active Learning with Deep Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, prior active learning approaches with deep neural networks assume offline access to the entire dataset ahead of time. This paper proposes VeSSAL, a new algorithm for batch active learning with deep neural networks in streaming settings, which samples groups of points to query for labels at the moment they are encountered.	Akanksha Saran; Safoora Yousefi; Akshay Krishnamurthy; John Langford; Jordan T. Ash;
770	Multiply Robust Off-policy Evaluation and Learning Under Truncation By Death Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we formulate OPE and OPL using principal stratification under truncation by death.	Jianing Chu; Shu Yang; Wenbin Lu;
771	Shiftable Context: Addressing Training-Inference Context Mismatch in Simultaneous Speech Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, such models create a context mismatch between training and inference environments, hindering potential translation accuracy. We solve this issue by proposing Shiftable Context, a simple yet effective scheme to ensure that consistent segment and context sizes are maintained throughout training and inference, even with the presence of partially filled segments due to the streaming nature of simultaneous translation.	Matthew Raffel; Drew Penney; Lizhong Chen;
772	Online Platt Scaling with Calibeating Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an online post-hoc calibration method, called Online Platt Scaling (OPS), which combines the Platt scaling technique with online logistic regression.	Chirag Gupta; Aaditya Ramdas;
773	SpENCNN: Orchestrating Encoding and Sparsity for Fast Homomorphically Encrypted Neural Network Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an HE-based fast neural network (NN) inference framework, SpENCNN, built upon the co-design of HE operation-aware model sparsity and single-instruction-multiple-data (SIMD)-friendly data packing, to improve NN inference latency.	Ran Ran; Xinwei Luo; Wei Wang; Tao Liu; Gang Quan; Xiaolin Xu; Caiwen Ding; Wujie Wen;
774	The Benefits of Mixup for Feature Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim to seek a fundamental understanding of the benefits of Mixup.	Difan Zou; Yuan Cao; Yuanzhi Li; Quanquan Gu;
775	Secure Federated Correlation Test and Entropy Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the first federated correlation test framework compatible with secure aggregation, namely FED-$\chi^2$.	Qi Pang; Lun Wang; Shuai Wang; Wenting Zheng; Dawn Song;
776	Meta-Learning The Inductive Bias of Simple Neural Circuits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it is generally very difficult to map circuit structure to inductive bias. Here, we present a neural network tool to bridge this gap.	Will Dorrell; Maria Yuffa; Peter E. Latham;
777	Cyclic Block Coordinate Descent With Variance Reduction for Composite Nonconvex Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose cyclic block coordinate methods for nonconvex optimization problems with non-asymptotic gradient norm guarantees.	Xufeng Cai; Chaobing Song; Stephen Wright; Jelena Diakonikolas;
778	Neural Latent Aligner: Cross-trial Alignment for Learning Representations of Complex, Naturalistic Neural Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we propose a novel unsupervised learning framework, Neural Latent Aligner (NLA), to find well-constrained, behaviorally relevant neural representations of complex behaviors.	Cheol Jun Cho; Edward Chang; Gopala Anumanchipalli;
779	Provable Reset-free Reinforcement Learning By No-Regret Reduction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To make learning more practical, we propose a generic no-regret reduction to systematically design reset-free RL algorithms.	Hoai-An Nguyen; Ching-An Cheng;
780	A Universal Unbiased Method for Classification from Aggregate Observations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: CFAO is a generalized learning framework that contains various learning problems, such as multiple-instance learning and learning from label proportions. The goal of this paper is to present a novel universal method of CFAO, which holds an unbiased estimator of the classification risk for arbitrary losses—previous research failed to achieve this goal.	Zixi Wei; Lei Feng; Bo Han; Tongliang Liu; Gang Niu; Xiaofeng Zhu; Heng Tao Shen;
781	Controlling Type Confounding in Ad Hoc Teamwork with Instance-wise Teammate Feedback Rectification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work addresses the issue from the lens of causal inference.	Dong Xing; Pengjie Gu; Qian Zheng; Xinrun Wang; Shanqi Liu; Longtao Zheng; Bo An; Gang Pan;
782	Decentralized Stochastic Bilevel Optimization with Improved Per-Iteration Complexity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, it remains unknown how to design the distributed algorithm with sample complexity and convergence rate comparable to SGD for stochastic optimization, and at the same time without directly computing the exact Hessian or Jacobian matrices. In this paper we propose such an algorithm.	Xuxing Chen; Minhui Huang; Shiqian Ma; Krishna Balasubramanian;
783	On The Impact of Algorithmic Recourse on Social Segregation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we address the aforementioned gaps by making one of the first attempts at analyzing the delayed societal impact of algorithmic recourse.	Ruijiang Gao; Himabindu Lakkaraju;
784	Understanding and Generalizing Contrastive Learning from The Inverse Optimal Transport Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we aim to understand and generalize CL from a point set matching perspective, instead of the comparison between two points.	Liangliang Shi; Gu Zhang; Haoyu Zhen; Jintao Fan; Junchi Yan;
785	Beyond Uniform Lipschitz Condition in Differentially Private Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We generalize uniform Lipschitzness by assuming that the per-sample gradients have sample-dependent upper bounds, i.e., per-sample Lipschitz constants, which themselves may be unbounded.	Rudrajit Das; Satyen Kale; Zheng Xu; Tong Zhang; Sujay Sanghavi;
786	Personalized Federated Learning Under Mixture of Distributions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, these techniques often lack the ability to adapt to unseen data, further limiting their effectiveness in real-world scenarios. To address these limitations, we propose a novel approach, FedGMM, which utilizes Gaussian mixture models (GMM) to effectively fit the input data distributions across diverse clients.	Yue Wu; Shuaicheng Zhang; Wenchao Yu; Yanchi Liu; Quanquan Gu; Dawei Zhou; Haifeng Chen; Wei Cheng;
787	SurCo: Learning Linear SURrogates for COmbinatorial Nonlinear Optimization Problems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Optimization problems with nonlinear cost functions and combinatorial constraints appear in many real-world applications but remain challenging to solve efficiently compared to their linear counterparts. To bridge this gap, we propose SurCo, which learns linear Surrogate costs that can be used in existing Combinatorial solvers to output good solutions to the original nonlinear combinatorial optimization problem.	Aaron M Ferber; Taoan Huang; Daochen Zha; Martin Schubert; Benoit Steiner; Bistra Dilkina; Yuandong Tian;
788	Towards A Persistence Diagram That Is Robust to Noise and Varied Densities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recent works have identified that existing methods, which construct persistence diagrams in Topological Data Analysis (TDA), are not robust to noise and varied densities in a point cloud. We analyze the necessary properties of an approach that can address these two issues, and propose a new filter function for TDA based on a new data-dependent kernel which possesses these properties.	Hang Zhang; Kaifeng Zhang; Kai Ming Ting; Ye Zhu;
789	Robustness in Multimodal Learning Under Train-Test Modality Mismatch Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we are concerned with understanding how models behave as the type of modalities differ between training and deployment, a situation that naturally arises in many applications of multimodal learning to hardware platforms.	Brandon McKinzie; Vaishaal Shankar; Joseph Yitan Cheng; Yinfei Yang; Jonathon Shlens; Alexander T Toshev;
790	Stochastic Gradient Succeeds for Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that the stochastic gradient bandit algorithm converges to a globally optimal policy at an $O(1/t)$ rate, even with a constant step size.	Jincheng Mei; Zixin Zhong; Bo Dai; Alekh Agarwal; Csaba Szepesvari; Dale Schuurmans;
791	Optimal Sets and Solution Paths of ReLU Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop an analytical framework to characterize the set of optimal ReLU neural networks by reformulating the non-convex training problem as a convex program.	Aaron Mishkin; Mert Pilanci;
792	Transformed Distribution Matching for Missing Value Imputation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The key to missing value imputation is to capture the data distribution with incomplete samples and impute the missing values accordingly. In this paper, by leveraging the fact that any two batches of data with missing values come from the same data distribution, we propose to impute the missing values of two batches of samples by transforming them into a latent space through deep invertible functions and matching them distributionally.	He Zhao; Ke Sun; Amir Dezfouli; Edwin V. Bonilla;
793	Demonstration-free Autonomous Reinforcement Learning Via Implicit and Bidirectional Curriculum Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In contrast, we propose a demonstration-free ARL algorithm via Implicit and Bi-directional Curriculum (IBC).	Jigang Kim; Daesol Cho; H. Jin Kim;
794	Effective Neural Topic Modeling with Embedding Clustering Regularization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new neural topic model, Embedding Clustering Regularization Topic Model (ECRTM).	Xiaobao Wu; Xinshuai Dong; Thong Thanh Nguyen; Anh Tuan Luu;
795	Traversing Between Modes in Function Space for Fast Ensembling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While this provides a way to efficiently train ensembles, for inference, multiple forward passes should still be executed using all the ensemble parameters, which often becomes a serious bottleneck for real-world deployment. In this work, we propose a novel framework to reduce such costs.	Eunggu Yun; Hyungi Lee; Giung Nam; Juho Lee;
796	What Is Essential for Unseen Goal Generalization of Offline Goal-conditioned RL? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study out-of-distribution (OOD) generalization of offline GCRL both theoretically and empirically to identify factors that are important.	Rui Yang; LIN Yong; Xiaoteng Ma; Hao Hu; Chongjie Zhang; Tong Zhang;
797	Understanding Oversquashing in GNNs Through The Lens of Effective Resistance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Understanding and mitigating oversquashing has recently received significant attention from the research community. In this paper, we continue this line of work by analyzing oversquashing through the lens of the effective resistance between nodes in the input graph.	Mitchell Black; Zhengchao Wan; Amir Nayyeri; Yusu Wang;
798	Fast Rates in Time-Varying Strongly Monotone Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a new decentralized online algorithm for time-varying strongly monotone games, which greatly improves existing results and obtains fast rates, matching the best time-invariant guarantee without knowing the environmental non-stationarity.	Yu-Hu Yan; Peng Zhao; Zhi-Hua Zhou;
799	Why Did The Model Fail?: Attributing Model Performance Changes to Distribution Shifts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce the problem of attributing performance differences between environments to distribution shifts in the underlying data generating mechanisms.	Haoran Zhang; Harvineet Singh; Marzyeh Ghassemi; Shalmali Joshi;
800	Blockwise Stochastic Variance-Reduced Methods with Parallel Speedup for Multi-Block Bilevel Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we consider non-convex multi-block bilevel optimization (MBBO) problems, which involve $m\gg 1$ lower level problems and have important applications in machine learning.	Quanqi Hu; Zi-Hao Qiu; Zhishuai Guo; Lijun Zhang; Tianbao Yang;
801	Tighter Analysis for ProxSkip Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we provide a tighter analysis for ProxSkip, an algorithm that allows fewer proximal operator computations to solve composite optimization problems.	Zhengmian Hu; Heng Huang;
802	Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This interpretation has motivated classifier-based and classifier-free guidance as methods for post-hoc control of diffusion models. In this work, we build upon these ideas using the score-based interpretation of diffusion models, and explore alternative ways to condition, modify, and reuse diffusion models for tasks involving compositional generation and guidance.	Yilun Du; Conor Durkan; Robin Strudel; Joshua B. Tenenbaum; Sander Dieleman; Rob Fergus; Jascha Sohl-Dickstein; Arnaud Doucet; Will Sussman Grathwohl;
803	Global Optimization with Parametric Function Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new algorithm GO-UCB that leverages a parametric family of functions (e.g., neural networks) instead.	Chong Liu; Yu-Xiang Wang;
804	Online Learning in Stackelberg Games with An Omniscient Follower Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of online learning in a two-player decentralized cooperative Stackelberg game.	Geng Zhao; Banghua Zhu; Jiantao Jiao; Michael Jordan;
805	On Distribution Dependent Sub-Logarithmic Query Time of Learned Indexing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The only existing attempt shows the same query time of $O(\log n)$, but with a constant factor improvement in space complexity over non-learned methods, under some assumptions on data distribution. In this paper, we significantly strengthen this result, showing that under mild assumptions on data distribution, and the same space complexity as non-learned methods, learned indexes can answer queries in $O(\log\log n)$ expected query time.	Sepanta Zeighami; Cyrus Shahabi;
806	Atari-5: Distilling The Arcade Learning Environment Down to Five Games Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the computational cost of generating results on the entire 57-game dataset limits ALE’s use and makes the reproducibility of many results infeasible. We propose a novel solution to this problem in the form of a principled methodology for selecting small but representative subsets of environments within a benchmark suite.	Matthew Aitchison; Penny Sweetser; Marcus Hutter;
807	Towards Robust and Safe Reinforcement Learning with Benign Off-policy Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the robuSt vAriational ofF-policy lEaRning (SAFER) approach, which only requires benign training data without attacking the agent.	Zuxin Liu; Zijian Guo; Zhepeng Cen; Huan Zhang; Yihang Yao; Hanjiang Hu; Ding Zhao;
808	Auto-Differentiation of Relational Computations for Very Large Scale Machine Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the problem of how to differentiate computations expressed relationally.	Yuxin Tang; Zhimin Ding; Dimitrije Jankov; Binhang Yuan; Daniel Bourgeois; Chris Jermaine;
809	Supported Trust Region Optimization for Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Supported Trust Region optimization (STR) which performs trust region policy optimization with the policy constrained within the support of the behavior policy, enjoying the less restrictive support constraint.	Yixiu Mao; Hongchang Zhang; Chen Chen; Yi Xu; Xiangyang Ji;
810	Minimizing Trajectory Curvature of ODE-based Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Based on the relationship between the forward process and the curvature, here we present an efficient method of training the forward process to minimize the curvature of generative trajectories without any ODE/SDE simulation.	Sangyun Lee; Beomsu Kim; Jong Chul Ye;
811	Communication-Efficient Federated Hypergradient Computation Via Aggregated Iterative Differentiation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel communication-efficient federated hypergradient estimator via aggregated iterative differentiation (AggITD).	Peiyao Xiao; Kaiyi Ji;
812	Motion Question Answering Via Modular Motion Programs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In order to build artificial intelligence systems that can perceive and reason with human behavior in the real world, we must first design models that conduct complex spatio-temporal reasoning over motion sequences. Moving towards this goal, we propose the HumanMotionQA task to evaluate complex, multi-step reasoning abilities of models on long-form human motion sequences.	Mark Endo; Joy Hsu; Jiaman Li; Jiajun Wu;
813	Optimal Online Generalized Linear Regression with Stochastic Noise and Its Application to Heteroscedastic Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of online generalized linear regression in the stochastic setting, where the label is generated from a generalized linear model with possibly unbounded additive noise.	Heyang Zhao; Dongruo Zhou; Jiafan He; Quanquan Gu;
814	Graph Reinforcement Learning for Network Control Via Bi-Level Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we argue that data-driven strategies can automate this process and learn efficient algorithms without compromising optimality. To do so, we present network control problems through the lens of reinforcement learning and propose a graph network-based framework to handle a broad class of problems.	Daniele Gammelli; James Harrison; Kaidi Yang; Marco Pavone; Filipe Rodrigues; Francisco C. Pereira;
815	System Identification of Neural Systems: If We Got It Right, Would We Know? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We evaluate the most commonly used comparison techniques, such as a linear encoding model and centered kernel alignment, to correctly identify a model by replacing brain recordings with known ground truth models.	Yena Han; Tomaso Poggio; Brian Cheung;
816	From Robustness to Privacy and Back Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the relationship between two desiderata of algorithms in statistical inference and machine learning—differential privacy and robustness to adversarial data corruptions.	Hilal Asi; Jonathan Ullman; Lydia Zakynthinou;
817	Gradient-based Wang–Landau Algorithm: A Novel Sampler for Output Distribution of Neural Networks Over The Input Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Exhaustive enumeration or traditional Monte Carlo methods for the entire input space can exhibit impractical sampling time, especially for high-dimensional inputs. To make such difficult sampling computationally feasible, in this paper, we propose a novel Gradient-based Wang-Landau (GWL) sampler.	Weitang Liu; Yi-Zhuang You; Ying Wai Li; Jingbo Shang;
818	Spatial-Temporal Graph Learning with Adversarial Contrastive Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The ubiquitous spatial-temporal data noise and incompleteness in real-life scenarios bring difficulties to generate high-quality region representations. In this paper, we propose a Spatial-Temporal Adversarial Graph contrastive learning model (STAG) to tackle this challenge for adaptive self-supervised graph augmentation.	Qianru Zhang; Chao Huang; Lianghao Xia; Zheng Wang; Siu Ming Yiu; Ruihua Han;
819	Estimating Joint Treatment Effects By Combining Multiple Experiments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we develop machinery for estimating joint treatment effects by combining data from multiple experimental datasets.	Yonghan Jung; Jin Tian; Elias Bareinboim;
820	Brainformers: Trading Simplicity for Efficiency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Transformers have a mostly uniform backbone where layers alternate between feed-forward and self-attention in order to build a deep network. Here we investigate this design choice and find that more complex blocks that have different permutations of layer primitives can be more efficient.	Yanqi Zhou; Nan Du; Yanping Huang; Daiyi Peng; Chang Lan; Da Huang; Siamak Shakeri; David So; Andrew M. Dai; Yifeng Lu; Zhifeng Chen; Quoc V Le; Claire Cui; James Laudon; Jeff Dean;
821	On The Convergence of Federated Averaging with Cyclic Client Participation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide (to our knowledge) the first theoretical framework to analyze the convergence of FedAvg with cyclic client participation with several different client optimizers such as GD, SGD, and shuffled SGD.	Yae Jee Cho; Pranay Sharma; Gauri Joshi; Zheng Xu; Satyen Kale; Tong Zhang;
822	Learning to Bid in Repeated First-Price Auctions with Budgets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the problem of learning in repeated first-price auctions with budgets.	Qian Wang; Zongjun Yang; Xiaotie Deng; Yuqing Kong;
823	Stochastic Policy Gradient Methods: Improved Sample Complexity for Fisher-non-degenerate Policies Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we develop improved global convergence guarantees for a general class of Fisher-non-degenerate parameterized policies, which allows us to address the case of continuous state action spaces.	Ilyas Fatkhullin; Anas Barakat; Anastasia Kireeva; Niao He;
824	Lookahead When It Matters: Adaptive Non-causal Transformers for Streaming Neural Transducers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This dichotomy amounts to a trade-off for real-time Automatic Speech Recognition (ASR) system design: profit from the low-latency benefit of strictly-causal architectures while accepting predictive performance limitations, or realize the modeling benefits of future-context models accompanied by their higher latency penalty. In this work, we relax the constraints of this choice and present the Adaptive Non-Causal Attention Transducer (ANCAT).	Grant Strimel; Yi Xie; Brian John King; Martin Radfar; Ariya Rastrow; Athanasios Mouchtaris;
825	Infinite Action Contextual Bandits with Reusable Data Exhaust Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we describe an online algorithm with an equivalent smoothed regret guarantee, but which generates well-defined importance weights: in exchange, the online computational cost increases, but only to order smoothness (i.e., still independent of the action set).	Mark Rucker; Yinglun Zhu; Paul Mineiro;
826	CodeIPPrompt: Intellectual Property Infringement Assessment of Code Language Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we aim to bridge the gap by presenting CodeIPPrompt, a platform for automatic evaluation of the extent to which code language models may reproduce licensed programs.	Zhiyuan Yu; Yuhao Wu; Ning Zhang; Chenguang Wang; Yevgeniy Vorobeychik; Chaowei Xiao;
827	EF21-P and Friends: Improved Theoretical Communication Complexity for Distributed Optimization with Bidirectional Compression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work we focus our attention on distributed optimization problems in the context where the communication time between the server and the workers is non-negligible.	Kaja Gruntkowska; Alexander Tyurin; Peter Richtárik;
828	Dropout Reduces Underfitting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we demonstrate that dropout can also mitigate underfitting when used at the start of training.	Zhuang Liu; Zhiqiu Xu; Joseph Jin; Zhiqiang Shen; Trevor Darrell;
829	Abstract-to-Executable Trajectory Translation for One-Shot Task Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose to achieve one-shot task generalization by decoupling plan generation and plan execution.	Stone Tao; Xiaochen Li; Tongzhou Mu; Zhiao Huang; Yuzhe Qin; Hao Su;
830	D2Match: Leveraging Deep Learning and Degeneracy for Subgraph Matching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we develop $D^2$Match by leveraging the efficiency of Deep learning and Degeneracy for subgraph matching.	Xuanzhou Liu; Lin Zhang; Jiaqi Sun; Yujiu Yang; Haiqin Yang;
831	Improving Medical Predictions By Irregular Multimodal Electronic Health Records Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our method first addresses irregularity in each single modality by (1) modeling irregular time series by dynamically incorporating hand-crafted imputation embeddings into learned interpolation embeddings via a gating mechanism, and (2) casting a series of clinical note representations as multivariate irregular time series and tackling irregularity via a time attention mechanism.	Xinlu Zhang; Shiyang Li; Zhiyu Chen; Xifeng Yan; Linda Ruth Petzold;
832	Target-based Surrogates for Stochastic Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We consider minimizing functions for which it is expensive to compute the (possibly stochastic) gradient.	Jonathan Wilder Lavington; Sharan Vaswani; Reza Babanezhad Harikandeh; Mark Schmidt; Nicolas Le Roux;
833	Reinforcement Learning with History Dependent Dynamic Contexts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments that generalizes the contextual MDP framework to handle non-Markov environments, where contexts change over time.	Guy Tennenholtz; Nadav Merlis; Lior Shani; Martin Mladenov; Craig Boutilier;
834	CoDi: Co-evolving Contrastive Diffusion Models for Mixed-type Tabular Synthesis Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose to process continuous and discrete variables separately (but being conditioned on each other) by two diffusion models.	Chaejeong Lee; Jayoung Kim; Noseong Park;
835	PLay: Parametrically Conditioned Layout Generation Using Latent Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we build a conditional latent diffusion model, PLay, that generates parametrically conditioned layouts in vector graphic space from user-specified guidelines, which are commonly used by designers for representing their design intents in current practices.	Chin-Yi Cheng; Forrest Huang; Gang Li; Yang Li;
836	On Pre-Training for Visuo-Motor Control: Revisiting A Learning-from-Scratch Baseline Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we examine the effectiveness of pre-training for visuo-motor control tasks.	Nicklas Hansen; Zhecheng Yuan; Yanjie Ze; Tongzhou Mu; Aravind Rajeswaran; Hao Su; Huazhe Xu; Xiaolong Wang;
837	Sketched Ridgeless Linear Regression: The Role of Downsampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To facilitate practical implementation, we propose an empirical procedure to determine the optimal sketching size.	Xin Chen; Yicheng Zeng; Siyue Yang; Qiang Sun;
838	Does Sparsity Help in Learning Misspecified Linear Bandits? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet it is unknown whether a structural assumption on the ground-truth parameter, such as sparsity, could break the $\varepsilon\sqrt{d}$ barrier. In this paper, we address this question by showing that algorithms can obtain $O(\varepsilon)$-optimal actions by querying $\tilde{O}(\exp(m\varepsilon))$ actions, where $m$ is the sparsity parameter, removing the $\exp(d)$-dependence.	Jialin Dong; Lin Yang;
839	Gradient Descent in Neural Networks As Sequential Learning in Reproducing Kernel Banach Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our goal is to extend beyond the limits of NTK toward a more general theory.	Alistair Shilton; Sunil Gupta; Santu Rana; Svetha Venkatesh;
840	Statistical Inference on Multi-armed Bandits with Delayed Feedback Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present an adaptively weighted estimator that on one hand incorporates the arm-dependent delaying mechanism to achieve consistency, and on the other hand mitigates the variance inflation across stages due to vanishing sampling probability.	Lei Shi; Jingshen Wang; Tianhao Wu;
841	Provably Convergent Schrödinger Bridge with Applications to Probabilistic Time Series Imputation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, in practice, only approximated projections are accessible and their convergence is not well understood. To fill this gap, we present a first convergence analysis of the Schrödinger bridge algorithm based on approximated projections.	Yu Chen; Wei Deng; Shikai Fang; Fengpei Li; Tianjiao Nicole Yang; Yikai Zhang; Kashif Rasul; Shandian Zhe; Anderson Schneider; Yuriy Nevmyvaka;
842	Towards Understanding Ensemble Distillation in Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we build a theoretical foundation of the ensemble distillation framework in federated learning from the perspective of kernel ridge regression (KRR).	Sejun Park; Kihun Hong; Ganguk Hwang;
843	Randomized Schur Complement Views for Graph Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a randomized topological augmentor based on Schur complements for Graph Contrastive Learning (GCL).	Vignesh Kothapalli;
844	Improved Algorithms for White-Box Adversarial Streams Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose efficient algorithms for sparse recovery of vectors, low rank recovery of matrices and tensors, as well as low rank plus sparse recovery of matrices, i.e., robust PCA.	Ying Feng; David Woodruff;
845	Towards Trustworthy Explanation: On Causal Rationalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Yet, existing association-based approaches on rationalization cannot identify true rationales when two or more snippets are highly inter-correlated and thus provide a similar contribution to prediction accuracy, so-called spuriousness. To address this limitation, we novelly leverage two causal desiderata, non-spuriousness and efficiency, into rationalization from the causal inference perspective.	Wenbo Zhang; Tong Wu; Yunlong Wang; Yong Cai; Hengrui Cai;
846	GREAD: Graph Neural Reaction-Diffusion Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we present a reaction-diffusion equation-based GNN method that considers all popular types of reaction equations in addition to one special reaction equation designed by us.	Jeongwhan Choi; Seoyoung Hong; Noseong Park; Sung-Bae Cho;
847	FlexRound: Learnable Rounding Based on Element-wise Division for Post-Training Quantization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As PTQ schemes based on reconstructing each layer or block output turn out to be effective to enhance quantized model performance, recent works have developed algorithms to devise and learn a new weight-rounding scheme so as to better reconstruct each layer or block output. In this work, we propose a simple yet effective new weight-rounding mechanism for PTQ, coined FlexRound, based on element-wise division instead of typical element-wise addition such that FlexRound enables jointly learning a common quantization grid size as well as a different scale for each pre-trained weight.	Jung Hyun Lee; Jeonghoon Kim; Se Jung Kwon; Dongsoo Lee;
848	Distributional Offline Policy Evaluation with Predictive Error Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an algorithm called Fitted Likelihood Estimation (FLE), which conducts a sequence of Maximum Likelihood Estimation (MLE) and has the flexibility of integrating any state-of-the-art probabilistic generative models as long as it can be trained via MLE.	Runzhe Wu; Masatoshi Uehara; Wen Sun;
849	MODeL: Memory Optimizations for Deep Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present MODeL, an algorithm that optimizes the lifetime and memory location of the tensors used to train neural networks.	Benoit Steiner; Mostafa Elhoushi; Jacob Kahn; James Hegarty;
850	Test-Time Style Shifting: Handling Arbitrary Styles in Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In domain generalization (DG), the target domain is unknown when the model is being trained, and the trained model should successfully work on an arbitrary (and possibly unseen) target domain during inference. This is a difficult problem, and despite active studies in recent years, it remains a great challenge. In this paper, we take a simple yet effective approach to tackle this issue.	Jungwuk Park; Dong-Jun Han; Soyeong Kim; Jaekyun Moon;
851	Difference-in-Differences Meets Tree-based Methods: Heterogeneous Treatment Effects Estimation with Unmeasured Confounding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study considers the estimation of conditional causal effects in the presence of unmeasured confounding for a balanced panel with treatment imposed at the last time point. To address this, we combine Difference-in-differences (DiD) and tree-based methods and propose a new identification assumption that allows for the violation of the (conditional) parallel trends assumption adopted by most existing DiD methods.	Caizhi Tang; Huiyuan Wang; Xinyu Li; Qing Cui; Longfei Li; Jun Zhou;
852	Loss Balancing for Fair Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Imposing EL on the learning process leads to a non-convex optimization problem even if the loss function is convex, and the existing fair learning algorithms cannot properly be adopted to find the fair predictor under the EL constraint. This paper introduces an algorithm that can leverage off-the-shelf convex programming tools (e.g., CVXPY (Diamond and Boyd, 2016; Agrawal et al., 2018)) to efficiently find the global optimum of this non-convex optimization.	Mohammad Mahdi Khalili; Xueru Zhang; Mahed Abroshan;
853	Constrained Decision Transformer for Offline Safe Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We aim to tackle a more challenging problem: learning a safe policy from an offline dataset.	Zuxin Liu; Zijian Guo; Yihang Yao; Zhepeng Cen; Wenhao Yu; Tingnan Zhang; Ding Zhao;
854	Chameleon: Adapting to Peer Images for Planting Durable Backdoors in Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the connection between the durability of FL backdoors and the relationships between benign images and poisoned images (i.e., the images whose labels are flipped to the target label during local training).	Yanbo Dai; Songze Li;
855	Generative Adversarial Symmetry Discovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a framework, LieGAN, to automatically discover equivariances from a dataset using a paradigm akin to generative adversarial training.	Jianke Yang; Robin Walters; Nima Dehmamy; Rose Yu;
856	Bayesian Neural Networks Avoid Encoding Complex and Perturbation-Sensitive Concepts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on mean-field variational Bayesian Neural Networks (BNNs) and explore the representation capacity of such BNNs by investigating which types of concepts are less likely to be encoded by the BNN.	Qihan Ren; Huiqi Deng; Yunuo Chen; Siyu Lou; Quanshi Zhang;
857	RLSbench: Domain Adaptation Under Relaxed Label Shift Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce RLSbench, a large-scale benchmark for relaxed label shift, consisting of >500 distribution shift pairs spanning vision, tabular, and language modalities, with varying label proportions.	Saurabh Garg; Nick Erickson; James Sharpnack; Alex Smola; Sivaraman Balakrishnan; Zachary Chase Lipton;
858	Mixture Proportion Estimation Beyond Irreducibility Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a more general sufficient condition that accommodates several settings of interest where irreducibility does not hold.	Yilun Zhu; Aaron Fjeldsted; Darren Holland; George Landon; Azaree Lintereur; Clayton Scott;
859	Pre-computed Memory or On-the-fly Encoding? A Hybrid Approach to Retrieval Augmentation Makes The Most of Your Compute Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, pre-encoding memory incurs a severe quality penalty as the memory representations are not conditioned on the current input. We propose LUMEN, a hybrid between these two extremes, pre-computing the majority of the retrieval representation and completing the encoding on the fly using a live encoder that is conditioned on the question and fine-tuned for the task.	Michiel de Jong; Yury Zemlyanskiy; Nicholas FitzGerald; Joshua Ainslie; Sumit Sanghai; Fei Sha; William W. Cohen;
860	Polynomial Time and Private Learning of Unbounded Gaussian Mixture Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of privately estimating the parameters of $d$-dimensional Gaussian Mixture Models (GMMs) with $k$ components.	Jamil Arbas; Hassan Ashtiani; Christopher Liaw;
861	Behavior Contrastive Learning for Unsupervised Skill Discovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel unsupervised skill discovery method through contrastive learning among behaviors, which makes the agent produce similar behaviors for the same skill and diverse behaviors for different skills.	Rushuai Yang; Chenjia Bai; Hongyi Guo; Siyuan Li; Bin Zhao; Zhen Wang; Peng Liu; Xuelong Li;
862	ELSA: Efficient Label Shift Adaptation Through The Lens of Semiparametric Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address these issues, we first propose a moment-matching framework for adapting the label shift based on the geometry of the influence function. Under such a framework, we propose a novel method named Efficient Label Shift Adaptation (ELSA), in which the adaptation weights can be estimated by solving linear systems.	Qinglong Tian; Xin Zhang; Jiwei Zhao;
863	FedVS: Straggler-Resilient and Privacy-Preserving Vertical Federated Learning for Split Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper aims to address two major challenges in split VFL: 1) performance degradation due to straggling clients during training; and 2) data and model privacy leakage from clients’ uploaded data embeddings.	Songze Li; Duanyi Yao; Jin Liu;
864	On Excess Mass Behavior in Gaussian Mixture Models with Orlicz-Wasserstein Distances Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we (re)introduce and investigate a metric, named Orlicz-Wasserstein distance, in the study of the Bayesian contraction behavior for the parameters.	Aritra Guha; Nhat Ho; XuanLong Nguyen;
865	Toward Large Kernel Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we provide a way forward for constructing large-scale general kernel models, which are a generalization of kernel machines that decouples the model and data, allowing training on large datasets.	Amirhesam Abedsoltan; Mikhail Belkin; Parthe Pandit;
866	Poisoning Generative Replay in Continual Learning to Promote Forgetting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, their vulnerability under poisoning attacks has been largely understudied. In this work, we investigate this issue in the context of continual learning, where generative replayers are utilized to tackle catastrophic forgetting.	Siteng Kang; Zhan Shi; Xinhua Zhang;
867	Mirror Sinkhorn: Fast Online Optimization on Transport Polytopes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Optimal transport is an important tool in machine learning, allowing to capture geometric properties of the data through a linear program on transport polytopes. We present a single-loop optimization algorithm for minimizing general convex objectives on these domains, utilizing the principles of Sinkhorn matrix scaling and mirror descent.	Marin Ballu; Quentin Berthet;
868	Learning Globally Smooth Functions on Manifolds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work combines techniques from semi-infinite constrained learning and manifold regularization to learn representations that are globally smooth on a manifold.	Juan Cervino; Luiz F. O. Chamon; Benjamin David Haeffele; Rene Vidal; Alejandro Ribeiro;
869	Offline Learning in Markov Games with General Function Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we provide the first framework for sample-efficient offline learning in Markov games under general function approximation, handling all 3 equilibria in a unified manner.	Yuheng Zhang; Yu Bai; Nan Jiang;
870	Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by the observation, we present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO.	Yulai Zhao; Zhuoran Yang; Zhaoran Wang; Jason D. Lee;
871	When Sparsity Meets Contrastive Models: Less Graph Data Can Bring Better Class-Balanced Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Additionally, we find that pruned sparse contrastive models may miss valuable information, leading to a large loss value on the informative subset. Motivated by the above findings, we develop a unified data model dynamic sparsity framework called Data Decantation (DataDec) to address the above challenges.	Chunhui Zhang; Chao Huang; Yijun Tian; Qianlong Wen; Zhongyu Ouyang; Youhuan Li; Yanfang Ye; Chuxu Zhang;
872	Efficient Online Reinforcement Learning with Offline Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we demonstrate that the answer is yes; however, a set of minimal but important changes to existing off-policy RL algorithms are required to achieve reliable performance.	Philip J. Ball; Laura Smith; Ilya Kostrikov; Sergey Levine;
873	Magneto: A Foundation Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a Transformer variant, named Magneto, to fulfill the goal.	Hongyu Wang; Shuming Ma; Shaohan Huang; Li Dong; Wenhui Wang; Zhiliang Peng; Yu Wu; Payal Bajaj; Saksham Singhal; Alon Benhaim; Barun Patra; Zhun Liu; Vishrav Chaudhary; Xia Song; Furu Wei;
874	A Three-regime Model of Network Pruning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Perhaps surprisingly, a systematic approach to predict precisely how adjusting a specific hyperparameter will affect prunability remains elusive. To address this gap, we introduce a phenomenological model grounded in the statistical mechanics of learning.	Yefan Zhou; Yaoqing Yang; Arin Chang; Michael W. Mahoney;
875	Inverse Reinforcement Learning Without Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we demonstrate for the first time a more informed imitation learning reduction where we utilize the state distribution of the expert to alleviate the global exploration component of the RL subroutine, providing an exponential speedup in theory.	Gokul Swamy; David Wu; Sanjiban Choudhury; Drew Bagnell; Steven Wu;
876	An Information-Theoretic Analysis of Nonstationary Bandit Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We view the optimal action sequence as a stochastic process, and take an information-theoretic approach to analyze attainable performance.	Seungki Min; Daniel Russo;
877	Moccasin: Efficient Tensor Rematerialization for Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we consider the problem of execution time minimization of compute graphs subject to a memory budget.	Burak Bartan; Haoming Li; Harris Teague; Christopher Lott; Bistra Dilkina;
878	On The Global Convergence of Risk-Averse Policy Gradient Methods with Expected Conditional Risk Measures Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider a class of dynamic time-consistent risk measures, called Expected Conditional Risk Measures (ECRMs), and derive policy gradient updates for ECRM-based objective functions.	Xian Yu; Lei Ying;
879	Learning Mixtures of Gaussians with Censored Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an algorithm that takes only $\frac{1}{\varepsilon^{O(k)}}$ samples to estimate the weights $w_i$ and the means $\mu_i$ within $\varepsilon$ error.	Wai Ming Tai; Bryon Aragam;
880	Theoretical Bounds on The Network Community Profile from Low-rank Semi-definite Programming Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study a new connection between a technical measure called $\mu$-conductance that arises in the study of Markov chains for sampling convex bodies and the network community profile that characterizes size-resolved properties of clusters and communities in social and information networks.	Yufan Huang; C. Seshadhri; David F. Gleich;
881	Not All Semantics Are Created Equal: Contrastive Self-supervised Learning with Automatic Temperature Individualization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we aim to optimize a contrastive loss with individualized temperatures in a principled manner.	Zi-Hao Qiu; Quanqi Hu; Zhuoning Yuan; Denny Zhou; Lijun Zhang; Tianbao Yang;
882	Addressing Budget Allocation and Revenue Allocation in Data Market Environments Using An Adaptive Sampling Algorithm Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a new algorithm to solve budget allocation and revenue allocation problems simultaneously in linear time.	Boxin Zhao; Boxiang Lyu; Raul Castro Fernandez; Mladen Kolar;
883	Policy Gradient in Robust MDPs with Global Convergence Guarantee Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a new Double-Loop Robust Policy Gradient (DRPG), the first generic policy gradient method for RMDPs.	Qiuhao Wang; Chin Pang Ho; Marek Petrik;
884	Wrapped Cauchy Distributed Angular Softmax for Long-Tailed Visual Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose the Wrapped Cauchy Distributed Angular Softmax (WCDAS), a novel softmax function that incorporates data-wise Gaussian-based kernels into the angular correlation between feature representations and classifier weights, effectively mitigating noise and sparse sampling concerns.	Boran Han;
885	Achieving Hierarchy-Free Approximation for Bilevel Programs with Equilibrium Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we develop an approximation scheme for solving bilevel programs with equilibrium constraints, which are generally difficult to solve.	Jiayang Li; Jing Yu; Boyi Liu; Yu Nie; Zhaoran Wang;
886	On Penalty-based Bilevel Gradient Descent Method Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we tackle the bilevel problem through the lens of the penalty method.	Han Shen; Tianyi Chen;
887	Eventual Discounting Temporal Logic Counterfactual Experience Replay Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper makes two contributions. First, we develop a new value-function based proxy, using a technique we call eventual discounting, under which one can find policies that satisfy the LTL specification with highest achievable probability. Second, we develop a new experience replay method for generating off-policy data from on-policy rollouts via counterfactual reasoning on different ways of satisfying the LTL specification.	Cameron Voloshin; Abhinav Verma; Yisong Yue;
888	Scaling Laws for Multilingual Neural Machine Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we provide a large-scale empirical study of the scaling properties of multilingual neural machine translation models.	Patrick Fernandes; Behrooz Ghorbani; Xavier Garcia; Markus Freitag; Orhan Firat;
889	Fairness in Matching Under Uncertainty Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As our key contribution, we carefully axiomatize a notion of individual fairness in the two-sided marketplace setting which respects the uncertainty in the merits; indeed, it simultaneously recognizes uncertainty as the primary potential cause of unfairness and an approach to address it.	Siddartha Devic; David Kempe; Vatsal Sharan; Aleksandra Korolova;
890	Competitive Gradient Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the problem of convergence to a stationary point in zero-sum games. We propose competitive gradient optimization (CGO), a gradient-based method that incorporates the interactions between two players in zero-sum games for its iterative updates.	Abhijeet Vyas; Brian Bullins; Kamyar Azizzadenesheli;
891	Entity Divider with Language Grounding in Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, given a simple general instruction, e.g., beating all enemies, agents are required to decompose it into multiple subgoals and figure out the right one to focus on. Inspired by previous work, we try to address these issues at the entity level and propose a novel framework for language grounding in multi-agent reinforcement learning, entity divider (EnDi).	Ziluo Ding; Wanpeng Zhang; Junpeng Yue; Xiangjun Wang; Tiejun Huang; Zongqing Lu;
892	Better Diffusion Models Further Improve Adversarial Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: After two years of rapid development in diffusion models, a question naturally arises: can better diffusion models further improve adversarial training? This paper gives an affirmative answer by employing the most recent diffusion model which has higher efficiency ($\sim 20$ sampling steps) and image quality (lower FID score) compared with DDPM.	Zekai Wang; Tianyu Pang; Chao Du; Min Lin; Weiwei Liu; Shuicheng Yan;
893	Learning Deep Time-index Models for Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Indeed, while naive deep time-index models are far more expressive than the manually predefined function representations of classical time-index models, they are inadequate for forecasting, being unable to generalize to unseen time steps due to the lack of inductive bias. In this paper, we propose DeepTime, a meta-optimization framework to learn deep time-index models which overcome these limitations, yielding an efficient and accurate forecasting model.	Gerald Woo; Chenghao Liu; Doyen Sahoo; Akshat Kumar; Steven Hoi;
894	Surrogate Module Learning: Reduce The Gradient Error Accumulation in Training Spiking Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel approach to reducing gradient error from a new perspective called surrogate module learning (SML).	Shikuang Deng; Hao Lin; Yuhang Li; Shi Gu;
895	How to Address Monotonicity for Model Risk Management? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the problem of establishing the accountability and fairness of transparent machine learning models through monotonicity.	Dangxing Chen; Weicheng Ye;
896	B-Learner: Quasi-Oracle Bounds on Heterogeneous Causal Effects Under Hidden Confounding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a meta-learner called the B-Learner, which can efficiently learn sharp bounds on the CATE function under limits on the level of hidden confounding.	Miruna Oprescu; Jacob Dorn; Marah Ghoummaid; Andrew Jesson; Nathan Kallus; Uri Shalit;
897	DeSRA: Detect and Delete The Artifacts of GAN-based Real-World Super-Resolution Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we analyze the cause and characteristics of the GAN artifacts produced in unseen test data without ground-truths.	Liangbin Xie; Xintao Wang; Xiangyu Chen; Gen Li; Ying Shan; Jiantao Zhou; Chao Dong;
898	Tensor Gaussian Process with Contraction for Multi-Channel Imaging Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we extend the Tensor-GP model by introducing an integrative dimensionality reduction technique, called tensor contraction, with a Tensor-GP for a scalar-on-tensor regression task with multi-channel imaging data.	Hu Sun; Ward Manchester; Meng Jin; Yang Liu; Yang Chen;
899	Automatically Auditing Large Language Models Via Discrete Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we cast auditing as an optimization problem, where we automatically search for input-output pairs that match a desired target behavior.	Erik Jones; Anca Dragan; Aditi Raghunathan; Jacob Steinhardt;
900	On The Interplay Between Misspecification and Sub-optimality Gap in Linear Contextual Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an algorithm based on a novel data selection scheme, which only selects the contextual vectors with large uncertainty for online regression.	Weitong Zhang; Jiafan He; Zhiyuan Fan; Quanquan Gu;
901	AutoCoreset: An Automatic Practical Coreset Construction Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we suggest an automatic practical framework for constructing coresets, which requires (only) the input data and the desired cost function from the user, without the need for any other task-related computation to be done by the user. To do so, we reduce the problem of approximating a loss function to an instance of vector summation approximation, where the vectors we aim to sum are loss vectors of a specific subset of the queries, such that we aim to approximate the image of the function on this subset.	Alaa Maalouf; Murad Tukan; Vladimir Braverman; Daniela Rus;
902	On The Expressive Power of Geometric Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a geometric version of the WL test (GWL) for discriminating geometric graphs while respecting the underlying physical symmetries: permutations, rotation, reflection, and translation.	Chaitanya K. Joshi; Cristian Bodnar; Simon V Mathis; Taco Cohen; Pietro Lio;
903	Deep Regression Unlearning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we explore unlearning for the regression problem, particularly in deep learning models.	Ayush Kumar Tarun; Vikram Singh Chundawat; Murari Mandal; Mohan Kankanhalli;
904	Variance Control for Distributional Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To fully understand how the approximation errors of the Q-function affect the whole training process, we do some error analysis and theoretically show how to reduce both the bias and the variance of the error terms. With this new understanding, we construct a new estimator Quantiled Expansion Mean (QEM) and introduce a new DRL algorithm (QEMRL) from the statistical perspective.	Qi Kuang; Zhoufan Zhu; Liwen Zhang; Fan Zhou;
905	Data-Copying in Generative Models: A Formal Framework Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A formal framework for memorization in generative models, called “data-copying,” was proposed by Meehan et al. (2020). We build upon their work to show that their framework may fail to detect certain kinds of blatant memorization.	Robi Bhattacharjee; Sanjoy Dasgupta; Kamalika Chaudhuri;
906	The Impact of Exploration on Convergence and Performance of Multi-Agent Q-Learning Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Outside of these classes, learning dynamics rarely converge and little is known about the effect of exploration in the face of non-convergence. To progress this front, we study the smooth Q-Learning dynamics.	Aamal Hussain; Francesco Belardinelli; Dario Paccagnan;
907	A Kernel-Based View of Language Model Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We investigate whether the Neural Tangent Kernel (NTK)—which originated as a model to study the gradient descent dynamics of infinitely wide networks with suitable random initialization—describes fine-tuning of pre-trained LMs.	Sadhika Malladi; Alexander Wettig; Dingli Yu; Danqi Chen; Sanjeev Arora;
908	Graph Contrastive Backdoor Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we systematically study the vulnerability of GCL in the presence of malicious backdoor adversaries.	Hangfan Zhang; Jinghui Chen; Lu Lin; Jinyuan Jia; Dinghao Wu;
909	Provable Data Subset Selection For Efficient Neural Networks Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce the first algorithm to construct coresets for RBFNNs, i.e., small weighted subsets that approximate the loss of the input data on any radial basis function network and thus approximate any function defined by an RBFNN on the larger input data.	Murad Tukan; Samson Zhou; Alaa Maalouf; Daniela Rus; Vladimir Braverman; Dan Feldman;
910	On The Stepwise Nature of Self-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a simple picture of the training process of self-supervised learning methods with dual deep networks.	James B Simon; Maksis Knutins; Liu Ziyin; Daniel Geisz; Abraham J Fetterman; Joshua Albrecht;
911	How to Trust Your Diffusion Model: A Convex Optimization Approach to Conformal Risk Control Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we focus on image-to-image regression tasks and we present a generalization of the Risk-Controlling Prediction Sets (RCPS) procedure, that we term $K$-RCPS, which allows to $(i)$ provide entrywise calibrated intervals for future samples of any diffusion model, and $(ii)$ control a certain notion of risk with respect to a ground truth image with minimal mean interval length.	Jacopo Teneggi; Matthew Tivnan; Web Stayman; Jeremias Sulam;
912	Estimating Causal Effects Using A Multi-task Deep Ensemble Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: A number of methods have been proposed for causal effect estimation, yet few have demonstrated efficacy in handling data with complex structures, such as images. To fill this gap, we propose Causal Multi-task Deep Ensemble (CMDE), a novel framework that learns both shared and group-specific information from the study population.	Ziyang Jiang; Zhuoran Hou; Yiling Liu; Yiman Ren; Keyu Li; David Carlson;
913	Controlling Posterior Collapse By An Inverse Lipschitz Constraint on The Decoder Network Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce an inverse Lipschitz neural network into the decoder and, based on this architecture, provide a new method that can control in a simple and clear manner the degree of posterior collapse for a wide range of VAE models equipped with a concrete theoretical guarantee.	Yuri Kinoshita; Kenta Oono; Kenji Fukumizu; Yuichi Yoshida; Shin-ichi Maeda;
914	Coordinate Descent Methods for Fractional Minimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider a class of structured fractional minimization problems, in which the numerator part of the objective is the sum of a differentiable convex function and a convex non-smooth function, while the denominator part is a convex or concave function.	Ganzhao Yuan;
915	Interpretable Neural-Symbolic Concept Reasoning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, state-of-the-art concept-based models rely on high-dimensional concept embedding representations which lack a clear semantic meaning, thus questioning the interpretability of their decision process. To overcome this limitation, we propose the Deep Concept Reasoner (DCR), the first interpretable concept-based model that builds upon concept embeddings.	Pietro Barbiero; Gabriele Ciravegna; Francesco Giannini; Mateo Espinosa Zarlenga; Lucie Charlotte Magister; Alberto Tonda; Pietro Lio; Frederic Precioso; Mateja Jamnik; Giuseppe Marra;
916	Incentivizing Exploration with Linear Contexts and Combinatorial Actions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We advance the study of incentivized bandit exploration, in which arm choices are viewed as recommendations and are required to be Bayesian incentive compatible.	Mark Sellke;
917	Towards Robust Graph Incremental Learning on Evolving Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on the inductive NGIL problem, which accounts for the evolution of graph structure (structural shift) induced by emerging tasks.	Junwei Su; Difan Zou; Zijun Zhang; Chuan Wu;
918	Proximal Causal Learning of Conditional Average Treatment Effects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the P-learner, motivated by the R- and DR-learner, a tailored two-stage loss function for learning heterogeneous treatment effects in settings where exchangeability given observed covariates is an implausible assumption, and we wish to rely on proxy variables for causal inference.	Erik Sverdrup; Yifan Cui;
919	SAM Operates Far from Home: Eigenvalue Regularization As A Dynamical Phenomenon Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The original motivation for SAM was a modified loss function which penalized sharp minima; subsequent analyses have also focused on the behavior near minima. However, our work reveals that SAM provides a strong regularization of the eigenvalues throughout the learning trajectory.	Atish Agarwala; Yann Dauphin;
920	Dimensionality Reduction for General KDE Mode Finding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we significantly generalize a result of (LeeLiMusco:2021) on mode approximation for Gaussian mixture models.	Xinyu Luo; Christopher P Musco; Cas Widdershoven;
921	Likelihood Adjusted Semidefinite Programs for Clustering Heterogeneous Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we extend the SDP approach to a general setting by integrating cluster labels as model parameters and propose an iterative likelihood adjusted SDP (iLA-SDP) method that directly maximizes the exact observed likelihood in the presence of data heterogeneity.	Yubo Zhuang; Xiaohui Chen; Yun Yang;
922	Markovian Gaussian Process Variational Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we leverage the equivalent discrete state space representation of Markovian GPs to enable linear time GPVAE training via Kalman filtering and smoothing.	Harrison Zhu; Carles Balsells-Rodas; Yingzhen Li;
923	Efficient Bound of Lipschitz Constant for Convolutional Layers By Gram Iteration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper we introduce a precise, fast, and differentiable upper bound for the spectral norm of convolutional layers using circulant matrix theory and a new alternative to the Power iteration.	Blaise Delattre; Quentin Barthélemy; Alexandre Araujo; Alexandre Allauzen;
924	Tight Regret Bounds for Single-pass Streaming Multi-armed Bandits Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the previous best regret upper bound is still $O(K^{1/3} T^{2/3}\log^{1/3}(T))$, which is achieved by the simple uniform exploration algorithm. In this paper, we close this gap and complete the picture of regret minimization in single-pass streaming MABs.	Chen Wang;
925	Approximation Algorithms for Fair Range Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we provide an efficient constant factor approximation algorithm for the fair range $\ell_p$-clustering for all values of $p\in [1,\infty)$.	Sedjro Salomon Hotegni; Sepideh Mahabadi; Ali Vakilian;
926	Does Continual Learning Equally Forget All Parameters? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study which modules in neural networks are more prone to forgetting by investigating their training dynamics during CL.	Haiyan Zhao; Tianyi Zhou; Guodong Long; Jing Jiang; Chengqi Zhang;
927	Towards Understanding and Improving GFlowNet Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce an efficient evaluation strategy to compare the learned sampling distribution to the target reward distribution.	Max W Shen; Emmanuel Bengio; Ehsan Hajiramezanali; Andreas Loukas; Kyunghyun Cho; Tommaso Biancalani;
928	Causal Isotonic Calibration for Heterogeneous Treatment Effects Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose causal isotonic calibration, a novel nonparametric method for calibrating predictors of heterogeneous treatment effects.	Lars van der Laan; Ernesto Ulloa-Perez; Marco Carone; Alex Luedtke;
929	Gradient-Free Structured Pruning with Unlabeled Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a gradient-free structured pruning framework that uses only unlabeled data.	Azade Nova; Hanjun Dai; Dale Schuurmans;
930	How Many Perturbations Break This Model? Evaluating Robustness Beyond Adversarial Accuracy Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce an alternative approach, adversarial sparsity, which quantifies how difficult it is to find a successful perturbation given both an input point and a constraint on the direction of the perturbation.	Raphael Olivier; Bhiksha Raj;
931	Coder Reviewer Reranking for Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by collaborative programming, we propose Coder-Reviewer reranking.	Tianyi Zhang; Tao Yu; Tatsunori Hashimoto; Mike Lewis; Wen-tau Yih; Daniel Fried; Sida Wang;
932	On The Training Instability of Shuffling SGD with Batch Normalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present explicit constructions to show how SS leads to distorted optima in regression and divergence for classification, whereas RR avoids both distortion and divergence.	David Xing Wu; Chulhee Yun; Suvrit Sra;
933	Improving Fair Training Under Correlation Shifts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In particular, when the bias between labels and sensitive groups changes, the fairness of the trained model is directly influenced and can worsen. We make two contributions for solving this problem. First, we analytically show that existing in-processing fair algorithms have fundamental limits in accuracy and group fairness. We utilize the notion of correlation shifts between labels and groups, which can explicitly capture the change of the above bias. Second, we propose a novel pre-processing step that samples the input data to reduce correlation shifts and thus enables the in-processing approaches to overcome their limitations.	Yuji Roh; Kangwook Lee; Steven Euijong Whang; Changho Suh;
934	A Hybrid Quantum-Classical Approach Based on The Hadamard Transform for The Convolutional Layer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel Hadamard Transform (HT)-based neural network layer for hybrid quantum-classical computing.	Hongyi Pan; Xin Zhu; Salih Furkan Atici; Ahmet Cetin;
935	Benign Overfitting in Deep Neural Networks Under Lazy Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper focuses on over-parameterized deep neural networks (DNNs) with ReLU activation functions and proves that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification while obtaining (nearly) zero-training error under the lazy training regime.	Zhenyu Zhu; Fanghui Liu; Grigorios Chrysos; Francesco Locatello; Volkan Cevher;
936	Run-off Election: Improved Provable Defense Against Data Poisoning Attacks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show that merely considering the majority vote in ensemble defenses is wasteful as it does not effectively utilize available information in the logits layers of the base models.	Keivan Rezaei; Kiarash Banihashem; Atoosa Chegini; Soheil Feizi;
937	Understanding Self-Distillation in The Presence of Label Noise Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we theoretically characterize the effect of SD in two supervised learning problems with noisy labels.	Rudrajit Das; Sujay Sanghavi;
938	Dataset Distillation with Convexified Implicit Gradients Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a new dataset distillation algorithm using reparameterization and convexification of implicit gradients (RCIG), that substantially improves the state-of-the-art.	Noel Loo; Ramin Hasani; Mathias Lechner; Daniela Rus;
939	Omnipredictors for Constrained Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce omnipredictors for constrained optimization and study their complexity and implications.	Lunjia Hu; Inbal Rachel Livni Navon; Omer Reingold; Chutong Yang;
940	ACAT: Adversarial Counterfactual Attention for Classification and Detection in Medical Imaging Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these are expensive to collect and may vary significantly across annotators. To overcome these issues, we propose a framework that employs saliency maps to obtain soft spatial attention masks that modulate the image features at different scales.	Alessandro Fontanella; Antreas Antoniou; Wenwen Li; Joanna Wardlaw; Grant Mair; Emanuele Trucco; Amos Storkey;
941	Recasting Self-Attention with Holographic Reduced Representations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by problems in malware detection, where sequence lengths of $T \geq 100,000$ are a roadblock to deep learning, we re-cast self-attention using the neuro-symbolic approach of Holographic Reduced Representations (HRR).	Mohammad Mahmudul Alam; Edward Raff; Stella Biderman; Tim Oates; James Holt;
942	Generative Graph Dictionary Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a Fused Gromov-Wasserstein (FGW) Mixture Model named FraMe to address the GDL problem from the generative view.	Zhichen Zeng; Ruike Zhu; Yinglong Xia; Hanqing Zeng; Hanghang Tong;
943	Synergies Between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we provide evidence that disentangled representations coupled with sparse task-specific predictors improve generalization.	Sebastien Lachapelle; Tristan Deleu; Divyat Mahajan; Ioannis Mitliagkas; Yoshua Bengio; Simon Lacoste-Julien; Quentin Bertrand;
944	Gaussian Processes at The Helm(holtz): A More Fluid Model for Ocean Currents Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To better reflect known physical properties of currents, we propose to instead put a standard stationary kernel on the divergence and curl-free components of a vector field obtained through a Helmholtz decomposition. We show that, because this decomposition relates to the original vector field just via mixed partial derivatives, we can still perform inference given the original data with only a small constant multiple of additional computational expense.	Renato Berlinghieri; Brian L. Trippe; David R. Burt; Ryan James Giordano; Kaushik Srinivasan; Tamay Özgökmen; Junfei Xia; Tamara Broderick;
945	End-to-End Multi-Object Detection with A Regularized Mixture Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel framework to train an end-to-end multi-object detector consisting of only two terms: negative log-likelihood (NLL) and a regularization term.	Jaeyoung Yoo; Hojun Lee; Seunghyeon Seo; Inseop Chung; Nojun Kwak;
946	Pairwise Ranking Losses of Click-Through Rates Prediction for Welfare Maximization in Ad Auctions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we bring back the welfare objectives of ad auctions into CTR predictions and propose a novel weighted rankloss to train the CTR model.	Boxiang Lyu; Zhe Feng; Zachary Robertson; Oluwasanmi O Koyejo;
947	Data-Driven Subgroup Identification for Linear Regression Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose DDGroup (data-driven group discovery), a data-driven method to effectively identify subgroups in the data with a uniform linear relationship between the features and the label.	Zachary Izzo; Ruishan Liu; James Zou;
948	Hindsight Learning for MDPs with Exogenous Inputs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Many resource management problems require sequential decision-making under uncertainty, where the only uncertainty affecting the decision outcomes are exogenous variables outside the control of the decision-maker. We model these problems as Exo-MDPs (Markov Decision Processes with Exogenous Inputs) and design a class of data-efficient algorithms for them termed Hindsight Learning (HL).	Sean R. Sinclair; Felipe Vieira Frujeri; Ching-An Cheng; Luke Marshall; Hugo De Oliveira Barbalho; Jingling Li; Jennifer Neville; Ishai Menache; Adith Swaminathan;
949	Model Transferability with Responsive Decision Subjects Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We provide further instantiated analysis for two popular domain adaptation settings, including covariate shift and target shift.	Yatong Chen; Zeyu Tang; Kun Zhang; Yang Liu;
950	LegendreTron: Uprising Proper Multiclass Loss Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present LegendreTron as a novel and practical method that jointly learns proper canonical losses and probabilities for multiclass problems.	Kevin H Lam; Christian Walder; Spiridon Penev; Richard Nock;
951	Semi-Offline Reinforcement Learning for Optimized Text Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose semi-offline RL, a novel paradigm that can smoothly transit from the offline setting to the online setting, balances the exploration capability and training cost, and provides a theoretical foundation for comparing different RL settings.	Changyu Chen; Xiting Wang; Yiqiao Jin; Victor Ye Dong; Li Dong; Jie Cao; Yi Liu; Rui Yan;
952	Smooth Non-stationary Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study a non-stationary two-arm bandit problem where we assume an arm’s mean reward is a $\beta$-Hölder function over (normalized) time, meaning it is $(\beta-1)$-times Lipschitz-continuously differentiable.	Su Jia; Qian Xie; Nathan Kallus; Peter I. Frazier;
953	Prometheus: Taming Sample and Communication Complexities in Constrained Decentralized Stochastic Bilevel Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This problem often arises from multi-agent learning problems with safety constraints. As shown in this paper, constrained decentralized bilevel optimization is far more challenging than its unconstrained counterpart due to the complex coupling structure, which necessitates new algorithm design and analysis techniques.	Zhuqing Liu; Xin Zhang; Prashant Khanduri; Songtao Lu; Jia Liu;
954	GC-Flow: A Graph-Based Flow Network for Effective Clustering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we design normalizing flows that replace GCN layers, leading to a generative model that models both the class conditional likelihood $p(\mathbf{x}|y)$ and the class prior $p(y)$.	Tianchun Wang; Farzaneh Mirzazadeh; Xiang Zhang; Jie Chen;
955	Weighted Tallying Bandits: Overcoming Intractability Via Repeated Exposure Optimality Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by this, a significant line of work has formalized settings where an action’s loss is a function of the number of times it was played in the prior $m$ timesteps, where $m$ corresponds to a bound on human memory capacity. To more faithfully capture decay of human memory with time, we introduce the Weighted Tallying Bandit (WTB), which generalizes this setting by requiring that an action’s loss is a function of a weighted summation of the number of times it was played in the last $m$ timesteps.	Dhruv Malik; Conor Igoe; Yuanzhi Li; Aarti Singh;
956	Nonparametric Density Estimation Under Distribution Drift Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study nonparametric density estimation in non-stationary drift settings.	Alessio Mazzetto; Eli Upfal;
957	Global Selection of Contrastive Batches Via Optimization on Sample Permutations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we provide an alternative to hard negative mining, Global Contrastive Batch Sampling (GCBS), an efficient approximation to the batch assignment problem that upper bounds the gap between the global and training losses, $\mathcal{L}^{Global} - \mathcal{L}^{Train}$, in contrastive learning settings.	Vin Sachidananda; Ziyi Yang; Chenguang Zhu;
958	PromptBoosting: Black-Box Text Classification with Ten Forward Passes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We describe PromptBoosting, a query-efficient procedure for building a text classifier from a neural language model (LM) without access to the LM’s parameters, gradients, or hidden representations.	Bairu Hou; Joe O’Connor; Jacob Andreas; Shiyu Chang; Yang Zhang;
959	LESS-VFL: Communication-Efficient Feature Selection for Vertical Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose LESS-VFL, a communication-efficient feature selection method for distributed systems with vertically partitioned data.	Timothy Castiglia; Yi Zhou; Shiqiang Wang; Swanand Kadhe; Nathalie Baracaldo; Stacy Patterson;
960	Confidence and Dispersity Speak: Characterizing Prediction Matrix for Unsupervised Accuracy Estimation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work aims to assess how well a model performs under distribution shifts without using labels.	Weijian Deng; Yumin Suh; Stephen Gould; Liang Zheng;
961	Learning Intuitive Policies Using Action Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Through fine-grained evaluation and scenario analysis, we show that the resulting policies are human-interpretable.	Mingwei Ma; Jizhou Liu; Samuel Sokota; Max Kleiman-Weiner; Jakob Nicolaus Foerster;
962	On Balancing Bias and Variance in Unsupervised Multi-Source-Free Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, we aim to analyze the fundamental limits of MSFDA.	Maohao Shen; Yuheng Bu; Gregory Wornell;
963	Abstracting Imperfect Information Away from Two-Player Zero-Sum Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a consequence, existing sound decision-time planning algorithms require complicated additional mechanisms that have unappealing properties. The main contribution of this work is showing that certain regularized equilibria do not possess the aforementioned non-correspondence problem—thus, computing them can be treated as perfect-information problems.	Samuel Sokota; Ryan D’Orazio; Chun Kai Ling; David J Wu; J Zico Kolter; Noam Brown;
964	Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For episodic time-inhomogeneous linear Markov decision processes (linear MDPs) whose transition probability can be parameterized as a linear function of a given feature mapping, we propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde O(d\sqrt{H^3K})$, where $d$ is the dimension of the feature mapping, $H$ is the planning horizon, and $K$ is the number of episodes.	Jiafan He; Heyang Zhao; Dongruo Zhou; Quanquan Gu;
965	On User-Level Private Convex Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new mechanism for stochastic convex optimization (SCO) with user-level differential privacy guarantees.	Badih Ghazi; Pritish Kamath; Ravi Kumar; Pasin Manurangsi; Raghu Meka; Chiyuan Zhang;
966	Disentangled Generative Models for Robust Prediction of System Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The use of deep neural networks for modelling system dynamics is increasingly popular, but long-term prediction accuracy and out-of-distribution generalization still present challenges. In this study, we address these challenges by considering the parameters of dynamical systems as factors of variation of the data and leverage their ground-truth values to disentangle the representations learned by generative models.	Stathi Fotiadis; Mario Lino Valencia; Shunlong Hu; Stef Garasto; Chris D Cantwell; Anil Anthony Bharath;
967	HOPE: High-order Graph ODE For Modeling Interacting Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing methods have severe deficiencies in capacity and efficiency due to the failure to model high-order correlations in long-term temporal trends. To tackle this, in this paper, we propose a novel model named High-order graph ODE (HOPE) for learning from dynamic interaction data, which can be naturally represented as a graph.	Xiao Luo; Jingyang Yuan; Zijie Huang; Huiyu Jiang; Yifang Qin; Wei Ju; Ming Zhang; Yizhou Sun;
968	Formalizing Preferences Over Runtime Distributions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: When trying to solve a computational problem, we are often faced with a choice between algorithms that are guaranteed to return the right answer but differ in their runtime distributions (e.g., SAT solvers, sorting algorithms). This paper aims to lay theoretical foundations for such choices by formalizing preferences over runtime distributions.	Devon R. Graham; Kevin Leyton-Brown; Tim Roughgarden;
969	Curious Replay for Model-based Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we present Curious Replay—a form of prioritized experience replay tailored to model-based agents through use of a curiosity-based priority signal.	Isaac Kauvar; Chris Doyle; Linqi Zhou; Nick Haber;
970	Deep Latent State Space Models for Time-Series Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose LS4, a generative model for sequences with latent variables evolving according to a state space ODE to increase modeling capacity.	Linqi Zhou; Michael Poli; Winnie Xu; Stefano Massaroli; Stefano Ermon;
971	Pricing Experimental Design: Causal Effect, Expected Revenue and Tail Risk Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: When designing pricing experiments, there are three fundamental objectives: estimating the causal effect of price (i.e., price elasticity), maximizing the expected revenue through the experiment, and controlling the tail risk suffering from a very huge loss. In this paper, we reveal the relationship among such three objectives.	David Simchi-Levi; Chonghuan Wang;
972	Differentiable and Transportable Structure Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce D-Struct which recovers transportability in the discovered structures through a novel architecture and loss function while remaining fully differentiable.	Jeroen Berrevoets; Nabeel Seedat; Fergus Imrie; Mihaela van der Schaar;
973	Repository-Level Prompt Generation for Large Language Models of Code Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a framework called Repo-Level Prompt Generator that learns to generate example-specific prompts using prompt proposals.	Disha Shrivastava; Hugo Larochelle; Daniel Tarlow;
974	Improving Bi-level Optimization Based Methods with Inspiration from Humans’ Classroom Study Techniques Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We are interested in investigating whether these techniques can inspire the development of ML training strategies to improve bi-level optimization (BLO) based methods. Towards this goal, we develop a general framework, Skillearn, which consists of basic elements such as learners, interaction functions, learning stages, etc.	Pengtao Xie;
975	Graph Generative Model for Benchmarking Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This greatly reduces the amount of benchmark graphs available to researchers, causing the field to rely only on a handful of publicly-available datasets. To address this problem, we introduce a novel graph generative model, Computation Graph Transformer (CGT) that learns and reproduces the distribution of real-world graphs in a privacy-controlled way.	Minji Yoon; Yue Wu; John Palowitch; Bryan Perozzi; Russ Salakhutdinov;
976	Linear Optimal Partial Transport Embedding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose the Linear optimal partial transport (LOPT) embedding, which extends the (local) linearization technique on OT and HK to the OPT problem.	Yikun Bai; Ivan Vladimir Medri; Rocio Diaz Martin; Rana Shahroz; Soheil Kolouri;
977	Quantum Speedups for Zero-Sum Games Via Improved Dynamic Gibbs Sampling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We give a quantum algorithm for computing an $\epsilon$-approximate Nash equilibrium of a zero-sum game in a $m \times n$ payoff matrix with bounded entries.	Adam Bouland; Yosheb M Getachew; Yujia Jin; Aaron Sidford; Kevin Tian;
978	A Near-Optimal Algorithm for Safe Reinforcement Learning Under Instantaneous Hard Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Hence, in this paper, we develop the first near-optimal safe RL algorithm for episodic Markov Decision Processes with unsafe states and actions under instantaneous hard constraints and the linear mixture model.	Ming Shi; Yingbin Liang; Ness Shroff;
979	Deep Graph Representation Learning and Optimization for Influence Maximization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the development of learning-based IM methods is still limited by fundamental obstacles, including 1) the difficulty of effectively solving the objective function; 2) the difficulty of characterizing the diversified and underlying diffusion patterns; and 3) the difficulty of adapting the solution under various node-centrality-constrained IM variants. To cope with the above challenges, we design a novel framework DeepIM to generatively characterize the latent representation of seed sets, and we propose to learn the diversified information diffusion pattern in a data-driven and end-to-end manner.	Chen Ling; Junji Jiang; Junxiang Wang; My Thai; Lukas Xue; James Song; Meikang Qiu; Liang Zhao;
980	Probabilistic Categorical Adversarial Attack and Adversarial Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these achievements are still hard to be generalized to categorical data. To bridge this gap, we propose a novel framework, Probabilistic Categorical Adversarial Attack (or PCAA).	Han Xu; Pengfei He; Jie Ren; Yuxuan Wan; Zitao Liu; Hui Liu; Jiliang Tang;
981	Second-order Regression Models Exhibit Progressive Sharpening to The Edge of Stability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As such, we consider the next simplest class of predictive models, namely those that are quadratic in the parameters, which we call second-order regression models. For quadratic objectives in two dimensions, we prove that this second-order regression model exhibits progressive sharpening of the NTK eigenvalue towards a value that differs slightly from the edge of stability, which we explicitly compute.	Atish Agarwala; Fabian Pedregosa; Jeffrey Pennington;
982	Reducing SO(3) Convolutions to SO(2) for Efficient Equivariant GNNs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Unfortunately equivariant convolutions, which are a fundamental operation for equivariant networks, increase significantly in computational complexity as higher-order tensors are used. In this paper, we address this issue by reducing the $SO(3)$ convolutions or tensor products to mathematically equivalent convolutions in $SO(2)$.	Saro Passaro; C. Lawrence Zitnick;
983	Delayed Bandits: When Do Intermediate Observations Help? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider a model, where intermediate observations have a form of a finite state, which is observed immediately after taking an action, whereas the loss is observed after an adversarially chosen delay.	Emmanuel Esposito; Saeed Masoudian; Hao Qiu; Dirk van der Hoeven; Nicolò Cesa-Bianchi; Yevgeny Seldin;
984	Von Mises Mixture Distributions for Molecular Conformation Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we present VonMisesNet, a new graph neural network that captures conformational variability via a variational approximation of rotatable bond torsion angles as a mixture of von Mises distributions.	Kirk Swanson; Jake Williams; Eric M Jonas;
985	Neural Algorithmic Reasoning with Causal Regularisation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we make an important observation: there are many different inputs for which an algorithm will perform certain intermediate computations identically.	Beatrice Bevilacqua; Kyriacos Nikiforou; Borja Ibarz; Ioana Bica; Michela Paganini; Charles Blundell; Jovana Mitrovic; Petar Veličković;
986	Robust Speech Recognition Via Large-Scale Weak Supervision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet.	Alec Radford; Jong Wook Kim; Tao Xu; Greg Brockman; Christine McLeavey; Ilya Sutskever;
987	Sequential Monte Carlo Learning for Time Series Structure Discovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a new approach to automatically discovering accurate models of complex time series data.	Feras Saad; Brian Patton; Matthew Douglas Hoffman; Rif A. Saurous; Vikash Mansinghka;
988	Scaling Spherical CNNs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we show how spherical CNNs can be scaled for much larger problems.	Carlos Esteves; Jean-Jacques Slotine; Ameesh Makadia;
989	Sample Complexity of Probability Divergences Under Group Symmetry Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We rigorously quantify the improvement in the sample complexity of variational divergence estimations for group-invariant distributions. In the cases of the Wasserstein-1 metric …	Ziyu Chen; Markos Katsoulakis; Luc Rey-Bellet; Wei Zhu;
990	Faster Rates of Convergence to Stationary Points in Differentially Private Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of approximating stationary points of Lipschitz and smooth functions under $(\varepsilon,\delta)$-differential privacy (DP) in both the finite-sum and stochastic settings.	Raman Arora; Raef Bassily; Tomás González; Cristóbal A Guzmán; Michael Menart; Enayat Ullah;
991	Free-Form Variational Inference for Gaussian Process State-Space Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new method for inference in Bayesian GPSSMs, which overcomes the drawbacks of previous approaches, namely over-simplified assumptions, and high computational requirements.	Xuhui Fan; Edwin V. Bonilla; Terry O’Kane; Scott A Sisson;
992	Maximum Optimality Margin: A Unified Approach for Contextual Linear Programming and Inverse Linear Programming Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study the predict-then-optimize problem where the output of a machine learning prediction task is used as the input of some downstream optimization problem, say, the objective coefficient vector of a linear program.	Chunlin Sun; Shang Liu; Xiaocheng Li;
993	Sequential Counterfactual Risk Minimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore the case where it is possible to deploy learned policies multiple times and acquire new data.	Houssam Zenati; Eustache Diemert; Matthieu Martin; Julien Mairal; Pierre Gaillard;
994	Automatic Data Augmentation Via Invariance-Constrained Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In fact, there is both empirical and theoretical evidence that the indiscriminate use of data augmentation can introduce biases that outweigh its benefits. This work tackles these issues by automatically adapting the data augmentation while solving the learning task.	Ignacio Hounie; Luiz F. O. Chamon; Alejandro Ribeiro;
995	Divide and Conquer Dynamic Programming: An Almost Linear Time Change Point Detection Methodology in High Dimensions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We develop a novel, general and computationally efficient framework, called Divide and Conquer Dynamic Programming (DCDP), for localizing change points in time series data with high-dimensional features.	Wanshan Li; Daren Wang; Alessandro Rinaldo;
996	An Effective Meaningful Way to Evaluate Survival Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we explore various metrics to estimate MAE for survival datasets that include (many) censored individuals.	Shi-ang Qi; Neeraj Kumar; Mahtab Farrokh; Weijie Sun; Li-Hao Kuan; Rajesh Ranganath; Ricardo Henao; Russell Greiner;
997	Collaborative Multi-Agent Heterogeneous Multi-Armed Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop decentralized algorithms which facilitate collaboration between the agents under two scenarios.	Ronshee Chawla; Daniel Vial; Sanjay Shakkottai; R. Srikant;
998	On The Global Convergence of Fitted Q-Iteration with Two-layer Neural Network Parametrization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study a Fitted Q-Iteration with two-layer ReLU neural network parameterization, and find the sample complexity guarantees for the algorithm.	Mudit Gaur; Vaneet Aggarwal; Mridul Agarwal;
999	Decoding Layer Saliency in Language Transformers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a strategy for identifying textual saliency in large-scale language models applied to classification tasks.	Elizabeth Mary Hou; Gregory David Castanon;
1000	Data Efficient Neural Scaling Law Via Model Reusing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the neural scaling law under the previously overlooked data scarcity regime, focusing on the more challenging situation where we need to train a gigantic model with a disproportionately limited supply of available training data.	Peihao Wang; Rameswar Panda; Zhangyang Wang;
1001	Can Forward Gradient Match Backpropagation? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While current solutions rely on weighted averages over isotropic guess vector distributions, we propose to strongly bias our gradient guesses in directions that are much more promising, such as feedback obtained from small, local auxiliary networks. For a standard computer vision neural network, we conduct a rigorous study systematically covering a variety of combinations of gradient targets and gradient guesses, including those previously presented in the literature.	Louis Fournier; Stephane Rivaud; Eugene Belilovsky; Michael Eickenberg; Edouard Oyallon;
1002	CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To directly leverage the abundant geospatial information associated with images in pre-training, fine-tuning, and inference stages, we present Contrastive Spatial Pre-Training (CSP), a self-supervised learning framework for geo-tagged images.	Gengchen Mai; Ni Lao; Yutong He; Jiaming Song; Stefano Ermon;
1003	Fully Dynamic Submodular Maximization Over Matroids Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Maximizing monotone submodular functions under a matroid constraint is a classic algorithmic problem with multiple applications in data mining and machine learning. We study this classic problem in the fully dynamic setting, where elements can be both inserted and deleted in real-time.	Paul Duetting; Federico Fusco; Silvio Lattanzi; Ashkan Norouzi-Fard; Morteza Zadimoghaddam;
1004	Counterfactual Analysis in Dynamic Latent State Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide an optimization-based framework to perform counterfactual analysis in a dynamic model with hidden states.	Martin B Haugh; Raghav Singal;
1005	Beyond In-Domain Scenarios: Robust Density-Aware Calibration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While existing post-hoc calibration methods achieve impressive results on in-domain test datasets, they are limited by their inability to yield reliable uncertainty estimates in domain-shift and out-of-domain (OOD) scenarios. We aim to bridge this gap by proposing DAC, an accuracy-preserving as well as Density-Aware Calibration method based on k-nearest-neighbors (KNN).	Christian Tomani; Futa Kai Waseda; Yuesong Shen; Daniel Cremers;
1006	Demystifying Uneven Vulnerability of Link Stealing Attacks Against Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we first present theoretical evidence of the uneven vulnerability of GNNs to link stealing attacks, which lays the foundation for demystifying such uneven risks among different groups of edges. We further demonstrate a group-based attack paradigm to expose the practical privacy harm to GNN users derived from the uneven vulnerability of edges.	He Zhang; Bang Wu; Shuo Wang; Xiangwen Yang; Minhui Xue; Shirui Pan; Xingliang Yuan;
1007	PAC Generalization Via Invariant Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the following question: If a representation is approximately invariant with respect to a given number of training interventions, will it continue to be approximately invariant on a larger collection of unseen intervened SEMs? Inspired by PAC learning, we obtain finite-sample out-of-distribution generalization guarantees for approximate invariance that holds probabilistically over a family of linear SEMs without faithfulness assumptions.	Advait U Parulekar; Karthikeyan Shanmugam; Sanjay Shakkottai;
1008	A New Near-linear Time Algorithm for K-nearest Neighbor Search Using A Compressed Cover Tree Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given a reference set R of n points and a query set Q of m points in a metric space, this paper studies the important problem of finding the k-nearest neighbors of every point q of Q in the set R in near-linear time. In a paper at ICML 2006, Beygelzimer, Kakade, and Langford introduced a cover tree and attempted to prove that this tree can be built in O(n log n) time while the nearest neighbor search can be done in O(n log m) time with a hidden dimensionality factor.	Yury Elkin; Vitaliy Kurlin;
1009	Optimal Horizon-Free Reward-Free Exploration for Linear Mixture MDPs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new reward-free algorithm for learning linear mixture Markov decision processes (MDPs), where the transition probability can be parameterized as a linear combination of known feature mappings.	Junkai Zhang; Weitong Zhang; Quanquan Gu;
1010	TAN Without A Burn: Scaling Laws of DP-SGD Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we decouple privacy analysis and experimental behavior of noisy training to explore the trade-off with minimal computational requirements.	Tom Sander; Pierre Stock; Alexandre Sablayrolles;
1011	The Benefits of Model-Based Generalization in Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we provide theoretical and empirical insight into when, and how, we can expect data generated by a learned model to be useful.	Kenny John Young; Aditya Ramesh; Louis Kirsch; Jürgen Schmidhuber;
1012	Provably and Practically Efficient Neural Contextual Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast to the existing work which primarily focuses on ReLU neural nets, we consider a general set of smooth activation functions. Under this more general setting, (i) we derive non-asymptotic error bounds on the difference between an overparameterized neural net and its corresponding neural tangent kernel, (ii) we propose an algorithm with a provable sublinear regret bound that is also efficient in the finite regime as demonstrated by empirical studies.	Sudeep Salgia;
1013	Training Deep Surrogate Models with Large Scale Online Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper advocates that relying on a traditional static dataset to train these models does not allow the full benefit of the solver to be used as a data generator. It proposes an open source online training framework for deep surrogate models.	Lucas Thibaut Meyer; Marc Schouler; Robert Alexander Caulk; Alejandro Ribes; Bruno Raffin;
1014	Auxiliary Modality Learning with Generalized Curriculum Distillation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we formally define “Auxiliary Modality Learning” (AML), systematically classify types of auxiliary modality (in visual computing) and architectures for AML, and analyze their performance.	Yu Shen; Xijun Wang; Peng Gao; Ming Lin;
1015	Sample and Predict Your Latent: Modality-free Sequential Disentanglement Via Contrastive Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose to avoid that by generating, sampling, and comparing empirical distributions from the underlying variational model.	Ilan Naiman; Nimrod Berman; Omri Azencot;
1016	STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we posit an alternative exploration incentive in terms of the integral probability metric (IPM) between a current estimate of the transition model and the unknown optimal, which under suitable conditions, can be computed in closed form with the kernelized Stein discrepancy (KSD).	Souradip Chakraborty; Amrit Bedi; Alec Koppel; Mengdi Wang; Furong Huang; Dinesh Manocha;
1017	Distributed Linear Bandits Under Communication Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider distributed linear bandits where $M$ agents learn collaboratively to minimize the overall cumulative regret incurred by all agents.	Sudeep Salgia; Qing Zhao;
1018	Privacy-Aware Compression for Federated Learning Through Numerical Mechanism Design Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper builds upon it by introducing a new interpolation procedure in the numerical design process that allows for a far more efficient privacy analysis.	Chuan Guo; Kamalika Chaudhuri; Pierre Stock; Michael Rabbat;
1019	Efficient Preconditioned Stochastic Gradient Descent for Estimation in Latent Variable Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose as an alternative for parameter estimation an efficient preconditioned stochastic gradient algorithm.	Charlotte Baey; Maud Delattre; Estelle Kuhn; Jean-Benoist Leger; Sarah Lemler;
1020	Automated Search for Conjectures on Mathematical Constants Using Analysis of Integer Sequences Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here we propose a fundamentally different method to search for conjectures on mathematical constants: through analysis of integer sequences.	Ofir Razon; Yoav Harris; Shahar Gottlieb; Dan Carmon; Ofir David; Ido Kaminer;
1021	TIDE: Time Derivative Diffusion for Deep Learning on Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The challenge of approaches that use this paradigm is to ensure efficient and accurate long-distance communication between nodes, as deep convolutional networks are prone to over-smoothing. In this paper, we present a novel method based on time derivative graph diffusion (TIDE) to overcome these structural limitations of the message-passing framework.	Maysam Behmanesh; Maximilian Krahn; Maks Ovsjanikov;
1022	Geometric Latent Diffusion Models for 3D Molecule Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by the recent huge success of Stable (latent) Diffusion models, we propose a novel and principled method for 3D molecule generation named Geometric Latent Diffusion Models (GeoLDM).	Minkai Xu; Alexander S Powers; Ron O. Dror; Stefano Ermon; Jure Leskovec;
1023	Attention-Based Recurrence for Multi-Agent Reinforcement Learning Under Stochastic Partial Observability Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Attention-based Embeddings of Recurrence In multi-Agent Learning (AERIAL) to approximate value functions under stochastic partial observability.	Thomy Phan; Fabian Ritz; Philipp Altmann; Maximilian Zorn; Jonas Nüßlein; Michael Kölle; Thomas Gabor; Claudia Linnhoff-Popien;
1024	Improving Statistical Fidelity for Neural Image Compression with Implicit Local Likelihood Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a non-binary discriminator that is conditioned on quantized local image representations obtained via VQ-VAE autoencoders.	Matthew J. Muckley; Alaaeldin El-Nouby; Karen Ullrich; Herve Jegou; Jakob Verbeek;
1025	On The Forward Invariance of Neural ODEs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new method to ensure neural ordinary differential equations (ODEs) satisfy output specifications by using invariance set propagation.	Wei Xiao; Tsun-Hsuan Wang; Ramin Hasani; Mathias Lechner; Yutong Ban; Chuang Gan; Daniela Rus;
1026	Learning Neural PDE Solvers with Parameter-Guided Channel Attention Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a Channel Attention guided by PDE Parameter Embeddings (CAPE) component for neural surrogate models and a simple yet effective curriculum learning strategy.	Makoto Takamoto; Francesco Alesiani; Mathias Niepert;
1027	Private Statistical Estimation of Many Quantiles Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work studies the estimation of many statistical quantiles under differential privacy. More precisely, given a distribution and access to i.i.d. samples from it, we study the estimation of the inverse of its cumulative distribution function (the quantile function) at specific points.	Clément Lalanne; Aurélien Garivier; Rémi Gribonval;
1028	Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We aim at an ambitious goal of democratizing pretraining.	Boris Knyazev; Doha Hwang; Simon Lacoste-Julien;
1029	Entropy-driven Unsupervised Keypoint Representation Learning in Videos Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a novel approach for unsupervised learning of meaningful representations from videos, leveraging the concept of image spatial entropy (ISE) that quantifies the per-pixel information in an image.	Ali Younes; Simone Schaub-Meyer; Georgia Chalvatzaki;
1030	Loss-Guided Diffusion Models for Plug-and-Play Controllable Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we show that such approaches have significant errors over the scale of the approximations. To address this issue, we propose a Monte Carlo method that uses multiple samples from a suitable distribution to reduce bias.	Jiaming Song; Qinsheng Zhang; Hongxu Yin; Morteza Mardani; Ming-Yu Liu; Jan Kautz; Yongxin Chen; Arash Vahdat;
1031	UMD: Unsupervised Model Detection for X2X Backdoor Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose UMD, the first Unsupervised Model Detection method that effectively detects X2X backdoor attacks via a joint inference of the adversarial (source, target) class pairs.	Zhen Xiang; Zidi Xiong; Bo Li;
1032	Improving Graph Generation By Restricting Graph Bandwidth Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, one of the main limitations of existing methods is their large output space, which limits generation scalability and hinders accurate modeling of the underlying distribution. To overcome these limitations, we propose a novel approach that significantly reduces the output space of existing graph generative models.	Nathaniel Lee Diamant; Alex Tseng; Kangway V Chuang; Tommaso Biancalani; Gabriele Scalia;
1033	On The Estimation of Gaussian Mixture Copula Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper revisits Gaussian Mixture Copula Model (GMCM), a more expressive alternative to the widely used Gaussian Mixture Model (GMM), with the goal to make its parameter estimation tractable.	Ashutosh Tewari;
1034	Solving Linear Programs with Fast Online Learning Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents fast first-order methods for solving linear programs (LPs) approximately.	Wenzhi Gao; Dongdong Ge; Chunlin Sun; Yinyu Ye;
1035	Variational Open-Domain Question Answering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce the Variational Open-Domain (VOD) framework for end-to-end training and evaluation of retrieval-augmented models, focusing on open-domain question answering and language modelling.	Valentin Liévin; Andreas Geert Motzfeldt; Ida Riis Jensen; Ole Winther;
1036	Structure Learning of Latent Factors Via Clique Search on Correlation Thresholded Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the widespread application of latent factor analysis, existing methods suffer from the following weaknesses: requiring the number of factors to be known, lack of theoretical guarantees for learning the model structure, and nonidentifiability of the parameters due to rotation invariance properties of the likelihood. We address these concerns by proposing a fast correlation thresholding (CT) algorithm that simultaneously learns the number of latent factors and a rotationally identifiable model structure.	Dale Kim; Qing Zhou;
1037	Phase Transitions in The Detection of Correlated Databases Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of detecting the correlation between two Gaussian databases $\mathsf{X}\in\mathbb{R}^{n\times d}$ and $\mathsf{Y}\in\mathbb{R}^{n\times d}$, each composed of $n$ users with $d$ features.	Dor Elimelech; Wasim Huleihel;
1038	Hardware-Aware Compression with Random Operation Access Specific Tile (ROAST) Hashing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a model-agnostic, cache-friendly, and hardware-aware model compression approach: Random Operation Access Specific Tile (ROAST) hashing.	Aditya Desai; Keren Zhou; Anshumali Shrivastava;
1039	Algorithmic Collective Action in Machine Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a simple theoretical model of a collective interacting with a firm’s learning algorithm.	Moritz Hardt; Eric Mazumdar; Celestine Mendler-Dünner; Tijana Zrnic;
1040	Adversarial Cheap Talk Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, this paper proposes a novel adversarial setting called a Cheap Talk MDP in which an Adversary can merely append deterministic messages to the Victim’s observation, resulting in a minimal range of influence.	Chris Lu; Timon Willi; Alistair Letcher; Jakob Nicolaus Foerster;
1041	RLang: A Declarative Language for Describing Partial World Knowledge to Reinforcement Learning Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce RLang, a domain-specific language (DSL) for communicating domain knowledge to an RL agent.	Rafael Rodriguez-Sanchez; Benjamin Adin Spiegel; Jennifer Wang; Roma Patel; Stefanie Tellex; George Konidaris;
1042	Scalable Safe Policy Improvement Via Monte Carlo Tree Search Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose an algorithm, called MCTS-SPIBB, that computes safe policy improvement online using a Monte Carlo Tree Search based strategy.	Alberto Castellini; Federico Bianchi; Edoardo Zorzi; Thiago D. Simão; Alessandro Farinelli; Matthijs T. J. Spaan;
1043	Robust Collaborative Learning with Linear Gradient Overhead Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present MoNNA, a new algorithm that (a) is provably robust under standard assumptions and (b) has a gradient computation overhead that is linear in the fraction of faulty machines, which is conjectured to be tight.	Sadegh Farhadkhani; Rachid Guerraoui; Nirupam Gupta; Lê-Nguyên Hoang; Rafael Pinot; John Stephan;
1044	Regularization-free Diffeomorphic Temporal Alignment Nets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Unfortunately, existing DTAN formulations crucially depend on a regularization term whose optimal hyperparameters are dataset-specific and usually searched via a large number of experiments. Here we propose a regularization-free DTAN that obviates the need to perform such an expensive, and often impractical, search.	Ron Shapira Weber; Oren Freifeld;
1045	CLUTR: Curriculum Learning Via Unsupervised Task Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce CLUTR: a novel unsupervised curriculum learning algorithm that decouples task representation and curriculum learning into a two-stage optimization.	Abdus Salam Azad; Izzeddin Gur; Jasper Emhoff; Nathaniel Alexis; Aleksandra Faust; Pieter Abbeel; Ion Stoica;
1046	POUF: Prompt-Oriented Unsupervised Fine-tuning for Large Pre-trained Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Though these big models have zero-shot capabilities, in general, labeled data are still required to adapt them to downstream tasks. To overcome this critical limitation, we propose an unsupervised fine-tuning framework to directly fine-tune the model or prompt on the unlabeled target data.	Korawat Tanwisuth; Shujian Zhang; Huangjie Zheng; Pengcheng He; Mingyuan Zhou;
1047	Feature Expansion for Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the feature space that dominates representation learning has not been systematically studied in graph neural networks. In this paper, we propose to fill this gap by analyzing the feature space of both spatial and spectral models.	Jiaqi Sun; Lin Zhang; Guangyi Chen; Peng Xu; Kun Zhang; Yujiu Yang;
1048	Bayesian Reparameterization of Reward-Conditioned Reinforcement Learning with Energy-based Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we will show that current RCRL approaches are fundamentally limited and fail to address two critical challenges of RCRL — improving generalization on high reward-to-go (RTG) inputs, and avoiding out-of-distribution (OOD) RTG queries during testing time. To address these challenges when training vanilla RCRL architectures, we propose Bayesian Reparameterized RCRL (BR-RCRL), a novel set of inductive biases for RCRL inspired by Bayes’ theorem.	Wenhao Ding; Tong Che; Ding Zhao; Marco Pavone;
1049	PINA: Leveraging Side Information in eXtreme Multi-label Classification Via Predicted Instance Neighborhood Aggregation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Predicted Instance Neighborhood Aggregation (PINA), a data augmentation method for the general XMC problem that leverages beneficial side information.	Eli Chien; Jiong Zhang; Cho-Jui Hsieh; Jyun-Yu Jiang; Wei-Cheng Chang; Olgica Milenkovic; Hsiang-Fu Yu;
1050	A Two-Stage Active Learning Algorithm for K-Nearest Neighbors Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide consistency guarantees for a modified $k$-nearest neighbors classifier trained on samples acquired via our scheme, and show that when the conditional probability function $\mathbb{P}(Y=y \mid X=x)$ is sufficiently smooth and the Tsybakov noise condition holds, our actively trained classifiers converge to the Bayes optimal classifier at a faster asymptotic rate than passively trained $k$-nearest neighbor classifiers.	Nicholas Rittler; Kamalika Chaudhuri;
1051	Consistency Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Diffusion models have significantly advanced the fields of image, audio, and video generation, but they depend on an iterative sampling process that causes slow generation. To overcome this limitation, we propose consistency models, a new family of models that generate high quality samples by directly mapping noise to data.	Yang Song; Prafulla Dhariwal; Mark Chen; Ilya Sutskever;
1052	Continuously Parameterized Mixture Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that by continuously parameterizing a mixture of factor analyzers using a learned ordinary differential equation, we can improve the fit of mixture models over direct methods.	Christopher M Bender; Yifeng Shi; Marc Niethammer; Junier Oliva;
1053	Cooperative Multi-Agent Reinforcement Learning: Asynchronous Communication and Linear Function Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a provably efficient algorithm based on value iteration that can simultaneously allow asynchronous communication and guarantee the benefit of cooperation with low communication complexity.	Yifei Min; Jiafan He; Tianhao Wang; Quanquan Gu;
1054	A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we study the universality hypothesis by examining how small networks learn to implement group compositions.	Bilal Chughtai; Lawrence Chan; Neel Nanda;
1055	On The Convergence of SARSA with Linear Function Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, little is known about how fast SARSA converges to that region and how large the region is. In this paper, we make progress towards this open problem by showing the convergence rate of projected SARSA to a bounded region.	Shangtong Zhang; Remi Tachet des Combes; Romain Laroche;
1056	PAC-Bayesian Generalization Bounds for Adversarial Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We extend PAC-Bayesian theory to generative models and develop generalization bounds for models based on the Wasserstein distance and the total variation distance.	Sokhna Diarra Mbacke; Florence Clerc; Pascal Germain;
1057	Cluster Explanation Via Polyhedral Descriptions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We formulate the cluster description problem as an integer program and present a column generation approach to search over an exponential number of candidate half-spaces that can be used to build the polyhedra.	Connor Lawless; Oktay Gunluk;
1058	Diffusion Based Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Using the same approach, we propose to learn an infinite-dimensional latent code that achieves improvements on state-of-the-art models on semi-supervised image classification.	Sarthak Mittal; Korbinian Abstreiter; Stefan Bauer; Bernhard Schölkopf; Arash Mehrjou;
1059	Fully Bayesian Autoencoders with Latent Sparse Gaussian Processes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a fully Bayesian autoencoder model that treats both local latent variables and global decoder parameters in a Bayesian fashion.	Ba-Hien Tran; Babak Shahbaba; Stephan Mandt; Maurizio Filippone;
1060	Benign Overfitting in Two-layer ReLU Convolutional Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: How and when benign overfitting can occur in ReLU neural networks remains an open problem. In this work, we seek to answer this question by establishing algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise.	Yiwen Kou; Zixiang Chen; Yuanzhou Chen; Quanquan Gu;
1061	Fairness in Streaming Submodular Maximization Over A Matroid Constraint Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study the natural generalization of this problem to a matroid constraint.	Marwa El Halabi; Federico Fusco; Ashkan Norouzi-Fard; Jakab Tardos; Jakub Tarnawski;
1062	Multi-Layer Neural Networks As Trainable Ladders of Hilbert Spaces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To characterize the function spaces explored by multi-layer neural networks (NNs), we introduce Neural Hilbert Ladders (NHLs), a collection of reproducing kernel Hilbert spaces (RKHSes) that are defined iteratively and adapt to training.	Zhengdao Chen;
1063	Optimal Randomized Multilevel Monte Carlo for Repeatedly Nested Expectations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel Monte Carlo estimator called $\mathsf{READ}$, which stands for “Recursive Estimator for Arbitrary Depth.”	Yasa Syed; Guanyang Wang;
1064	PAC Prediction Sets for Large Language Models of Code Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In the context of code generation, we propose a solution that considers a restricted set of prediction sets that can compactly be represented as partial programs, which are programs with portions replaced with holes.	Adam Khakhar; Stephen Mell; Osbert Bastani;
1065	Scalable Adaptive Computation for Iterative Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Natural data is redundant yet predominant architectures tile computation uniformly across their input and output space. We propose the Recurrent Interface Network (RIN), an attention-based architecture that decouples its core computation from the dimensionality of the data, enabling adaptive computation for more scalable generation of high-dimensional data.	Allan Jabri; David J. Fleet; Ting Chen;
1066	Function-Space Regularization in Neural Networks: A Probabilistic Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we approach regularization in neural networks from a probabilistic perspective and show that by viewing parameter-space regularization as specifying an empirical prior distribution over the model parameters, we can derive a probabilistically well-motivated regularization technique that allows explicitly encoding information about desired predictive functions into neural network training.	Tim G. J. Rudner; Sanyam Kapoor; Shikai Qiu; Andrew Gordon Wilson;
1067	PAC-Bayesian Offline Contextual Bandits With Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces a new principled approach for off-policy learning in contextual bandits.	Otmane Sakhi; Pierre Alquier; Nicolas Chopin;
1068	Distribution-dependent McDiarmid-type Inequalities for Functions of Unbounded Interaction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper gives unbounded analogues of the McDiarmid-type exponential inequalities for three popular classes of distributions, namely sub-Gaussian, sub-exponential and heavy-tailed distributions.	Shaojie Li; Yong Liu;
1069	Special Properties of Gradient Descent with Large Learning Rates Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We formally prove that GD with a large step size, on certain non-convex function classes, follows a different trajectory than GD with a small step size, which can lead to convergence to a global minimum instead of a local one.	Amirkeivan Mohtashami; Martin Jaggi; Sebastian U Stich;
1070	The Power of Learned Locally Linear Models for Nonlinear Policy Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A common pipeline in learning-based control is to iteratively estimate a model of system dynamics, and apply a trajectory optimization algorithm – e.g. $\mathtt{iLQR}$ – on the learned model to minimize a target cost. This paper conducts a rigorous analysis of a simplified variant of this strategy for general nonlinear systems.	Daniel Pfrommer; Max Simchowitz; Tyler Westenbroek; Nikolai Matni; Stephen Tu;
1071	On The Privacy-Robustness-Utility Trilemma in Distributed Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We prove our matching upper bound by presenting a new distributed ML algorithm using a high-dimensional robust aggregation rule.	Youssef Allouah; Rachid Guerraoui; Nirupam Gupta; Rafael Pinot; John Stephan;
1072	Statistical Learning Under Heterogenous Distribution Shift Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper studies the prediction of a target $\mathbf{z}$ from a pair of random variables $(\mathbf{x},\mathbf{y})$, where the ground-truth predictor is additive $\mathbb{E}[\mathbf{z} \mid \mathbf{x},\mathbf{y}] = f_\star(\mathbf{x}) +g_{\star}(\mathbf{y})$.	Max Simchowitz; Anurag Ajay; Pulkit Agrawal; Akshay Krishnamurthy;
1073	Why Is Public Pretraining Necessary for Private Model Training? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the stark contrast in the gain of pretraining between non-private and private machine learning suggests that the gain in the latter is rooted in a fundamentally different cause. To explain this phenomenon, we hypothesize that the non-convex loss landscape of a model training necessitates the optimization algorithm to go through two phases.	Arun Ganesh; Mahdi Haghifam; Milad Nasr; Sewoong Oh; Thomas Steinke; Om Thakkar; Abhradeep Guha Thakurta; Lun Wang;
1074	Reinforcement Learning in Low-rank MDPs with Density Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we instead investigate sample-efficient learning with density features, i.e., the right matrix, which induce powerful models for state-occupancy distributions.	Audrey Huang; Jinglin Chen; Nan Jiang;
1075	Intrinsic Sliced Wasserstein Distances for Comparing Collections of Probability Distributions on Manifolds and Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In practice these distributions can be defined over diverse domain types including finite intervals, circles, cylinders, spheres, other manifolds, and graphs. This paper introduces an approach for detecting differences between two collections of distributions over such general domains.	Raif M. Rustamov; Subhabrata Majumdar;
1076	Random Teachers Are Good Teachers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate the implicit regularization induced by teacher-student learning dynamics in self-distillation.	Felix Sarnthein; Gregor Bachmann; Sotiris Anagnostidis; Thomas Hofmann;
1077	Concurrent Shuffle Differential Privacy Under Continual Observation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce the concurrent shuffle model of differential privacy.	Jay Tenenbaum; Haim Kaplan; Yishay Mansour; Uri Stemmer;
1078	Discrete Key-Value Bottleneck Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Given a new task, however, updating the weights of these encoders is challenging as a large number of weights needs to be fine-tuned, and as a result, they forget information about the previous tasks. In the present work, we propose a model architecture to address this issue, building upon a discrete bottleneck containing pairs of separate and learnable key-value codes.	Frederik Träuble; Anirudh Goyal; Nasim Rahaman; Michael Curtis Mozer; Kenji Kawaguchi; Yoshua Bengio; Bernhard Schölkopf;
1079	Fast Online Value-Maximizing Prediction Sets with Conformal Cost Control Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For instance, a hospital might expect a smart diagnosis system to capture as many severe, often co-morbid, diseases as possible (the value), while maintaining strict control over incorrect predictions (the cost). We present a general pipeline, dubbed as FavMac, to maximize the value while controlling the cost in such scenarios.	Zhen Lin; Shubhendu Trivedi; Cao Xiao; Jimeng Sun;
1080	Understanding The Complexity Gains of Single-Task RL with A Curriculum Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we provide a theoretical framework that reformulates a single-task RL problem as a multi-task RL problem defined by a curriculum.	Qiyang Li; Yuexiang Zhai; Yi Ma; Sergey Levine;
1081	PPG Reloaded: An Empirical Study on What Matters in Phasic Policy Gradient Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In particular, we can achieve the same level of performance with low value sample reuse and frequent feature distillation, as long as the policy regularization strength and data diversity are preserved.	Kaixin Wang; Daquan Zhou; Jiashi Feng; Shie Mannor;
1082	Is Consensus Acceleration Possible in Decentralized Optimization Over Slowly Time-Varying Networks? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider decentralized optimization problems where one aims to minimize a sum of convex smooth objective functions distributed between nodes in the network.	Dmitry Metelev; Alexander Rogozin; Dmitry Kovalev; Alexander Gasnikov;
1083	Sequential Predictive Conformal Inference for Time Series Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a new distribution-free conformal prediction algorithm for sequential data (e.g., time series), called sequential predictive conformal inference (SPCI).	Chen Xu; Yao Xie;
1084	Neural Network Approximations of PDEs Beyond Linearity: A Representational Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we take a step towards studying the representational power of neural networks for approximating solutions to nonlinear PDEs.	Tanya Marwah; Zachary Chase Lipton; Jianfeng Lu; Andrej Risteski;
1085	Distortion and Uncertainty Aware Loss for Panoramic Depth Completion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the assumption is inapplicable to panoramic data due to its latitude-wise distortion and high uncertainty near textures and edges. To handle these challenges, we propose distortion and uncertainty aware loss (DUL) that consists of a distortion-aware loss and an uncertainty-aware loss.	Zhiqiang Yan; Xiang Li; Kun Wang; Shuo Chen; Jun Li; Jian Yang;
1086	SinFusion: Training Diffusion Models on A Single Image or Video Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, they are usually trained on very large datasets and are not naturally adapted to manipulate a given input image or video. In this paper we show how this can be resolved by training a diffusion model on a single input image or video.	Yaniv Nikankin; Niv Haim; Michal Irani;
1087	Variational Curriculum Reinforcement Learning for Unsupervised Discovery of Skills Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: From this perspective, we propose a novel approach to unsupervised skill discovery based on information theory, called Value Uncertainty Variational Curriculum (VUVC).	Seongun Kim; Kyowoon Lee; Jaesik Choi;
1088	VIMA: Robot Manipulation with Multimodal Prompts Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that a wide spectrum of robot manipulation tasks can be expressed with multimodal prompts, interleaving textual and visual tokens. Accordingly, we develop a new simulation benchmark that consists of thousands of procedurally-generated tabletop tasks with multimodal prompts, 600K+ expert trajectories for imitation learning, and a four-level evaluation protocol for systematic generalization.	Yunfan Jiang; Agrim Gupta; Zichen Zhang; Guanzhi Wang; Yongqiang Dou; Yanjun Chen; Li Fei-Fei; Anima Anandkumar; Yuke Zhu; Linxi Fan;
1089	Generating Novel, Designable, and Diverse Protein Structures By Equivariantly Diffusing Oriented Residue Clouds Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we leverage recent advances in denoising diffusion probabilistic models and equivariant neural networks to develop Genie, a generative model of protein structures that performs discrete-time diffusion using a cloud of oriented reference frames in 3D space.	Yeqing Lin; Mohammed AlQuraishi;
1090	Revisiting Pseudo-Label for Single-Positive Multi-Label Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, the conditions of the effectiveness of learning from pseudo-label for SPMLL are shown and the learnability of pseudo-label-based methods is proven.	Biao Liu; Ning Xu; Jiaqi Lv; Xin Geng;
1091	What Makes Entities Similar? A Similarity Flooding Perspective for Multi-sourced Knowledge Graph Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose two simple but effective methods inspired by the fixpoint computation in similarity flooding, and demonstrate their effectiveness on benchmark datasets.	Zequn Sun; Jiacheng Huang; Xiaozhou Xu; Qijin Chen; Weijun Ren; Wei Hu;
1092	Differentially Private Optimization on Large Model at Small Cost Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we develop a novel Book-Keeping (BK) technique that implements existing DP optimizers (thus achieving the same accuracy), with a substantial improvement on the computational cost.	Zhiqi Bu; Yu-Xiang Wang; Sheng Zha; George Karypis;
1093	Fully-Adaptive Composition in Differential Privacy Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We construct filters that match the rates of advanced composition, including constants, despite allowing for adaptively chosen privacy parameters.	Justin Whitehouse; Aaditya Ramdas; Ryan Rogers; Steven Wu;
1094	Cocktail Party Attack: Breaking Aggregation-Based Privacy in Federated Learning Using Independent Component Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the Cocktail Party Attack (CPA) that, contrary to prior belief, is able to recover the private inputs from gradients/weight updates aggregated over as many as 1024 samples.	Sanjay Kariyappa; Chuan Guo; Kiwan Maeng; Wenjie Xiong; G. Edward Suh; Moinuddin K Qureshi; Hsien-Hsin S. Lee;
1095	The Numerical Stability of Hyperbolic Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we analyze the limitations of two popular models for the hyperbolic space, namely, the Poincaré ball and the Lorentz model.	Gal Mishne; Zhengchao Wan; Yusu Wang; Sheng Yang;
1096	Revisiting Bellman Errors for Offline Model Selection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we elucidate why previous work has seen pessimistic results with Bellman errors and identify conditions under which OMS algorithms based on Bellman errors will perform well.	Joshua P Zitovsky; Daniel de Marchi; Rishabh Agarwal; Michael Rene Kosorok;
1097	$\pi$-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a universal parameter-efficient transfer learning method, termed Predict-Interpolate Tuning ($\pi$-Tuning), for vision, language, and vision-language tasks.	Chengyue Wu; Teng Wang; Yixiao Ge; Zeyu Lu; Ruisong Zhou; Ying Shan; Ping Luo;
1098	Multi-task Representation Learning for Pure Exploration in Linear Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study multi-task representation learning for best arm identification in linear bandit (RepBAI-LB) and best policy identification in contextual linear bandit (RepBPI-CLB), two popular pure exploration settings with wide applications, e.g., clinical trials and web content optimization.	Yihan Du; Longbo Huang; Wen Sun;
1099	Data Representations’ Study of Latent Image Manifolds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Deep neural networks have been demonstrated to achieve phenomenal success in many domains, and yet their inner mechanisms are not well understood. In this paper, we investigate the curvature of image manifolds, i.e., the manifold deviation from being flat in its principal directions.	Ilya Kaufman; Omri Azencot;
1100	Implicit Graph Neural Networks: A Monotone Operator Viewpoint Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we provide a new well-posedness characterization for IGNNs leveraging monotone operator theory, resulting in a much more expressive parameterization than the existing one.	Justin Baker; Qingsong Wang; Cory D Hauck; Bao Wang;
1101	Reasons for The Superiority of Stochastic Estimators Over Deterministic Ones: Robustness, Consistency and Perceptual Quality Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we reveal additional fundamental advantages of stochastic methods over deterministic ones, which further motivate their use.	Guy Ohayon; Theo Joseph Adrai; Michael Elad; Tomer Michaeli;
1102	Regularizing Towards Soft Equivariance Under Mixed Symmetries Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead of proposing a new architectural restriction as in most of the previous approaches, we present a regularizer-based method for building a model for a dataset with mixed approximate symmetries.	Hyunsu Kim; Hyungi Lee; Hongseok Yang; Juho Lee;
1103	Spurious Valleys and Clustering Behavior of Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We prove two main results concerning the geometry of the loss landscape of a neural network.	Samuele Pollaci;
1104	RSC: Accelerate Graph Neural Networks Training Via Randomized Sparse Computations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the issues, our key idea is to control the accuracy-efficiency trade-off by optimizing the allocation of computation resources per layer and per epoch.	Zirui Liu; Chen Shengyuan; Kaixiong Zhou; Daochen Zha; Xiao Huang; Xia Hu;
1105	Image Restoration with Mean-Reverting Stochastic Differential Equations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper presents a stochastic differential equation (SDE) approach for general-purpose image restoration.	Ziwei Luo; Fredrik K. Gustafsson; Zheng Zhao; Jens Sjölund; Thomas B. Schön;
1106	Efficient Rate Optimal Regret for Adversarial Contextual MDPs Using Online Function Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Abstract: We present the OMG-CMDP! algorithm for regret minimization in adversarial Contextual MDPs. The algorithm operates under the minimal assumptions of realizable function class and …	Orin Levy; Alon Cohen; Asaf Cassel; Yishay Mansour;
1107	Sequential Kernelized Independence Testing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Classical batch tests are not tailored for streaming data: valid inference after data peeking requires correcting for multiple testing which results in low power. Following the principle of testing by betting, we design sequential kernelized independence tests that overcome such shortcomings.	Aleksandr Podkopaev; Patrick Blöbaum; Shiva Kasiviswanathan; Aaditya Ramdas;
1108	MultiRobustBench: Benchmarking Robustness Against Multiple Attacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present the first unified framework for considering multiple attacks against ML models.	Sihui Dai; Saeed Mahloujifar; Chong Xiang; Vikash Sehwag; Pin-Yu Chen; Prateek Mittal;
1109	ILLUME: Rationalizing Vision-Language Models Through Human Interactions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, outputs of these models rarely align with users’ rationales for specific answers. In order to improve this alignment and reinforce commonsense reasons, we propose a tuning paradigm based on human interactions with machine-generated data.	Manuel Brack; Patrick Schramowski; Björn Deiseroth; Kristian Kersting;
1110	Parameter-Level Soft-Masking for Continual Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The goal of this paper is threefold: (1) overcoming CF, (2) encouraging KT, and (3) tackling the capacity problem.	Tatsuya Konishi; Mori Kurokawa; Chihiro Ono; Zixuan Ke; Gyuhak Kim; Bing Liu;
1111	Are Diffusion Models Vulnerable to Membership Inference Attacks? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate the vulnerability of diffusion models to Membership Inference Attacks (MIAs), a common privacy concern.	Jinhao Duan; Fei Kong; Shiqi Wang; Xiaoshuang Shi; Kaidi Xu;
1112	Parallel Online Clustering of Bandits Via Hedonic Game Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose CLUB-HG, a novel algorithm that integrates a game-theoretic approach into clustering inference.	Xiaotong Cheng; Cheng Pan; Setareh Maghsudi;
1113	The Computational Complexity of Concise Hypersphere Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we perform the first complexity-theoretic study of the hypersphere classification problem for binary data.	Eduard Eiben; Robert Ganian; Iyad A. Kanj; Sebastian Ordyniak; Stefan Szeider;
1114	Demystifying Disagreement-on-the-Line in High Dimensions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we develop a theoretical foundation for analyzing disagreement in high-dimensional random features regression; and study under what conditions the disagreement-on-the-line phenomenon occurs in our setting.	Donghwan Lee; Behrad Moniri; Xinmeng Huang; Edgar Dobriban; Hamed Hassani;
1115	Feed Two Birds with One Scone: Exploiting Wild Data for Both Out-of-Distribution Generalization and Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a margin-based learning framework that exploits freely available unlabeled data in the wild that captures the environmental test-time OOD distributions under both covariate and semantic shifts.	Haoyue Bai; Gregory Canal; Xuefeng Du; Jeongyeol Kwon; Robert D Nowak; Yixuan Li;
1116	Robust Weak Supervision with Variational Auto-Encoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, LFs need to be carefully designed, often requiring expert domain knowledge and extensive validation for existing WS methods to be effective. To tackle this, we propose the Weak Supervision Variational Auto-Encoder (WS-VAE), a novel framework that combines unsupervised representation learning and weak labelling to reduce the dependence of WS on expert and manual engineering of LFs.	Francesco Tonolini; Nikolaos Aletras; Yunlong Jiao; Gabriella Kazai;
1117	Thompson Sampling with Diffusion Generative Prior Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we initiate the idea of using denoising diffusion models to learn priors for online decision making problems.	Yu-Guan Hsieh; Shiva Kasiviswanathan; Branislav Kveton; Patrick Blöbaum;
1118	Distilling Internet-Scale Vision-Language Models Into Embodied Agents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Learning to ground language is challenging, typically requiring domain-specific engineering or large quantities of human interaction data. To address this challenge, we propose using pretrained vision-language models (VLMs) to supervise embodied agents.	Theodore Sumers; Kenneth Marino; Arun Ahuja; Rob Fergus; Ishita Dasgupta;
1119	Explainable Data-Driven Optimization: From Context to Decision and Back Again Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce two classes of explanations and develop methods to find nearest explanations of random forest and nearest-neighbor predictors.	Alexandre Forel; Axel Parmentier; Thibaut Vidal;
1120	Why Random Pruning Is All We Need to Start Sparse Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We demonstrate the feasibility of this approach in experiments for different pruning methods and propose particularly effective choices of initial layer-wise sparsity ratios of the random source network.	Advait Harshal Gadhikar; Sohom Mukherjee; Rebekka Burkholz;
1121	ContraBAR: Contrastive Bayes-Adaptive Deep RL Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We begin by proving that representations learned by CPC are indeed sufficient for Bayes optimality. Based on this observation, we propose a simple meta RL algorithm that uses CPC in lieu of variational belief inference.	Era Choshen; Aviv Tamar;
1122	FREDIS: A Fusion Framework of Refinement and Disambiguation for Unreliable Partial Label Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a fusion framework of refinement and disambiguation named FREDIS to handle the UPLL problem.	Congyu Qiao; Ning Xu; Jiaqi Lv; Yi Ren; Xin Geng;
1123	Learning Useful Representations for Shifting Tasks and Distributions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Does the dominant approach to learn representations (as a side effect of optimizing an expected cost for a single training distribution) remain a good approach when we are dealing with multiple distributions? Our thesis is that such scenarios are better served by representations that are richer than those obtained with a single optimization episode.	Jianyu Zhang; Leon Bottou;
1124	Computational Doob H-transforms for Online Filtering of Discretely Observed Diffusions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our approach is based on the fully adapted auxiliary particle filter, which involves Doob’s $h$-transforms that are typically intractable. We propose a computational framework to approximate these $h$-transforms by solving the underlying backward Kolmogorov equations using nonlinear Feynman-Kac formulas and neural networks.	Nicolas Chopin; Andras Fulop; Jeremy Heng; Alexandre H. Thiery;
1125	XTab: Cross-table Pretraining for Tabular Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce XTab, a framework for cross-table pretraining of tabular transformers on datasets from various domains.	Bingzhao Zhu; Xingjian Shi; Nick Erickson; Mu Li; George Karypis; Mahsa Shoaran;
1126	On The Robustness of Text Vectorizers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our work formally proves that popular embedding schemes, such as concatenation, TF-IDF, and Paragraph Vector (a.k.a. doc2vec), exhibit robustness in the Hölder or Lipschitz sense with respect to the Hamming distance.	Rémi Catellier; Samuel Vaiter; Damien Garreau;
1127	Beyond Reward: Offline Preference-guided Policy Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, this requires the separate learning of a scalar reward function, which is assumed to be an information bottleneck of the learning process. To address this issue, we propose the offline preference-guided policy optimization (OPPO) paradigm, which models offline trajectories and preferences in a one-step process, eliminating the need for separately learning a reward function.	Yachen Kang; Diyuan Shi; Jinxin Liu; Li He; Donglin Wang;
1128	DIFF2: Differential Private Optimization Via Gradient Differences for Nonconvex Distributed Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To improve the best known utility bound, we propose a new differential private optimization framework called DIFF2 (DIFFerential private optimization via gradient DIFFerences) that constructs a differential private global gradient estimator with possibly quite small variance based on communicated gradient differences rather than gradients themselves.	Tomoya Murata; Taiji Suzuki;
1129	On Many-Actions Policy Gradient Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Model-Based Many-Actions (MBMA), an approach leveraging dynamics models for many-actions sampling in the context of SPG.	Michal Nauman; Marek Cygan;
1130	Online Nonstochastic Control with Adversarial and Static Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose online nonstochastic control algorithms that achieve both sublinear regret and sublinear adversarial constraint violation while keeping static constraint violation minimal against the optimal constrained linear control policy in hindsight.	Xin Liu; Zixian Yang; Lei Ying;
1131	Near-Minimax-Optimal Risk-Sensitive Reinforcement Learning with CVaR Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study risk-sensitive Reinforcement Learning (RL), focusing on the objective of Conditional Value at Risk (CVaR) with risk tolerance $\tau$.	Kaiwen Wang; Nathan Kallus; Wen Sun;
1132	The Blessing of Heterogeneity in Federated Q-Learning: Linear Speedup and Beyond Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider federated Q-learning, which aims to learn an optimal Q-function by periodically aggregating local Q-estimates trained on local data alone.	Jiin Woo; Gauri Joshi; Yuejie Chi;
1133	Multi-task Hierarchical Adversarial Inverse Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We develop Multi-task Hierarchical Adversarial Inverse Reinforcement Learning (MH-AIRL) to learn hierarchically-structured multi-task policies, which is more beneficial for compositional tasks with long horizons and has higher expert data efficiency through identifying and transferring reusable basic skills across tasks.	Jiayu Chen; Dipesh Tamboli; Tian Lan; Vaneet Aggarwal;
1134	On The Generalization of Multi-modal Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Through this unified perspective, we characterize the advantage of MMCL by showing that text pairs induce more semantically consistent and diverse positive pairs, which, according to our analysis, provably benefit downstream generalization. Inspired by this finding, we propose several methods to significantly improve the downstream performance of SSCL on ImageNet by leveraging multi-modal information.	Qi Zhang; Yifei Wang; Yisen Wang;
1135	Alternating Local Enumeration (TnALE): Solving Tensor Network Structure Search with Fewer Evaluations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose TnALE, a surprisingly simple algorithm that updates each structure-related variable alternately by local enumeration, greatly reducing the number of evaluations compared to TNLS.	Chao Li; Junhua Zeng; Chunmei Li; Cesar F Caiafa; Qibin Zhao;
1136	Shapley Based Residual Decomposition for Instance Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce the idea of decomposing the residuals of regression with respect to the data instances instead of features.	Tommy Liu; Amanda Susan Barnard;
1137	Less Is More: Task-aware Layer-wise Distillation for Language Model Compression Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Furthermore, the hidden representations of the teacher contain redundant information that the student does not necessarily need for the target task’s learning. To address these challenges, we propose a novel Task-aware layEr-wise Distillation (TED).	Chen Liang; Simiao Zuo; Qingru Zhang; Pengcheng He; Weizhu Chen; Tuo Zhao;
1138	Metagenomic Binning Using Connectivity-constrained Variational Autoencoders Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the use of a Variational AutoEncoder (VAE) tailored to leverage auxiliary structural information about contig relations when learning contig representations for subsequent metagenomic binning.	Andre Lamurias; Alessandro Tibo; Katja Hose; Mads Albertsen; Thomas Dyhre Nielsen;
1139	Relevant Walk Search for Explaining Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose polynomial-time algorithms for finding top-$K$ relevant walks, which drastically reduces the computation and thus increases the applicability of GNN-LRP to large-scale problems.	Ping Xiong; Thomas Schnake; Michael Gastegger; Grégoire Montavon; Klaus Robert Muller; Shinichi Nakajima;
1140	Modality-Agnostic Variational Compression of Implicit Neural Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a modality-agnostic neural compression algorithm based on a functional view of data and parameterised as an Implicit Neural Representation (INR).	Jonathan Richard Schwarz; Jihoon Tack; Yee Whye Teh; Jaeho Lee; Jinwoo Shin;
1141	VectorMapNet: End-to-end Vectorized HD Map Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these predictions do not include instance information of individual map elements and require heuristic post-processing to obtain vectorized maps. To tackle these challenges, we introduce an end-to-end vectorized HD map learning pipeline, termed VectorMapNet.	Yicheng Liu; Tianyuan Yuan; Yue Wang; Yilun Wang; Hang Zhao;
1142	Hierarchical Imitation Learning with Vector Quantized Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, learning the models for both low and high-level planning from demonstrations has proven challenging, especially with higher-dimensional inputs. To address this issue, we propose to use reinforcement learning to identify subgoals in expert trajectories by associating the magnitude of the rewards with the predictability of low-level actions given the state and the chosen subgoal.	Kalle Kujanpää; Joni Pajarinen; Alexander Ilin;
1143	Bandit Multi-linear DR-Submodular Maximization and Its Applications on Adversarial Submodular Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate the online bandit learning of the monotone multi-linear DR-submodular functions, designing the algorithm $\mathtt{BanditMLSM}$ that attains $O(T^{2/3}\log T)$ of $(1-1/e)$-regret.	Zongqi Wan; Jialin Zhang; Wei Chen; Xiaoming Sun; Zhijie Zhang;
1144	Temporal Label Smoothing for Early Event Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Following an analysis of objectives from both fields, we propose Temporal Label Smoothing (TLS), a simpler, yet best-performing method that preserves prediction monotonicity over time.	Hugo Yèche; Alizée Pace; Gunnar Ratsch; Rita Kuznetsova;
1145	Reconstructive Neuron Pruning for Backdoor Defense Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a novel defense called Reconstructive Neuron Pruning (RNP) to expose and prune backdoor neurons via an unlearning and then recovering process.	Yige Li; Xixiang Lyu; Xingjun Ma; Nodens Koren; Lingjuan Lyu; Bo Li; Yu-Gang Jiang;
1146	Minimalistic Predictions to Schedule Jobs with Online Precedence Constraints Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present lower bounds and algorithmic upper bounds for different precedence topologies, and thereby give a structured overview on which and how additional (possibly erroneous) information helps for designing better algorithms.	Alexandra Lassota; Alexander Lindermayr; Nicole Megow; Jens Schlöter;
1147	Estimating The Contamination Factor’s Distribution in Unsupervised Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Unfortunately, there are no good methods for estimating the contamination factor itself. We address this need from a Bayesian perspective, introducing a method for estimating the posterior distribution of the contamination factor for a given unlabeled dataset.	Lorenzo Perini; Paul-Christian Bürkner; Arto Klami;
1148	Implicit Neural Spatial Representations for Time-dependent PDEs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work explores solving time-dependent PDEs with INSR.	Honglin Chen; Rundi Wu; Eitan Grinspun; Changxi Zheng; Peter Yichen Chen;
1149	Cell-Free Latent Go-Explore Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce Latent Go-Explore (LGE), a simple and general approach based on the Go-Explore paradigm for exploration in reinforcement learning (RL).	Quentin Gallouédec; Emmanuel Dellandrea;
1150	Unlocking Slot Attention By Changing Optimal Transport Costs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To overcome this limitation, we first establish a connection between slot attention and optimal transport. Based on this new perspective we propose MESH (Minimize Entropy of Sinkhorn): a cross-attention module that combines the tiebreaking properties of unregularized optimal transport with the speed of regularized optimal transport.	Yan Zhang; David W Zhang; Simon Lacoste-Julien; Gertjan J. Burghouts; Cees G. M. Snoek;
1151	Generalized Polyak Step Size for First Order Optimization with Momentum Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper presents a general framework to set the learning rate adaptively for first-order optimization methods with momentum, motivated by the derivation of Polyak step size.	Xiaoyu Wang; Mikael Johansson; Tong Zhang;
1152	Towards Understanding Generalization of Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Even though GNNs have achieved remarkable success in real-world applications, understanding their working mechanism in theory is still at an early stage. In this paper, we move towards this goal from the perspective of generalization.	Huayi Tang; Yong Liu;
1153	Projected Tensor Power Method for Hypergraph Community Recovery Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the non-convex and discrete nature of the maximum likelihood estimation problem, we develop a simple yet efficient iterative method, called the projected tensor power method, to tackle it.	Jinxin Wang; Yuen-Man Pun; Xiaolu Wang; Peng Wang; Anthony Man-Cho So;
1154	Optimal Rates and Efficient Algorithms for Online Bayesian Persuasion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We focus on the online Bayesian persuasion framework, in which the sender repeatedly faces one or more receivers with unknown and adversarially selected types.	Martino Bernasconi; Matteo Castiglioni; Andrea Celli; Alberto Marchesi; Francesco Trovò; Nicola Gatti;
1155	Trustworthy Policy Learning Under The Counterfactual No-Harm Criterion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we first formalize the counterfactual no-harm criterion for policy learning from a principal stratification perspective. Next, we propose a novel upper bound for the fraction negatively affected by the policy and show the consistency and asymptotic normality of the estimator.	Haoxuan Li; Chunyuan Zheng; Yixiao Cao; Zhi Geng; Yue Liu; Peng Wu;
1156	TabDDPM: Modelling Tabular Data with Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate if the framework of diffusion models can be advantageous for general tabular problems, where data points are typically represented by vectors of heterogeneous features.	Akim Kotelnikov; Dmitry Baranchuk; Ivan Rubachev; Artem Babenko;
1157	Graph Neural Tangent Kernel: Convergence on Large Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We investigate the training dynamics of large-graph GNNs using graph neural tangent kernels (GNTKs) and graphons.	Sanjukta Krishnagopal; Luana Ruiz;
1158	Fascinating Supervisory Signals and Where to Find Them: Deep Anomaly Detection with Scale Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Different from current reconstruction-guided generative models and transformation-based contrastive models, we devise novel data-driven supervision for tabular data by introducing a characteristic — scale — as data labels.	Hongzuo Xu; Yijie Wang; Juhui Wei; Songlei Jian; Yizhou Li; Ning Liu;
1159	Aligning Language Models with Preferences Through $f$-divergence Minimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new approach, $f$-DPG, which allows the use of any $f$-divergence to approximate any target distribution that can be evaluated.	Dongyoung Go; Tomasz Korbak; Germán Kruszewski; Jos Rozen; Nahyeon Ryu; Marc Dymetman;
1160	Speed-Oblivious Online Scheduling: Knowing (Precise) Speeds Is Not Necessary Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider online scheduling on unrelated (heterogeneous) machines in a speed-oblivious setting, where an algorithm is unaware of the exact job-dependent processing speeds. We show strong impossibility results for clairvoyant and non-clairvoyant algorithms and overcome them in models inspired by practical settings: (i) we provide competitive learning-augmented algorithms, assuming that (possibly erroneous) predictions on the speeds are given, and (ii) we provide competitive algorithms for the speed-ordered model, where a single global order of machines according to their unknown job-dependent speeds is known.	Alexander Lindermayr; Nicole Megow; Martin Rapp;
1161	Differentiable Simulations for Enhanced Sampling of Rare Events Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose using differentiable simulations (DiffSim) for the discovery and enhanced sampling of chemical transformations without a need to resort to preselected CVs, using only a distance metric.	Martin Sipka; Johannes Carl Bertold Dietschreit; Lukáš Grajciar; Rafael Gomez-Bombarelli;
1162	Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we innovatively combine the backward-looking and forward-looking aspects of the optimizer algorithm and propose a novel Admeta (A Double exponential Moving averagE To Adaptive and non-adaptive momentum) optimizer framework.	Yineng Chen; Zuchao Li; Lefei Zhang; Bo Du; Hai Zhao;
1163	Vector-Valued Control Variates Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose vector-valued control variates, an extension of control variates which can be used to reduce the variance of multiple Monte Carlo estimators jointly.	Zhuo Sun; Alessandro Barp; Francois-Xavier Briol;
1164	Causal Structure Learning for Latent Intervened Non-stationary Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, for non-stationary time series data, domain indexes are often unavailable, making it difficult to distinguish observational samples from interventional samples. To address these issues, we propose a novel Latent Intervened Non-stationary learning (LIN) method to make the domain indexes recovery process and the causal structure learning process mutually promote each other.	Chenxi Liu; Kun Kuang;
1165	Neural Inverse Operators for Solving PDE Inverse Problems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Existing operator learning frameworks map functions to functions and need to be modified to learn inverse maps from data. We propose a novel architecture termed Neural Inverse Operators (NIOs) to solve these PDE inverse problems.	Roberto Molinaro; Yunan Yang; Björn Engquist; Siddhartha Mishra;
1166	A Distribution Optimization Framework for Confidence Bounds of Risk Measures Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a distribution optimization framework that significantly improves confidence bounds for various risk measures compared to previous methods.	Hao Liang; Zhi-Quan Luo;
1167	On The Complexity of Bayesian Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Established computational modes (i.e., rule-based or similarity-based) are primarily studied in isolation, focusing on confined and abstract problem spaces. In this work, we study these two modes when the problem space scales up and when the complexity of concepts becomes diverse.	Yu-Zhe Shi; Manjie Xu; John E. Hopcroft; Kun He; Joshua B. Tenenbaum; Song-Chun Zhu; Ying Nian Wu; Wenjuan Han; Yixin Zhu;
1168	Vertical Federated Graph Neural Network for Recommender System Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our study proposes the first vertical federated GNN-based recommender system, called VerFedGNN.	Peihua Mai; Yan Pang;
1169	Feature Directions Matter: Long-Tailed Learning Via Rotated Balanced Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Holding a different view, in this paper, we show that features with fixed directions may be harmful to the generalization of models, even if it is completely symmetric. To avoid this issue, we propose Representation-Balanced Learning Framework (RBL), which introduces orthogonal matrices to learn directions while maintaining the geometric structure of ETF.	Gao Peifeng; Qianqian Xu; Peisong Wen; Zhiyong Yang; Huiyang Shao; Qingming Huang;
1170	SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to Unknown Parameters, Unbounded Gradients and Affine Variance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite being well studied, existing analyses of this method suffer from various shortcomings: they either assume some knowledge of the problem parameters, impose strong global Lipschitz conditions, or fail to give bounds that hold with high probability. We provide a comprehensive analysis of this basic method without any of these limitations, in both the convex and non-convex (smooth) cases, that additionally supports a general “affine variance” noise model and provides sharp rates of convergence in both the low-noise and high-noise regimes.	Amit Attia; Tomer Koren;
1171	Learning Hidden Markov Models When The Locations of Missing Observations Are Unknown Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper we consider the general problem of learning an HMM from data with unknown missing observation locations.	Binyamin Perets; Mark Kozdoba; Shie Mannor;
1172	Unscented Autoencoder Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The Variational Autoencoder (VAE) is a seminal approach in deep generative modeling with latent variables. Interpreting its reconstruction process as a nonlinear transformation of samples from the latent posterior distribution, we apply the Unscented Transform (UT) — a well-known distribution approximation used in the Unscented Kalman Filter (UKF) from the field of filtering.	Faris Janjos; Lars Rosenbaum; Maxim Dolgov; J. Marius Zoellner;
1173	Finding Generalization Measures By Contrasting Signal and Noise Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new generalization measure REF Complexity (RElative Fitting degree between signal and noise), motivated by the intuition that a given model-algorithm pair may generalize well if it fits signal (e.g., true labels) fast while fitting noise (e.g., random labels) slowly.	Jiaye Teng; Bohang Zhang; Ruichen Li; Haowei He; Yequan Wang; Yan Tian; Yang Yuan;
1174	Regression with Sensor Data Containing Incomplete Observations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, because an incomplete observation does not provide any tags indicating incompleteness, we cannot eliminate or impute them. To address this issue, we propose a learning algorithm that explicitly models incomplete observations corrupted with an asymmetric noise that always has a negative value.	Takayuki Katsuki; Takayuki Osogami;
1175	Hierarchical Grammar-Induced Geometry for Data-Efficient Molecular Property Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a data-efficient property predictor by utilizing a learnable hierarchical molecular grammar that can generate molecules from grammar production rules.	Minghao Guo; Veronika Thost; Samuel W Song; Adithya Balachandran; Payel Das; Jie Chen; Wojciech Matusik;
1176	Detecting Out-of-distribution Data Through In-distribution Class Prior Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, some representative methods share an unproven assumption that the probability that OOD data belong to every ID class should be the same, i.e., these OOD-to-ID probabilities actually form a uniform distribution. In this paper, we show that this assumption makes the above methods incapable when the ID model is trained with class-imbalanced data. Fortunately, by analyzing the causal relations between ID/OOD classes and features, we identify several common scenarios where the OOD-to-ID probabilities should be the ID-class-prior distribution and propose two strategies to modify existing inference-time detection methods: 1) replace the uniform distribution with the ID-class-prior distribution if they explicitly use the uniform distribution; 2) otherwise, reweight their scores according to the similarity between the ID-class-prior distribution and the softmax outputs of the pre-trained model.	Xue Jiang; Feng Liu; Zhen Fang; Hong Chen; Tongliang Liu; Feng Zheng; Bo Han;
1177	Learning Unnormalized Statistical Models Via Compositional Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study a direct approach for optimizing the negative log-likelihood of unnormalized models from the perspective of compositional optimization.	Wei Jiang; Jiayu Qin; Lingyu Wu; Changyou Chen; Tianbao Yang; Lijun Zhang;
1178	DoG Is SGD’s Best Friend: A Parameter-Free Dynamic Step Size Schedule Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a tuning-free dynamic SGD step size formula, which we call Distance over Gradients (DoG).	Maor Ivgi; Oliver Hinder; Yair Carmon;
1179	CrossSplit: Mitigating Label Noise Memorization Through Data Splitting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We approach the problem of improving robustness of deep learning algorithms in the presence of label noise.	Jihye Kim; Aristide Baratin; Yan Zhang; Simon Lacoste-Julien;
1180	An Adaptive Entropy-Regularization Framework for Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an adaptive entropy-regularization framework (ADER) for multi-agent reinforcement learning (RL) to learn the adequate amount of exploration of each agent for entropy-based exploration.	Woojun Kim; Youngchul Sung;
1181	Generalizing Neural Wave Functions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these networks can only solve different spatial arrangements of the same set of atoms. To overcome this limitation, we present Graph-learned orbital embeddings (Globe), a neural network-based reparametrization method that can adapt neural wave functions to different molecules.	Nicholas Gao; Stephan Günnemann;
1182	Deep Laplacian-based Options for Temporally-Extended Exploration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These assumptions are fundamentally not scalable. In this paper we address these limitations and show how recent results for directly approximating the eigenfunctions of the Laplacian can be leveraged to truly scale up options-based exploration.	Martin Klissarov; Marlos C. Machado;
1183	One-vs-the-Rest Loss to Focus on Important Samples in Adversarial Training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a new loss function for adversarial training.	Sekitoshi Kanai; Shin’ya Yamaguchi; Masanori Yamada; Hiroshi Takahashi; Kentaro Ohno; Yasutoshi Ida;
1184	Multi-Modal Classifiers for Open-Vocabulary Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The goal of this paper is open-vocabulary object detection (OVOD) — building a model that can detect objects beyond the set of categories seen at training, thus enabling the user to specify categories of interest at inference without the need for model retraining.	Prannay Kaul; Weidi Xie; Andrew Zisserman;
1185	Multi-Task Structural Learning Using Local Task Similarity Induced Neuron Creation and Removal Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: On the contrary, learning in the brain occurs through structural changes that are in tandem with changes in synaptic strength. Thus, we propose Multi-Task Structural Learning (MTSL) that simultaneously learns the multi-task architecture and its parameters.	Naresh Kumar Gurulingan; Bahram Zonooz; Elahe Arani;
1186	A Group Symmetric Stochastic Differential Equation Model for Molecule Multi-modal Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Meanwhile, existing molecule multi-modal pretraining approaches approximate MI based on the representation space encoded from the topology and geometry, thus resulting in the loss of critical structural information of molecules. To address this issue, we propose MoleculeSDE.	Shengchao Liu; Weitao Du; Zhi-Ming Ma; Hongyu Guo; Jian Tang;
1187	Reachability-Aware Laplacian Representation in Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Such a mismatch would impede the learning process in reward shaping. To fix this issue, we introduce a Reachability-Aware Laplacian Representation (RA-LapRep), by properly scaling each dimension of LapRep.	Kaixin Wang; Kuangqi Zhou; Jiashi Feng; Bryan Hooi; Xinchao Wang;
1188	Adversarial Collaborative Learning on Non-IID Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Different from typical FL approaches, the paper proposes a new learning concept called ADCOL (Adversarial Collaborative Learning) for non-IID features.	Qinbin Li; Bingsheng He; Dawn Song;
1189	Provably Invariant Learning Without Domain Information Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we present TIVA for environment-independent invariance learning, which requires no environment-specific information in training data.	Xiaoyu Tan; LIN Yong; Shengyu Zhu; Chao Qu; Xihe Qiu; Xu Yinghui; Peng Cui; Yuan Qi;
1190	GRAFENNE: Learning on Graphs with Heterogeneous and Dynamic Feature Sets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these techniques (i) assume uniformity of feature set across nodes, (ii) are transductive by nature, and (iii) fail to work when features are added or removed over time. In this work, we address these limitations through a novel GNN framework called GRAFENNE.	Shubham Gupta; Sahil Manchanda; Sayan Ranu; Srikanta J. Bedathur;
1191	DevFormer: A Symmetric Transformer for Context-Aware Device Placement Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present DevFormer, a novel transformer-based architecture for addressing the complex and computationally demanding problem of hardware design optimization.	Haeyeon Kim; Minsu Kim; Federico Berto; Joungho Kim; Jinkyoo Park;
1192	Continual Learning in Linear Classification on Separable Data Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We analyze continual learning on a sequence of separable linear classification tasks with binary labels.	Itay Evron; Edward Moroshko; Gon Buzaglo; Maroun Khriesh; Badea Marjieh; Nathan Srebro; Daniel Soudry;
1193	Total Variation Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a GNN model that computes cluster assignments by optimizing a tighter relaxation of the minimum cut based on graph total variation (GTV).	Jonas Berg Hansen; Filippo Maria Bianchi;
1194	Learning Control By Iterative Inversion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose iterative inversion – an algorithm for learning an inverse function without input-output pairs, but only with samples from the desired output distribution and access to the forward function.	Gal Leibovich; Guy Jacob; Or Avner; Gal Novik; Aviv Tamar;
1195	Expected Gradients of Maxout Networks and Consequences to Parameter Initialization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the gradients of a maxout network with respect to inputs and parameters and obtain bounds for the moments depending on the architecture and the parameter distribution.	Hanna Tseran; Guido Montufar;
1196	Topological Singularity Detection at Multiple Scales Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Detecting such singularities is therefore crucial as a precursor to interpolation and inference tasks. We address this issue by developing a topological framework that (i) quantifies the local intrinsic dimension, and (ii) yields a Euclidicity score for assessing the ‘manifoldness’ of a point along multiple scales.	Julius Von Rohrscheidt; Bastian Rieck;
1197	Finding The Missing-half: Graph Complementary Learning for Homophily-prone and Heterophily-prone Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In our paper, we introduce Graph cOmplementAry Learning, namely GOAL, which consists of two components: graph complementation and complemented graph convolution.	Yizhen Zheng; He Zhang; Vincent Lee; Yu Zheng; Xiao Wang; Shirui Pan;
1198	Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The orthogonal weight modification to overcome catastrophic forgetting does not consider the similarity of genuine audio across different datasets. To overcome this limitation, we propose a continual learning algorithm for fake audio detection to overcome catastrophic forgetting, called Regularized Adaptive Weight Modification (RAWM).	XiaoHui Zhang; Jiangyan Yi; Jianhua Tao; Chenglong Wang; Chu Yuan Zhang;
1199	Homomorphism AutoEncoder — Learning Group Structured Representations from Observed Transitions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose methods enabling an agent acting upon the world to learn internal representations of sensory information that are consistent with actions that modify it.	Hamza Keurti; Hsiao-Ru Pan; Michel Besserve; Benjamin F Grewe; Bernhard Schölkopf;
1200	Semi-Dual Unbalanced Quadratic Optimal Transport: Fast Statistical Rates and Convergent Algorithm Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we derive a semi-dual formulation for the problem of unbalanced quadratic optimal transport and we study its stability properties, namely we give upper and lower bounds for the Bregman divergence of the new objective that hold globally.	Adrien Vacher; François-Xavier Vialard;
1201	Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present AIRS: Automatic Intrinsic Reward Shaping that intelligently and adaptively provides high-quality intrinsic rewards to enhance exploration in reinforcement learning (RL).	Mingqi Yuan; Bo Li; Xin Jin; Wenjun Zeng;
1202	Symmetry-Aware Robot Design with Structured Subgroups Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, these works searched the robots directly from the vast design space and ignored common structures, resulting in abnormal robots and poor performance. To tackle this problem, we propose a Symmetry-Aware Robot Design (SARD) framework that exploits the structure of the design space by incorporating symmetry searching into the robot design process.	Heng Dong; Junyu Zhang; Tonghan Wang; Chongjie Zhang;
1203	Gradient Descent Monotonically Decreases The Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this model, we prove that the GFS sharpness decreases monotonically.	Itai Kreisler; Mor Shpigel Nacson; Daniel Soudry; Yair Carmon;
1204	Self-supervised Neural Factor Analysis for Disentangling Utterance-level Speech Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Inspired by how HuBERT uses clustering to discover hidden acoustic units, we formulate a factor analysis (FA) model that uses the discovered hidden acoustic units to align the SSL features.	Weiwei Lin; Chenhang He; Man-Wai Mak; Youzhi Tu;
1205	Refined Regret for Adversarial MDPs with Linear Function Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper provides two algorithms that improve the regret to $\tilde{\mathcal O}(\sqrt K)$ in the same setting.	Yan Dai; Haipeng Luo; Chen-Yu Wei; Julian Zimmert;
1206	EM-Network: Oracle Guided Self-distillation for Sequence Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce EM-Network, a novel self-distillation approach that effectively leverages target information for supervised sequence-to-sequence (seq2seq) learning.	Ji Won Yoon; SungHwan Ahn; Hyeonseung Lee; Minchan Kim; Seok Min Kim; Nam Soo Kim;
1207	A Robust Optimisation Perspective on Counterexample-Guided Repair of Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, whether counterexample-guided repair is guaranteed to terminate remains an open question. We approach this question by showing that counterexample-guided repair can be viewed as a robust optimisation algorithm.	David Boetius; Stefan Leue; Tobias Sutter;
1208	One-Shot Federated Conformal Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present a Conformal Prediction method that computes prediction sets in a one-shot Federated Learning (FL) setting.	Pierre Humbert; Batiste Le bars; Aurélien Bellet; Sylvain Arlot;
1209	Optimality of Thompson Sampling with Noninformative Priors for Pareto Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider the optimality of TS for the Pareto model that has a heavy tail and is parameterized by two unknown parameters.	Jongyeong Lee; Junya Honda; Chao-Kai Chiang; Masashi Sugiyama;
1210	Dimension-independent Certified Neural Network Watermarks Via Mollifier Smoothing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By leveraging mollifier theory, this paper proposes a mollifier smoothing method with dimension-independent certified radius of our proposed smooth classifier, for conducting the certified watermark problem against the $l_p$-norm watermark removal attacks ($1 \leq p \leq \infty$) for high parameter dimension $d$.	Jiaxiang Ren; Yang Zhou; Jiayin Jin; Lingjuan Lyu; Da Yan;
1211	Regret Bounds for Markov Decision Processes with Recursive Optimized Certainty Equivalents Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new episodic risk-sensitive reinforcement learning formulation based on tabular Markov decision processes with recursive OCEs.	Wenhao Xu; Xuefeng Gao; Xuedong He;
1212	A Generalization of ViT/MLP-Mixer to Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paradigm suffers from two major limitations, over-squashing and poor long-range dependencies, which can be solved using global attention, but doing so significantly increases the computational cost to quadratic complexity. In this work, we propose an alternative approach to overcome these structural limitations by leveraging the ViT/MLP-Mixer architectures introduced in computer vision.	Xiaoxin He; Bryan Hooi; Thomas Laurent; Adam Perold; Yann LeCun; Xavier Bresson;
1213	Global Context Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose global context vision transformer (GC ViT), a novel architecture that enhances parameter and compute utilization for computer vision.	Ali Hatamizadeh; Hongxu Yin; Greg Heinrich; Jan Kautz; Pavlo Molchanov;
1214	Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study variance-dependent regret bounds for Markov decision processes (MDPs).	Runlong Zhou; Zhang Zihan; Simon Shaolei Du;
1215	Controllability-Aware Unsupervised Skill Discovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a novel unsupervised skill discovery method, Controllability-aware Skill Discovery (CSD), which actively seeks complex, hard-to-control skills without supervision.	Seohong Park; Kimin Lee; Youngwoon Lee; Pieter Abbeel;
1216	Improving Graph Neural Networks with Learnable Propagation Operators Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: On the other hand, Convolutional Neural Networks (CNNs) can learn diverse propagation filters, and phenomena like over-smoothing are typically not apparent in CNNs. In this paper, we bridge these gaps by incorporating trainable channel-wise weighting factors $\omega$ to learn and mix multiple smoothing and sharpening propagation operators at each layer.	Moshe Eliasof; Lars Ruthotto; Eran Treister;
1217	Multiplier Bootstrap-based Exploration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Multiplier Bootstrap-based Exploration (MBE), a novel exploration strategy that is applicable to any reward model amenable to weighted loss minimization.	Runzhe Wan; Haoyu Wei; Branislav Kveton; Rui Song;
1218	Predictable MDP Abstraction for Unsupervised Model-Based RL Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Errors in this predictive model can degrade the performance of model-based controllers, and complex Markov decision processes (MDPs) can present exceptionally difficult prediction problems. To mitigate this issue, we propose predictable MDP abstraction (PMA): instead of training a predictive model on the original MDP, we train a model on a transformed MDP with a learned action space that only permits predictable, easy-to-model actions, while covering the original state-action space as much as possible.	Seohong Park; Sergey Levine;
1219	CHiLS: Zero-Shot Image Classification with Hierarchical Label Sets Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Classification with Hierarchical Label Sets (or CHiLS), an alternative strategy for zero-shot classification specifically designed for datasets with implicit semantic hierarchies.	Zachary Novack; Julian McAuley; Zachary Chase Lipton; Saurabh Garg;
1220	Towards Credible Visual Model Interpretation with Path Attribution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Not only that, for deep visual models, the methods may also not conform to the original game-theoretic intuitions that are the basis of their axiomatic nature. To address these issues, we perform a systematic investigation of the path attribution framework.	Naveed Akhtar; Mohammad A. A. K. Jalwana;
1221	Optimistic Online Mirror Descent for Bridging Stochastic and Adversarial Online Convex Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For convex and smooth functions, we obtain the same $\mathcal{O}(\sqrt{\sigma_{1:T}^2}+\sqrt{\Sigma_{1:T}^2})$ regret bound, without the convexity requirement of individual functions. For strongly convex and smooth functions, we establish an $\mathcal{O}(\min\{\log (\sigma_{1:T}^2+\Sigma_{1:T}^2), (\sigma_{\max}^2 + \Sigma_{\max}^2) \log T\})$ bound, better than their $\mathcal{O}((\sigma_{\max}^2 + \Sigma_{\max}^2) \log T)$ result. For exp-concave and smooth functions, we achieve a new $\mathcal{O}(d\log(\sigma_{1:T}^2+\Sigma_{1:T}^2))$ bound.	Sijia Chen; Wei-Wei Tu; Peng Zhao; Lijun Zhang;
1222	Interactive Object Placement with Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these random vectors are not interpretable, which prevents users from interacting with the object placement process. To address this problem, we propose an Interactive Object Placement method with Reinforcement Learning, dubbed IOPRE, to make sequential decisions for producing a reasonable placement given an initial location and size of the foreground.	Shengping Zhang; Quanling Meng; Qinglin Liu; Liqiang Nie; Bineng Zhong; Xiaopeng Fan; Rongrong Ji;
1223	Mechanistic Mode Connectivity Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study neural network loss landscapes through the lens of mode connectivity, the observation that minimizers of neural networks retrieved via training on a dataset are connected via simple paths of low loss.	Ekdeep Singh Lubana; Eric J Bigelow; Robert P. Dick; David Krueger; Hidenori Tanaka;
1224	Contrastive Learning Meets Homophily: Two Birds with One Stone Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new parameterized neighbor sampling component to replace the conventional sub-optimal samplings.	Dongxiao He; JiTao Zhao; Rui Guo; Zhiyong Feng; Di Jin; Yuxiao Huang; Zhen Wang; Weixiong Zhang;
1225	Adversarial Parameter Attack on Deep Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, such attacks could be detected by the user, because the accuracy of the attacked network will reduce and the network cannot work normally. To make the attack more stealthy, in this paper, the adversarial parameter attack is proposed, in which small perturbations to the parameters of the network are made such that the accuracy of the attacked network does not decrease much, but its robustness against adversarial example attacks becomes much lower.	Lijia Yu; Yihan Wang; Xiao-Shan Gao;
1226	AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we adopt a non-parametric classifier to perform test-time adaptation (AdaNPC).	YiFan Zhang; Xue Wang; Kexin Jin; Kun Yuan; Zhang Zhang; Liang Wang; Rong Jin; Tieniu Tan;
1227	Tuning Language Models As Training Data Generators for Augmentation-Enhanced Few-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we study few-shot learning with PLMs from a different perspective: We first tune an autoregressive PLM on the few-shot samples and then use it as a generator to synthesize a large amount of novel training samples which augment the original training set. To encourage the generator to produce label-discriminative samples, we train it via weighted maximum likelihood where the weight of each token is automatically adjusted based on a discriminative meta-learning objective.	Yu Meng; Martin Michalski; Jiaxin Huang; Yu Zhang; Tarek Abdelzaher; Jiawei Han;
1228	NeuralSlice: Neural 3D Triangle Mesh Reconstruction Via Slicing 4D Tetrahedral Meshes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel 3D shape representation named NeuralSlice, which represents a 3D shape as the intersection of a 4D tetrahedral mesh and a 4D hyperplane.	Chenbo Jiang; Jie Yang; Shwai He; Yu-Kun Lai; Lin Gao;
1229	Implicit Regularization Leads to Benign Overfitting for Sparse Linear Regression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we give a different parametrization of the model which leads to a new implicit regularization effect that combines the benefit of $\ell_1$ and $\ell_2$ interpolators.	Mo Zhou; Rong Ge;
1230	Leveraging Offline Data in Online Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Practical scenarios often motivate an intermediate setting: if we have some set of offline data and may also interact with the environment, how can we best use the offline data to minimize the number of online interactions necessary to learn an $\epsilon$-optimal policy? In this work, we consider this setting, which we call the FineTuneRL setting, for MDPs with linear structure.	Andrew Wagenmaker; Aldo Pacchiano;
1231	Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem, where the inputs are a matrix $\mathbf{A}\in\{0,1\}^{n\times d}$, a rank parameter $k>0$, as well as an accuracy parameter $\varepsilon>0$, and the goal is to approximate $\mathbf{A}$ as a product of low-rank factors $\mathbf{U}\in\{0,1\}^{n\times k}$ and $\mathbf{V}\in\{0,1\}^{k\times d}$.	Ameya Velingker; Maximilian Vötsch; David Woodruff; Samson Zhou;
1232	Pareto Regret Analyses in Multi-objective Multi-armed Bandit Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study Pareto optimality in multi-objective multi-armed bandit by providing a formulation of adversarial multi-objective multi-armed bandit and defining its Pareto regrets that can be applied to both stochastic and adversarial settings.	Mengfan Xu; Diego Klabjan;
1233	Reinforcement Learning with General Utilities: Simpler Variance Reduction and Large State-Action Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the reinforcement learning (RL) problem with general utilities which consists in maximizing a function of the state-action occupancy measure.	Anas Barakat; Ilyas Fatkhullin; Niao He;
1234	Meta Optimal Transport Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the use of amortized optimization to predict optimal transport (OT) maps from the input measures, which we call Meta OT.	Brandon Amos; Giulia Luise; Samuel Cohen; Ievgen Redko;
1235	Hyperbolic Image-text Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose MERU, a contrastive model that yields hyperbolic representations of images and text.	Karan Desai; Maximilian Nickel; Tanmay Rajpurohit; Justin Johnson; Shanmukha Ramakrishna Vedantam;
1236	LongCoder: A Long-Range Pre-trained Language Model for Code Completion Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we introduce a new task for code completion that focuses on handling long code input and propose a sparse Transformer model, called LongCoder, to address this task.	Daya Guo; Canwen Xu; Nan Duan; Jian Yin; Julian McAuley;
1237	Which Is Better for Learning with Noisy Labels: The Semi-supervised Method or Modeling Label Noise? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we answer the question from the perspective of the causal data-generating process.	Yu Yao; Mingming Gong; Yuxuan Du; Jun Yu; Bo Han; Kun Zhang; Tongliang Liu;
1238	Surface Snapping Optimization Layer for Single Image Object Shape Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we leverage recent advances in monocular scene understanding to incorporate an additional geometric cue of surface normals.	Yuan-Ting Hu; Alex Schwing; Raymond A. Yeh;
1239	A Closer Look at Self-Supervised Lightweight Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we develop and benchmark several self-supervised pre-training methods on image classification tasks and some downstream dense prediction tasks.	Shaoru Wang; Jin Gao; Zeming Li; Xiaoqin Zhang; Weiming Hu;
1240	NP-SemiSeg: When Neural Processes Meet Semi-Supervised Semantic Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we move one step forward by adapting NPs to semi-supervised semantic segmentation, resulting in a new model called NP-SemiSeg.	Jianfeng Wang; Daniela Massiceti; Xiaolin Hu; Vladimir Pavlovic; Thomas Lukasiewicz;
1241	A Critical Revisit of Adversarial Robustness in 3D Point Cloud Recognition with Diffusion-Driven Purification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we pinpoint a major limitation of the leading empirical defense, adversarial training, when applied to 3D point cloud models: gradient obfuscation, which significantly hampers robustness against potent attacks.	Jiachen Sun; Jiongxiao Wang; Weili Nie; Zhiding Yu; Zhuoqing Mao; Chaowei Xiao;
1242	Regret-Minimizing Double Oracle for Extensive-Form Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Moreover, we demonstrate that the sample complexity of XDO can be exponential in the number of information sets $\|S\|$, owing to the exponentially decaying stopping threshold of restricted games. To solve this problem, we propose the Periodic Double Oracle (PDO) method, which has the lowest sample complexity among regret minimization-based double oracle methods, being only polynomial in $\|S\|$.	Xiaohang Tang; Le Cong Dinh; Stephen Marcus McAleer; Yaodong Yang;
1243	Representations and Exploration for Deep Reinforcement Learning Using Singular Value Decomposition Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we provide a singular value decomposition based method that can be used to obtain representations that preserve the underlying transition structure in the domain.	Yash Chandak; Shantanu Thakoor; Zhaohan Daniel Guo; Yunhao Tang; Remi Munos; Will Dabney; Diana L Borsa;
1244	Graph Ladling: Shockingly Simple Parallel GNN Training Without Intermediate Communication Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite their popularity, scaling GNNs either by deepening or widening suffers from prevalent issues of $\textit{unhealthy gradients, over-smoothening, information squashing}$, which often lead to sub-standard performance. In this work, we are interested in exploring a principled way to scale GNN capacity without deepening or widening, which can improve their performance across multiple small and large graphs.	Ajay Kumar Jaiswal; Shiwei Liu; Tianlong Chen; Ying Ding; Zhangyang Wang;
1245	Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we leverage the notion of barrier function to explicitly encode the hard safety chance constraints, and given that the environment is unknown, relax them to our design of generative-model-based soft barrier functions. Based on such soft barriers, we propose a novel safe RL approach with bi-level optimization that can jointly learn the unknown environment and optimize the control policy, while effectively avoiding the unsafe region with safety probability optimization.	Yixuan Wang; Simon Sinong Zhan; Ruochen Jiao; Zhilu Wang; Wanxin Jin; Zhuoran Yang; Zhaoran Wang; Chao Huang; Qi Zhu;
1246	Adaptive Identification of Populations with Treatment Benefit in Clinical Trials: Machine Learning Challenges and Solutions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We find that the unique characteristics of the subpopulation selection problem — most importantly that (i) one is usually interested in finding subpopulations with any treatment benefit (and not necessarily the single subgroup with largest effect) given a limited budget and that (ii) effectiveness only has to be demonstrated across the subpopulation on average — give rise to interesting challenges and new desiderata when designing algorithmic solutions. Building on these findings, we propose AdaGGI and AdaGCPI, two meta-algorithms for subpopulation construction.	Alicia Curth; Alihan Hüyük; Mihaela van der Schaar;
1247	Efficient Sequence Transduction By Jointly Predicting Tokens and Durations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper introduces a novel Token-and-Duration Transducer (TDT) architecture for sequence-to-sequence tasks.	Hainan Xu; Fei Jia; Somshubra Majumdar; He Huang; Shinji Watanabe; Boris Ginsburg;
1248	Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the stochastic linear contextual bandit problem with high-dimensional features.	Sunrit Chakraborty; Saptarshi Roy; Ambuj Tewari;
1249	Distributed Contextual Linear Bandits with Minimax Optimal Communication Cost Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study distributed contextual linear bandits with stochastic contexts, where $N$ agents/learners act cooperatively to solve a linear bandit-optimization problem with $d$-dimensional features over the course of $T$ rounds. For this problem, we derive the first ever information-theoretic lower bound $\Omega(dN)$ on the communication cost of any algorithm that performs optimally in a regret minimization setup.	Sanae Amani; Tor Lattimore; András György; Lin Yang;
1250	Preprocessors Matter! Realistic Decision-Based Attacks on Machine Learning Systems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Hence, attacks that are unaware of this invariance inevitably waste a large number of queries to re-discover or overcome it. We, therefore, develop techniques to (i) reverse-engineer the preprocessor and then (ii) use this extracted information to attack the end-to-end system.	Chawin Sitawarin; Florian Tramèr; Nicholas Carlini;
1251	MixFlows: Principled Variational Inference Via Mixed Flows Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work presents mixed variational flows (MixFlows), a new variational family that consists of a mixture of repeated applications of a map to an initial reference distribution.	Zuheng Xu; Naitong Chen; Trevor Campbell;
1252	Quantum Ridgelet Transform: Winning Lottery Ticket of Neural Networks with Quantum Computation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Ridgelet transform has been a fundamental mathematical tool in the theoretical studies of neural networks, but the practical applicability of ridgelet transform to conducting learning tasks was limited since its numerical implementation by conventional classical computation requires an exponential runtime $\exp(O(D))$ as data dimension $D$ increases. To address this problem, we develop a quantum ridgelet transform (QRT), which implements the ridgelet transform of a quantum state within a linear runtime $O(D)$ of quantum computation.	Hayata Yamasaki; Sathyawageeswar Subramanian; Satoshi Hayakawa; Sho Sonoda;
1253	Federated Conformal Predictors for Distributed Uncertainty Quantification Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we extend conformal prediction to the federated learning setting.	Charles Lu; Yaodong Yu; Sai Praneeth Karimireddy; Michael Jordan; Ramesh Raskar;
1254	Quantifying Human Priors Over Social and Navigation Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: can I walk from here to there? In this work, we leverage the combinatorial structure of graphs to quantify human priors over such relational data.	Gecia Bravo-Hermsdorff;
1255	Learning in POMDPs Is Sample-Efficient with Hindsight Observability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce new algorithms for the tabular and function approximation settings that are provably sample-efficient with hindsight observability, even in POMDPs that would otherwise be statistically intractable.	Jonathan Lee; Alekh Agarwal; Christoph Dann; Tong Zhang;
1256	Efficient Graph Field Integrators Meet Point Clouds Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present two new classes of algorithms for efficient field integration on graphs encoding point cloud data.	Krzysztof Marcin Choromanski; Arijit Sehanobish; Han Lin; Yunfan Zhao; Eli Berger; Tetiana Parshakova; Alvin Pan; David Watkins; Tianyi Zhang; Valerii Likhosherstov; Somnath Basu Roy Chowdhury; Kumar Avinava Dubey; Deepali Jain; Tamas Sarlos; Snigdha Chaturvedi; Adrian Weller;
1257	Are Large Kernels Better Teachers Than Transformers for ConvNets? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper reveals a new appeal of the recently emerged large-kernel Convolutional Neural Networks (ConvNets): as the teacher in Knowledge Distillation (KD) for small-kernel ConvNets.	Tianjin Huang; Lu Yin; Zhenyu Zhang; Li Shen; Meng Fang; Mykola Pechenizkiy; Zhangyang Wang; Shiwei Liu;
1258	MonoNeRF: Learning Generalizable NeRFs from Monocular Videos Without Camera Poses Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a generalizable neural radiance field, MonoNeRF, that can be trained on large-scale monocular videos captured while moving in static scenes, without any ground-truth annotations of depth or camera poses.	Yang Fu; Ishan Misra; Xiaolong Wang;
1259	Fair Densities Via Boosting The Sufficient Statistics of Exponential Families Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a boosting algorithm to pre-process data for fairness.	Alexander Soen; Hisham Husain; Richard Nock;
1260	Fast Rates for Maximum Entropy Exploration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we study two different types of the maximum entropy exploration problem.	Daniil Tiapkin; Denis Belomestny; Daniele Calandriello; Eric Moulines; Remi Munos; Alexey Naumov; Pierre Perrault; Yunhao Tang; Michal Valko; Pierre Menard;
1261	On The Robustness of Randomized Ensembles to Adversarial Perturbations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we first demystify RECs as we derive fundamental results regarding their theoretical limits, necessary and sufficient conditions for them to be useful, and more.	Hassan Dbouk; Naresh Shanbhag;
1262	Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We design a novel model-based algorithmic framework which can be instantiated with both a model-optimistic and a value-optimistic solver.	Runlong Zhou; Ruosong Wang; Simon Shaolei Du;
1263	Multi-Environment Pretraining Enables Transfer to Action Limited Datasets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In reinforcement learning, however, a key challenge is that available data of sequential decision making is often not annotated with actions – for example, videos of game-play are much more available than sequences of frames paired with their logged game controls. We propose to circumvent this challenge by combining large but sparsely-annotated datasets from a target environment of interest with fully-annotated datasets from various other source environments.	David Venuto; Sherry Yang; Pieter Abbeel; Doina Precup; Igor Mordatch; Ofir Nachum;
1264	Analyzing Convergence in Quantum Neural Networks: Deviations from Neural Tangent Kernels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study the dynamics of QNNs and show that contrary to popular belief it is qualitatively different from that of any kernel regression: due to the unitarity of quantum operations, there is a non-negligible deviation from the tangent kernel regression derived at the random initialization.	Xuchen You; Shouvanik Chakrabarti; Boyang Chen; Xiaodi Wu;
1265	Latent Traversals in Generative Models As Potential Flows Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we instead propose to model latent structures with a learned dynamic potential landscape, thereby performing latent traversals as the flow of samples down the landscape’s gradient.	Yue Song; T. Anderson Keller; Nicu Sebe; Max Welling;
1266	Taxonomy-Structured Domain Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we tackle a generalization with taxonomy-structured domains, which formalizes domains with nested, hierarchical similarity structures such as animal species and product catalogs.	Tianyi Liu; Zihao Xu; Hao He; Guang-Yuan Hao; Guang-He Lee; Hao Wang;
1267	Hybrid Energy Based Model in The Feature Space for Out-of-Distribution Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces the HEAT model, a new post-hoc OOD detection method estimating the density of in-distribution (ID) samples using hybrid energy-based models (EBM) in the feature space of a pre-trained backbone.	Marc Lafon; Elias Ramzi; Clément Rambour; Nicolas THOME;
1268	A New PHO-rmula for Improved Performance of Semi-Structured Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We show that techniques to properly identify the contributions of the different model components in SSNs, however, lead to suboptimal network estimation, slower convergence, and degenerated or erroneous predictions. In order to solve these problems while preserving favorable model properties, we propose a non-invasive post-hoc orthogonalization (PHO) that guarantees identifiability of model components and provides better estimation and prediction quality.	David Rügamer;
1269	Offline Reinforcement Learning with Closed-Form Policy Improvement Operators Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose our closed-form policy improvement operators.	Jiachen Li; Edwin Zhang; Ming Yin; Qinxun Bai; Yu-Xiang Wang; William Yang Wang;
1270	Uncovering Adversarial Risks of Test-Time Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, we uncover a novel security vulnerability of TTA based on the insight that predictions on benign samples can be impacted by malicious samples in the same batch. To exploit this vulnerability, we propose Distribution Invading Attack (DIA), which injects a small fraction of malicious data into the test batch.	Tong Wu; Feiran Jia; Xiangyu Qi; Jiachen T. Wang; Vikash Sehwag; Saeed Mahloujifar; Prateek Mittal;
1271	DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Fundamentally, this might be because multi-step policy improvements require operations that cannot be approximated by stochastic samples, hence hindering the widespread adoption of such methods in practice. To address such limitations, we introduce doubly multi-step off-policy VI (DoMo-VI), a novel oracle algorithm that combines multi-step policy improvements and policy evaluations.	Yunhao Tang; Tadashi Kozuno; Mark Rowland; Anna Harutyunyan; Remi Munos; Bernardo Avila Pires; Michal Valko;
1272	Reward-Mixing MDPs with Few Latent Contexts Are Learnable Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we resolve several open questions for the general RMMDP setting.	Jeongyeol Kwon; Yonathan Efroni; Constantine Caramanis; Shie Mannor;
1273	Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Under the function approximation setup where the optimal latent state-action $Q$-function is linear in the state feature, and the optimal $Q$-function has a gap in actions, we provide a computationally and statistically efficient algorithm for finding the exact optimal policy.	Masatoshi Uehara; Ayush Sekhari; Jason D. Lee; Nathan Kallus; Wen Sun;
1274	A Flexible Diffusion Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a general framework for parameterizing diffusion models, particularly the spatial part of forward SDEs, by leveraging the symplectic and Riemannian geometry of the data manifold.	weitao Du; He Zhang; Tao Yang; Yuanqi Du;
1275	On Sampling with Approximate Transport Maps Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In both cases, the quality of the learned transport conditions performance. The present work clarifies for the first time the relative strengths and weaknesses of these two approaches.	Louis Grenioux; Alain Durmus; Eric Moulines; Marylou Gabrié;
1276	COLA: Orchestrating Error Coding and Learning for Robust Neural Network Inference Against Hardware Defects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose to reduce inner layer feature error correlation by 1) adopting a separated architecture, where the last portions of the paths to all output nodes are separated, and 2) orthogonalizing weights in common DNN layers so that the intermediate features are orthogonal with each other.	Anlan Yu; Ning Lyu; Jieming Yin; Zhiyuan Yan; Wujie Wen;
1277	On The Occupancy Measure of Non-Markovian Policies in Continuous MDPs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While expected, for technical reasons, the translation of this result to continuous state space has resisted until now. Our main contribution is to fill this gap and to provide a general measure-theoretic treatment of the problem, permitting, in particular, its extension to continuous MDPs.	Romain Laroche; Remi Tachet des Combes;
1278	How Jellyfish Characterise Alternating Group Equivariant Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide a full characterisation of all of the possible alternating group ($A_n$) equivariant neural networks whose layers are some tensor power of $\mathbb{R}^{n}$.	Edward Pearce-Crump;
1279	Learning The Dynamics of Sparsely Observed Interacting Systems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We address the problem of learning the dynamics of an unknown non-parametric system linking a target and a feature time series.	Linus Bleistein; Adeline Fermanian; Anne-Sophie Jannot; Agathe Guilloux;
1280	Differentiable Multi-Target Causal Bayesian Experimental Design Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce a gradient-based approach for the problem of Bayesian optimal experimental design to learn causal models in a batch setting — a critical component for causal discovery from finite data where interventions can be costly or risky.	Panagiotis Tigas; Yashas Annadani; Desi R. Ivanova; Andrew Jesson; Yarin Gal; Adam Foster; Stefan Bauer;
1281	Fair Yet Asymptotically Equal Collaborative Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper explores an incentive design that guarantees fairness so that nodes receive rewards commensurate to their contributions.	Xiaoqiang Lin; Xinyi Xu; See-Kiong Ng; Chuan-Sheng Foo; Bryan Kian Hsiang Low;
1282	From Noisy Fixed-Point Iterations to Private ADMM for Centralized and Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study differentially private (DP) machine learning algorithms as instances of noisy fixed-point iterations, in order to derive privacy and utility results from this well-studied framework.	Edwige Cyffers; Aurélien Bellet; Debabrota Basu;
1283	Safe Offline Reinforcement Learning with Real-Time Budget Constraints Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Besides, in many real-world applications, the learned policy is required to respond to dynamically determined safety budgets (i.e., constraint thresholds) in real time. In this paper, we target the above real-time budget constraint problem under the offline setting, and propose Trajectory-based REal-time Budget Inference (TREBI) as a novel solution that approaches this problem from the perspective of trajectory distribution.	Qian Lin; Bo Tang; Zifan Wu; Chao Yu; Shangqin Mao; Qianlong Xie; Xingxing Wang; Dong Wang;
1284	Differential Privacy Has Bounded Impact on Fairness in Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We theoretically study the impact of differential privacy on fairness in classification.	Paul Mangold; Michaël Perrot; Aurélien Bellet; Marc Tommasi;
1285	Rethinking Visual Reconstruction: Experience-Based Content Completion Guided By Visual Cues Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Most existing approaches ignored the brain completion mechanism. In this work, we propose to reconstruct seen images with both the visual perception and the brain completion process, and design a simple, yet effective visual decoding framework to achieve this goal.	Jiaxuan Chen; Yu Qi; Gang Pan;
1286	The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the relation between other MVSSL methods and MI remains unclear. We consider a different lower bound on the MI consisting of an entropy and a reconstruction term (ER), and analyze the main MVSSL families through its lens.	Borja Rodríguez Gálvez; Arno Blaas; Pau Rodriguez; Adam Golinski; Xavier Suau; Jason Ramapuram; Dan Busbridge; Luca Zappella;
1287	SlotGAT: Slot-based Message Passing for Heterogeneous Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the issue, we propose SlotGAT with separate message passing processes in slots, one for each node type, to maintain the representations in their own node-type feature spaces.	Ziang Zhou; Jieming Shi; Renchi Yang; Yuanhang Zou; Qing Li;
1288	WL Meet VC Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study GNNs’ generalization ability through the lens of Vapnik-Chervonenkis (VC) dimension theory in two settings, focusing on graph-level predictions.	Christopher Morris; Floris Geerts; Jan Tönshoff; Martin Grohe;
1289	Hierarchical Diffusion for Offline Decision Making Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we first formulate the problem of offline long-horizon decision-MakIng from the perspective of conditional generative modeling by incorporating goals into the control-as-inference graphic models. A Hierarchical trajectory-level Diffusion probabilistic model is then proposed with classifier-free guidance. HDMI employs a cascade framework that utilizes the reward-conditional goal diffuser for the subgoal discovery and the goal-conditional trajectory diffuser for generating the corresponding action sequence of subgoals.	Wenhao Li; Xiangfeng Wang; Bo Jin; Hongyuan Zha;
1290	A Connection Between One-Step RL and Critic Regularization in Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: These methods typically require more compute but have appealing lower-bound guarantees. In this paper, we draw a connection between these methods: applying a multi-step critic regularization method with a regularization coefficient of 1 yields the same policy as one-step RL.	Benjamin Eysenbach; Matthieu Geist; Sergey Levine; Ruslan Salakhutdinov;
1291	SOM-CPC: Unsupervised Contrastive Learning with Self-Organizing Maps for Structured Representations of High-Rate Time Series Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work we propose SOM-CPC, a model that visualizes data in an organized 2D manifold, while preserving higher-dimensional information.	Iris A.M. Huijben; Arthur Andreas Nijdam; Sebastiaan Overeem; Merel M Van Gilst; Ruud Van Sloun;
1292	Quantized Distributed Training of Large Models with Convergence Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present QSDP, a variant of FSDP which supports both gradient and weight quantization with theoretical guarantees, is simple to implement and has essentially no overheads.	Ilia Markov; Adrian Vladu; Qi Guo; Dan Alistarh;
1293	LipsNet: A Smooth and Robust Neural Network with Adaptive Lipschitz Constant for High Accuracy Optimal Control Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The action fluctuation is caused by the high Lipschitz constant of actor networks. To address this problem, we propose a neural network named LipsNet.	Xujie Song; Jingliang Duan; Wenxuan Wang; Shengbo Eben Li; Chen Chen; Bo Cheng; Bo Zhang; Junqing Wei; Xiaoming Simon Wang;
1294	Fast Excess Risk Rates Via Offset Rademacher Complexity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on the offset Rademacher complexity, this work outlines a systematic framework for deriving sharp excess risk bounds in statistical learning without the Bernstein condition.	Chenguang Duan; Yuling Jiao; Lican Kang; Xiliang Lu; Jerry Zhijian Yang;
1295	Doubly Adversarial Federated Bandits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For the bandit feedback setting, we propose a near-optimal federated bandit algorithm called FEDEXP3.	Jialin Yi; Milan Vojnovic;
1296	Architecture-Agnostic Masked Image Modeling — From ViT Back to CNN Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we observe that MIM essentially teaches the model to learn better middle-order interactions among patches for more generalized feature extraction.	Siyuan Li; Di Wu; Fang Wu; Zelin Zang; Stan Z. Li;
1297	On Strengthening and Defending Graph Reconstruction Attack with Markov Chain Approximation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Especially by taking GNNs as a Markov chain and attacking GNNs via a flexible chain approximation, we systematically explore the underlying principles of graph reconstruction attack, and propose two information theory-guided mechanisms: (1) the chain-based attack method with adaptive designs for extracting more private information; (2) the chain-based defense method that sharply reduces the attack fidelity with moderate accuracy loss.	Zhanke Zhou; Chenyu Zhou; Xuan Li; Jiangchao Yao; quanming yao; Bo Han;
1298	Cooperative Open-ended Learning Framework for Zero-Shot Coordination Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, these approaches can result in a loss of learning and an inability to cooperate with certain strategies within the population, known as cooperative incompatibility. To address this issue, we propose the Cooperative Open-ended LEarning (COLE) framework, which constructs open-ended objectives in cooperative games with two players from the perspective of graph theory to assess and identify the cooperative ability of each strategy.	Yang Li; Shao Zhang; Jichen Sun; Yali Du; Ying Wen; Xinbing Wang; Wei Pan;
1299	Stochastic Gradient Descent Under Markovian Sampling Schemes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study a variation of vanilla stochastic gradient descent where the optimizer only has access to a Markovian sampling scheme.	Mathieu Even;
1300	Learning Instance-Specific Augmentations By Capturing Local Invariances Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce InstaAug, a method for automatically learning input-specific augmentations from data.	Ning Miao; Tom Rainforth; Emile Mathieu; Yann Dubois; Yee Whye Teh; Adam Foster; Hyunjik Kim;
1301	Sampling-based Nyström Approximation and Kernel Quadrature Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We analyze the Nyström approximation of a positive definite kernel associated with a probability measure.	Satoshi Hayakawa; Harald Oberhauser; Terry Lyons;
1302	CO-BED: Information-Theoretic Contextual Optimization Via Bayesian Experimental Design Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We formalize the problem of contextual optimization through the lens of Bayesian experimental design and propose CO-BED—a general, model-agnostic framework for designing contextual experiments using information-theoretic principles.	Desi R. Ivanova; Joel Jennings; Tom Rainforth; Cheng Zhang; Adam Foster;
1303	Nonlinear Advantage: Trained Networks Might Not Be As Complex As You Think Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To characterize the depth of the resulting partially linearized network, we introduce a measure called average path length, representing the average number of active nonlinearities encountered along a path in the network graph.	Christian H.X. Ali Mehmeti-Göpel; Jan Disselhoff;
1304	Bidirectional Learning for Offline Model-based Biological Sequence Design Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we focus on biological sequence design to maximize some sequence score.	Can Chen; Yingxue Zhang; Xue Liu; Mark Coates;
1305	Achieving High Accuracy with PINNs Via Energy Natural Gradient Descent Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose energy natural gradient descent, a natural gradient method with respect to a Hessian-induced Riemannian metric as an optimization algorithm for physics-informed neural networks (PINNs) and the deep Ritz method.	Johannes Müller; Marius Zeinhofer;
1306	Graph Switching Dynamical Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we propose a novel graph-based approach for switching dynamical systems, GRAph Switching dynamical Systems (GRASS), in which we use a dynamic graph to characterize interactions between objects and learn both intra-object and inter-object mode-switching behaviour. For benchmarking, we create two new datasets, a synthesized ODE-driven particles dataset and a real-world Salsa-couple dancing dataset. We will release code and data after acceptance.	Yongtuo Liu; Sara Magliacane; Miltiadis Kofinas; Efstratios Gavves;
1307	Improving Expert Predictions with Conformal Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we develop an automated decision support system that, by design, does not require experts to understand when to trust the system to improve performance.	Eleni Straitouri; Lequn Wang; Nastaran Okati; Manuel Gomez Rodriguez;
1308	Neural Wasserstein Gradient Flows for Discrepancies with Riesz Kernels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Wasserstein gradient flows of maximum mean discrepancy (MMD) functionals with non-smooth Riesz kernels show a rich structure as singular measures can become absolutely continuous ones and conversely. In this paper we contribute to the understanding of such flows.	Fabian Altekrüger; Johannes Hertrich; Gabriele Steidl;
1309	Boosting Graph Contrastive Learning Via Graph Contrastive Saliency Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Random augmentations may inevitably lead to semantic information corruption during the training, and force the network to mistakenly focus on semantically irrelevant environmental background structures. To address these limitations and to improve generalization, we propose a novel self-supervised learning framework for GCL, which can adaptively screen the semantic-related substructure in graphs by capitalizing on the proposed gradient-based Graph Contrastive Saliency (GCS).	Chunyu Wei; Yu Wang; Bing Bai; Kai Ni; David J. Brady; LU FANG;
1310	Bigger, Better, Faster: Human-level Atari with Human-level Efficiency Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: BBF relies on scaling the neural networks used for value estimation, as well as a number of other design choices that enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work.	Max Schwarzer; Johan Samir Obando Ceron; Aaron Courville; Marc G Bellemare; Rishabh Agarwal; Pablo Samuel Castro;
1311	Nonparametric Iterative Machine Teaching Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we consider the problem of Iterative Machine Teaching (IMT), where the teacher provides examples to the learner iteratively such that the learner can achieve fast convergence to a target model.	Chen Zhang; Xiaofeng Cao; Weiyang Liu; Ivor Tsang; James Kwok;
1312	On The Convergence Rate of Gaussianization with Random Rotations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We explore potential speed-ups and formulate challenges for further research.	Felix Draxler; Lars Kühmichel; Armand Rousselot; Jens Müller; Christoph Schnoerr; Ullrich Koethe;
1313	Invariance in Policy Optimisation and Partial Identifiability in Reward Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we formally characterise the partial identifiability of the reward function given several popular reward learning data sources, including expert demonstrations and trajectory comparisons.	Joar Max Viktor Skalse; Matthew Farrugia-Roberts; Stuart Russell; Alessandro Abate; Adam Gleave;
1314	Multi-User Reinforcement Learning with Low Rank Rewards Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our main contribution is an algorithm which explores rewards collaboratively with $N$ user-specific MDPs and can learn rewards efficiently in two key settings: tabular MDPs and linear MDPs.	Dheeraj Mysore Nagaraj; Suhas S Kowshik; Naman Agarwal; Praneeth Netrapalli; Prateek Jain;
1315	Gibbsian Polar Slice Sampling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: By updating the directional and radial components of chain iterates separately, we obtain a family of samplers that mimic polar slice sampling, and yet can be implemented efficiently.	Philip Schär; Michael Habeck; Daniel Rudolf;
1316	HarsanyiNet: Computing Accurate Shapley Values in A Single Forward Propagation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, when people use Shapley values to explain the attribution of input variables of a deep neural network (DNN), it usually requires a very high computational cost to approximate relatively accurate Shapley values in real-world applications. Therefore, we propose a novel network architecture, the HarsanyiNet, which makes inferences on the input sample and simultaneously computes the exact Shapley values of the input variables in a single forward propagation.	Lu Chen; Siyu Lou; Keyan Zhang; Jin Huang; Quanshi Zhang;
1317	A Fast Optimistic Method for Monotone Variational Inequalities Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study monotone variational inequalities that can arise as optimality conditions for constrained convex optimization or convex-concave minimax problems and propose a novel algorithm that uses only one gradient/operator evaluation and one projection onto the constraint set per iteration.	Michael Sedlmayer; Dang-Khoa Nguyen; Radu Ioan Bot;
1318	Improving Hyperparameter Learning Under Approximate Inference in Gaussian Process Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We improve hyperparameter learning in GP models and focus on the interplay between variational inference (VI) and the learning target.	Rui Li; S. T. John; Arno Solin;
1319	Understanding Self-Predictive Learning for Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Building on the theoretical insights, we propose bidirectional self-predictive learning, a novel self-predictive algorithm that learns two representations simultaneously.	Yunhao Tang; Zhaohan Daniel Guo; Pierre Harvey Richemond; Bernardo Avila Pires; Yash Chandak; Remi Munos; Mark Rowland; Mohammad Gheshlaghi Azar; Charline Le Lan; Clare Lyle; András György; Shantanu Thakoor; Will Dabney; Bilal Piot; Daniele Calandriello; Michal Valko;
1320	Towards A Better Understanding of Representation Dynamics Under TD-learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we consider the question: how does end-to-end TD-learning impact the representation over time? Complementary to prior work, we provide a set of analyses that shed further light on the representation dynamics under TD-learning.	Yunhao Tang; Remi Munos;
1321	VA-learning As A More Efficient Alternative to Q-learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce VA-learning, which directly learns the advantage function and the value function using bootstrapping, without explicit reference to Q-functions.	Yunhao Tang; Remi Munos; Mark Rowland; Michal Valko;
1322	CRISP: Curriculum Based Sequential Neural Decoders for Polar Code Family Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Motivated by recent successes of data-driven channel decoders, we introduce a novel CurRIculum based Sequential neural decoder for Polar codes (CRISP).	S Ashwin Hebbar; Viraj Vivek Nadkarni; Ashok Vardhan Makkuva; Suma Bhat; Sewoong Oh; Pramod Viswanath;
1323	Sliced-Wasserstein on Symmetric Positive Definite Matrices for M/EEG Signals Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Learning with these matrices requires the use of Riemannian geometry to account for their structure. In this paper, we propose a new method to deal with distributions of covariance matrices, and demonstrate its computational efficiency on M/EEG multivariate time series.	Clément Bonet; Benoît Malézieux; Alain Rakotomamonjy; Lucas Drumetz; Thomas Moreau; Matthieu Kowalski; Nicolas Courty;
1324	On Uni-Modal Feature Learning in Supervised Multi-Modal Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose to choose a targeted late-fusion learning method for the given supervised multi-modal task from Uni-Modal Ensemble (UME) and the proposed Uni-Modal Teacher (UMT), according to the distribution of uni-modal and paired features.	Chenzhuang Du; Jiaye Teng; Tingle Li; Yichen Liu; Tianyuan Yuan; Yue Wang; Yang Yuan; Hang Zhao;
1325	Revisiting Weighted Aggregation in Federated Learning with Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we revisit the weighted aggregation process and gain new insights into the training dynamics of FL.	Zexi Li; Tao Lin; Xinyi Shang; Chao Wu;
1326	Why Do Nearest Neighbor Language Models Work? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we set out to understand why retrieval-augmented language models, and specifically why k-nearest neighbor language models (kNN-LMs) perform better than standard parametric LMs, even when the k-nearest neighbor component retrieves examples from the same training set that the LM was originally trained on.	Frank F. Xu; Uri Alon; Graham Neubig;
1327	Implicit Jacobian Regularization Weighted with Impurity of Probability Output Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We demonstrate that SGD has an implicit regularization effect on the logit-weight Jacobian norm of neural networks.	Sungyoon Lee; Jinseong Park; Jaewook Lee;
1328	Hardness of Independent Learning and Sparse Equilibrium Computation in Markov Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We consider the problem of decentralized multi-agent reinforcement learning in Markov games.	Dylan J Foster; Noah Golowich; Sham M. Kakade;
1329	SRATTA: Sample Re-ATTribution Attack of Secure Aggregation in Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present SRATTA, an attack relying only on aggregated models which, under realistic assumptions, (i) recovers data samples from the different clients, and (ii) groups data samples coming from the same client together.	Tanguy Marchand; Regis Loeb; Ulysse Marteau-Ferey; Jean Ogier du Terrail; Arthur Pignet;
1330	Shape-Guided Dual-Memory Learning for 3D Anomaly Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a shape-guided expert-learning framework to tackle the problem of unsupervised 3D anomaly detection.	Yu-Min Chu; Liu Chieh; Ting-I Hsieh; Hwann-Tzong Chen; Tyng-Luh Liu;
1331	StriderNet: A Graph Reinforcement Learning Approach to Optimize Atomic Structures on Rough Energy Landscapes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we present a graph reinforcement learning approach, StriderNet, that learns a policy to displace the atoms towards low energy configurations.	Vaibhav Bihani; Sahil Manchanda; Srikanth Sastry; Sayan Ranu; N M Anoop Krishnan;
1332	A Deep Conjugate Direction Method for Iteratively Solving Linear Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel deep learning approach to approximate the solution of large, sparse, symmetric, positive-definite linear systems of equations.	Ayano Kaneda; Osman Akar; Jingyu Chen; Victoria Alicia Trevino Kala; David Hyde; Joseph Teran;
1333	Banker Online Mirror Descent: A Universal Approach for Delayed Online Bandit Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Banker Online Mirror Descent (Banker-OMD), a novel framework generalizing the classical Online Mirror Descent (OMD) technique in the online learning literature.	Jiatai Huang; Yan Dai; Longbo Huang;
1334	HyperTuning: Toward Adapting Large Language Models Without Back-propagation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose HyperTuning, a novel approach to model adaptation that uses a hypermodel to generate task-specific parameters for a fixed downstream model.	Jason Phang; Yi Mao; Pengcheng He; Weizhu Chen;
1335	Meta Learning of Interface Conditions for Multi-Domain Physics-Informed Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: While quite a few conditions have been proposed, there is no suggestion about how to select the conditions according to specific problems. To address this gap, we propose META Learning of Interface Conditions (METALIC), a simple, efficient yet powerful approach to dynamically determine appropriate interface conditions for solving a family of parametric PDEs.	Shibo Li; Michael Penwarden; Yiming Xu; Conor Tillinghast; Akil Narayan; Robert Kirby; Shandian Zhe;
1336	Contextual Combinatorial Bandits with Probabilistically Triggered Arms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Under the triggering probability modulated (TPM) condition, we devise the C$^2$-UCB-T algorithm and propose a novel analysis that achieves an $\tilde{O}(d\sqrt{KT})$ regret bound, removing a potentially exponentially large factor $O(1/p_{\min})$, where $d$ is the dimension of contexts, $p_{\min}$ is the minimum positive probability that any arm can be triggered, and batch-size $K$ is the maximum number of arms that can be triggered per round.	Xutong Liu; Jinhang Zuo; Siwei Wang; John C.S. Lui; Mohammad Hajiesmaili; Adam Wierman; Wei Chen;
1337	SAAL: Sharpness-Aware Active Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To overcome overfitting, this paper introduces the first active learning method to incorporate the sharpness of loss space into the acquisition function.	Yoon-Yeong Kim; Youngjae Cho; JoonHo Jang; Byeonghu Na; Yeongmin Kim; Kyungwoo Song; Wanmo Kang; Il-chul Moon;
1338	Towards Deep Attention in Graph Neural Networks: Problems and Remedies Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate some problematic phenomena related to deep graph attention, including vulnerability to over-smoothed features and smooth cumulative attention.	Soo Yong Lee; Fanchen Bu; Jaemin Yoo; Kijung Shin;
1339	Understanding The Role of Feedback in Online Learning with Switching Costs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the role of feedback in online learning with switching costs.	Duo Cheng; Xingyu Zhou; Bo Ji;
1340	Existence and Estimation of Critical Batch Size for Training Generative Adversarial Networks with Two Time-Scale Update Rule Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper studies the relationship between batch size and the number of steps needed for training GANs with TTURs based on constant learning rates.	Naoki Sato; Hideaki Iiduka;
1341	Neural Prediction Errors Enable Analogical Visual Reasoning in Human Standard Intelligence Tests Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Here, we propose a neural network model to solve Raven’s Progressive Matrices (RPM) – one of the standard intelligence tests in human psychology.	Lingxiao Yang; Hongzhi You; Zonglei Zhen; Dahui Wang; Xiaohong Wan; Xiaohua Xie; Ru-Yuan Zhang;
1342	Live in The Moment: Learning Dynamics Model Adapted to Evolving Policy Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Previous works learn a dynamics model that fits under the empirical state-action visitation distribution for all historical policies, i.e., the sample replay buffer. However, in this paper, we observe that fitting the dynamics model under the distribution for all historical policies does not necessarily benefit model prediction for the current policy since the policy in use is constantly evolving over time.	Xiyao Wang; Wichayaporn Wongkamjan; Ruonan Jia; Furong Huang;
1343	Theoretical Guarantees of Learning Ensembling Strategies with Applications to Time Series Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we prove a novel result, showing that choosing the best stacked generalization from a (finite or finite-dimensional) family of stacked generalizations based on cross-validated performance does not perform much worse than the oracle best.	Hilaf Hasson; Danielle C. Maddix; Bernie Wang; Gaurav Gupta; Youngsuk Park;
1344	Straightening Out The Straight-Through Estimator: Overcoming Optimization Challenges in Vector Quantized Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We identify the factors that contribute to this issue, including the codebook gradient sparsity and the asymmetric nature of the commitment loss, which leads to misaligned code-vector assignments. We propose to address this issue via affine re-parameterization of the code vectors.	Minyoung Huh; Brian Cheung; Pulkit Agrawal; Phillip Isola;
1345	Mitigating Memorization of Noisy Labels By Clipping The Model Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, our key idea is to induce a loss bound at the logit level, thus universally enhancing the noise robustness of existing losses.	Hongxin Wei; Huiping Zhuang; Renchunzi Xie; Lei Feng; Gang Niu; Bo An; Yixuan Li;
1346	Approximately Optimal Core Shapes for Tensor Decompositions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we introduce a novel Tucker packing problem, which we prove is NP-hard, and give a polynomial-time approximation scheme based on a reduction to the 2-dimensional knapsack problem with a matroid constraint.	Mehrdad Ghadiri; Matthew Fahrbach; Gang Fu; Vahab Mirrokni;
1347	A Gromov–Wasserstein Geometric View of Spectrum-Preserving Graph Coarsening Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we consider a graph as an element on a metric space equipped with the Gromov–Wasserstein (GW) distance, and bound the difference between the distance of two graphs and their coarsened versions.	Yifan Chen; Rentian Yao; Yun Yang; Jie Chen;
1348	Understanding The Impact of Adversarial Robustness on Accuracy Disparity Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While it has long been empirically observed that adversarial robustness may be at odds with standard accuracy and may have further disparate impacts on different classes, it remains an open question to what extent such observations hold and how the class imbalance plays a role within. In this paper, we attempt to understand this question of accuracy disparity by taking a closer look at linear classifiers under a Gaussian mixture model.	Yuzheng Hu; Fan Wu; Hongyang Zhang; Han Zhao;
1349	Revisiting Simple Regret: Fast Rates for Returning A Good Arm Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we make significant progress on minimizing simple regret in both the data-rich ($T\ge n$) and data-poor ($T \le n$) regimes, where $n$ is the number of arms and $T$ is the number of samples.	Yao Zhao; Connor Stephens; Csaba Szepesvari; Kwang-Sung Jun;
1350	Improved Online Conformal Prediction Via Strongly Adaptive Online Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the problem of uncertainty quantification via prediction sets, in an online setting where the data distribution may vary arbitrarily over time.	Aadyot Bhatnagar; Huan Wang; Caiming Xiong; Yu Bai;
1351	Sample Complexity Bounds for Learning High-dimensional Simplices in Noisy Regimes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose sample complexity bounds for learning a simplex from noisy samples.	Seyed Amir Hossein Saberi; Amir Najafi; Abolfazl Motahari; Babak Khalaj;
1352	Analyzing Privacy Leakage in Machine Learning Via Multiple Hypothesis Testing: A Lesson From Fano Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we study data reconstruction attacks for discrete data and analyze it under the framework of multiple hypothesis testing.	Chuan Guo; Alexandre Sablayrolles; Maziar Sanjabi;
1353	Fast Combinatorial Algorithms for Min Max Correlation Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce fast algorithms for correlation clustering with respect to the Min Max objective that provide constant factor approximations on complete graphs.	Sami Davies; Benjamin Moseley; Heather Newman;
1354	Grounding Language Models to Images for Multimodal Inputs and Outputs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose an efficient method to ground pretrained text-only language models to the visual domain, enabling them to process arbitrarily interleaved image-and-text data, and generate text interleaved with retrieved images.	Jing Yu Koh; Ruslan Salakhutdinov; Daniel Fried;
1355	Causal Proxy Models for Concept-based Model Explanations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show that robust causal explainability methods can be created using approximate counterfactuals, which can be written by humans to approximate a specific counterfactual or simply sampled using metadata-guided heuristics.	Zhengxuan Wu; Karel D’Oosterlinck; Atticus Geiger; Amir Zur; Christopher Potts;
1356	Scaling Laws for Reward Model Overoptimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we use a synthetic setup in which a fixed “gold-standard” reward model plays the role of humans, providing labels used to train a proxy reward model.	Leo Gao; John Schulman; Jacob Hilton;
1357	Escaping Saddle Points in Zeroth-order Optimization: The Power of Two-point Estimators Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we show that by adding an appropriate isotropic perturbation at each iteration, a zeroth-order algorithm based on $2m$ (for any $1 \leq m \leq d$) function evaluations per iteration can not only find $\epsilon$-second order stationary points polynomially fast, but do so using only $\tilde{O}(\frac{d}{m\epsilon^{2}\bar{\psi}})$ function evaluations, where $\bar{\psi} \geq \tilde{\Omega}(\sqrt{\epsilon})$ is a parameter capturing the extent to which the function of interest exhibits the strict saddle property.	Zhaolin Ren; Yujie Tang; Na Li;
1358	SE(3) Diffusion Model with Application to Protein Backbone Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, there exists no principled methodological framework for diffusion on SE(3), the space of orientation-preserving rigid motions in $\mathbb{R}^3$, that operates on frames and confers the group invariance. We address these shortcomings by developing theoretical foundations of SE(3) invariant diffusion models on multiple frames followed by a novel framework, FrameDiff, for estimating the SE(3) equivariant score over multiple frames.	Jason Yim; Brian L. Trippe; Valentin De Bortoli; Emile Mathieu; Arnaud Doucet; Regina Barzilay; Tommi S. Jaakkola;
1359	FeDXL: Provable Federated Learning for Deep X-Risk Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we tackle a novel federated learning (FL) problem for optimizing a family of X-risks, to which no existing FL algorithms are applicable.	Zhishuai Guo; Rong Jin; Jiebo Luo; Tianbao Yang;
1360	On The Correctness of Automatic Differentiation for Neural Networks with Machine-Representable Parameters Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the correctness of AD when the parameter space of a neural network consists solely of machine-representable numbers.	Wonyeol Lee; Sejun Park; Alex Aiken;
1361	What Can Online Reinforcement Learning with Function Approximation Benefit from General Coverage Conditions? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In online reinforcement learning (RL), instead of employing standard structural assumptions on Markov decision processes (MDPs), using a certain coverage condition (originally from offline RL) is enough to ensure sample-efficient guarantees (Xie et al. 2023). In this work, we pursue this new direction by exploring more general coverage conditions and studying their potential and utility in efficient online RL.	Fanghui Liu; Luca Viano; Volkan Cevher;
1362	Graph Mixup with Soft Alignments Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose S-Mixup, a simple yet effective mixup method for graph classification by soft alignments.	Hongyi Ling; Zhimeng Jiang; Meng Liu; Shuiwang Ji; Na Zou;
1363	Transformers Meet Directed Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose two direction- and structure-aware positional encodings for directed graphs: (1) the eigenvectors of the Magnetic Laplacian — a direction-aware generalization of the combinatorial Laplacian; (2) directional random walk encodings.	Simon Geisler; Yujia Li; Daniel J Mankowitz; Ali Taylan Cemgil; Stephan Günnemann; Cosmin Paduraru;
1364	Unveiling The Mask of Position-Information Pattern Through The Mist of Image Features Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing metrics for quantifying the strength of positional information remain unreliable and frequently lead to erroneous results. To address this issue, we propose novel metrics for measuring and visualizing the encoded positional information.	Chieh Hubert Lin; Hung-Yu Tseng; Hsin-Ying Lee; Maneesh Kumar Singh; Ming-Hsuan Yang;
1365	The Fast Johnson-Lindenstrauss Transform Is Even Faster Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we give a surprising new analysis of the Fast JL transform, showing that the $k \ln^2 n$ term in the embedding time can be improved to $(k \ln^2 n)/\alpha$ for an $\alpha = \Omega(\min\{\varepsilon^{-1}\ln(1/\varepsilon), \ln n\})$.	Ora Nova Fandina; Mikael Møller Høgsgaard; Kasper Green Larsen;
1366	Perturbation Analysis of Neural Collapse Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, with practical networks and datasets, the features typically do not reach exact collapse, e.g., because deep layers cannot arbitrarily modify intermediate features that are far from being collapsed. In this paper, we propose a richer model that can capture this phenomenon by forcing the features to stay in the vicinity of a predefined features matrix (e.g., intermediate features).	Tom Tirer; Haoxiang Huang; Jonathan Niles-Weed;
1367	Stabilizing Transformer Training By Preventing Attention Entropy Collapse Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we investigate the training dynamics of Transformers by examining the evolution of the attention layers.	Shuangfei Zhai; Tatiana Likhomanenko; Etai Littwin; Dan Busbridge; Jason Ramapuram; Yizhe Zhang; Jiatao Gu; Joshua M. Susskind;
1368	Normalizing Flows for Interventional Density Estimation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we estimate the density of potential outcomes after interventions from observational data.	Valentyn Melnychuk; Dennis Frauen; Stefan Feuerriegel;
1369	IncDSI: Incrementally Updatable Document Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose IncDSI, a method to add documents in real time (about 20-50ms per document), without retraining the model on the entire dataset (or even parts thereof).	Varsha Kishore; Chao Wan; Justin Lovelace; Yoav Artzi; Kilian Q Weinberger;
1370	Efficient Approximations of Complete Interatomic Potentials for Crystal Property Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Current methods construct graphs by establishing edges only between nearby nodes, thereby failing to faithfully capture infinite repeating patterns and distant interatomic interactions. In this work, we propose several innovations to overcome these limitations.	Yuchao Lin; Keqiang Yan; Youzhi Luo; Yi Liu; Xiaoning Qian; Shuiwang Ji;
1371	The Power of Preconditioning in Overparameterized Low-Rank Matrix Sensing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose $\textsf{ScaledGD($\lambda$)}$, a preconditioned gradient descent method to tackle the low-rank matrix sensing problem when the true rank is unknown, and when the matrix is possibly ill-conditioned.	Xingyu Xu; Yandi Shen; Yuejie Chi; Cong Ma;
1372	Tighter Bounds on The Expressivity of Transformer Encoders Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Bhattamishra and others have shown that transformer encoders are at least as expressive as a certain kind of counter machine, while Merrill and Sabharwal have shown that fixed-precision transformer encoders recognize only languages in uniform $TC^0$. We connect and strengthen these results by identifying a variant of first-order logic with counting quantifiers that is simultaneously an upper bound for fixed-precision transformer encoders and a lower bound for transformer encoders.	David Chiang; Peter Cholak; Anand Pillay;
1373	Learning Distributions Over Quantum Measurement Outcomes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider a general problem of learning properties from quantum states: given an unknown $d$-dimensional quantum state $\rho$ and $M$ unknown quantum measurements $\mathcal{M}_1,\ldots,\mathcal{M}_M$ with $K\geq 2$ outcomes, estimating the probability distribution for applying $\mathcal{M}_i$ on $\rho$ to within total variation distance $\epsilon$.	Weiyuan Gong; Scott Aaronson;
1374	DDGR: Continual Learning with Deep Diffusion-based Generative Replay Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In addition, most generative replay methods reuse the generated samples to update the generator, which causes the samples regenerated by the generator to deviate from the distribution of previous tasks. To overcome these two issues, we propose a novel approach, called deep diffusion-based generative replay (DDGR), which adopts a diffusion model as the generator and calculates an instruction-operator through the classifier to instruct the generation of samples.	Rui Gao; Weiwei Liu;
1375	Efficiently Predicting High Resolution Mass Spectra with Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, current approaches to spectrum prediction model the output space in ways that force a tradeoff between capturing high resolution mass information and tractable learning. We resolve this tradeoff by casting spectrum prediction as a mapping from an input molecular graph to a probability distribution over chemical formulas.	Michael Murphy; Stefanie Jegelka; Ernest Fraenkel; Tobias Kind; David Healey; Thomas Butler;
1376	Model-Aware Contrastive Learning: Towards Escaping The Dilemmas Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We argue that the fixed temperature is to blame for the uniformity-tolerance dilemma (UTD). To tackle this challenge, we enrich the CL loss family by presenting a Model-Aware Contrastive Learning (MACL) strategy, whose temperature adapts to the magnitude of alignment, which reflects the basic confidence of the instance discrimination task; this enables the CL loss to adjust the penalty strength for hard negatives adaptively.	Zizheng Huang; Haoxing Chen; Ziqi Wen; Chao Zhang; Huaxiong Li; Bo Wang; Chunlin Chen;
1377	ConCerNet: A Contrastive Learning Based Framework for Automated Conservation Law Discovery and Trustworthy Dynamical System Prediction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes a new learning framework named $\textbf{ConCerNet}$ to improve the trustworthiness of DNN-based dynamics modeling by endowing it with invariant properties.	Wang Zhang; Tsui-Wei Weng; Subhro Das; Alexandre Megretski; Luca Daniel; Lam M. Nguyen;
1378	Coordinated Dynamic Bidding in Repeated Second-Price Auctions with Budgets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose algorithms that guarantee every client a higher utility than the best she can get under independent bidding.	Yurong Chen; Qian Wang; Zhijian Duan; Haoran Sun; Zhaohua Chen; Xiang Yan; Xiaotie Deng;
1379	Wasserstein Barycenter Matching for Graph Size Generalization of Message Passing Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the uncontrollable convergence rate caused by correlations across nodes in the underlying dimensional signal-generating space, we propose to use Wasserstein barycenters as graph-level consensus to combat node-level correlations.	Xu Chu; Yujie Jin; Xin Wang; Shanghang Zhang; Yasha Wang; Wenwu Zhu; Hong Mei;
1380	OpenFE: Automated Feature Generation with Expert-level Performance Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present OpenFE, an automated feature generation tool that provides competitive results against machine learning experts.	Tianping Zhang; Zheyu Zhang; Zhiyuan Fan; Haoyan Luo; Fengyuan Liu; Qian Liu; Wei Cao; Jian Li;
1381	Random Grid Neural Processes for Parametric Partial Differential Equations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new class of spatially stochastic physics and data informed deep latent models for parametric partial differential equations (PDEs) which operate through scalable variational neural processes.	Arnaud Vadeboncoeur; Ieva Kazlauskaite; Yanni Papandreou; Fehmi Cirak; Mark Girolami; Omer Deniz Akyildiz;
1382	PreNAS: Preferred One-Shot Learning Towards Efficient Neural Architecture Search Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we present PreNAS, a search-free NAS approach that accentuates target models in one-shot training.	Haibin Wang; Ce Ge; Hesen Chen; Xiuyu Sun;
1383	Enhancing Activity Prediction Models in Drug Discovery with The Ability to Understand Human Language Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we envision a novel type of activity prediction model that is able to adapt to new prediction tasks at inference time, via understanding textual information describing the task.	Philipp Seidl; Andreu Vall; Sepp Hochreiter; Günter Klambauer;
1384	Semi-Parametric Contextual Pricing Algorithm Using Cox Proportional Hazards Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A challenge is that customer valuation is almost never observable in practice and is instead type-I interval censored by the offered price. To address this challenge, we propose a novel semi-parametric contextual pricing algorithm for stochastic contexts, called the epoch-based Cox proportional hazards Contextual Pricing (CoxCP) algorithm.	Young-Geun Choi; Gi-Soo Kim; Yunseo Choi; Wooseong Cho; Myunghee Cho Paik; Min-hwan Oh;
1385	Dual Focal Loss for Calibration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In our work, we propose a new loss function by focusing on dual logits.	Linwei Tao; Minjing Dong; Chang Xu;
1386	On The Identifiability and Estimation of Causal Location-Scale Noise Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite the generality of the model class, we show the causal direction is identifiable up to some pathological cases. To empirically validate these theoretical findings, we propose two estimators for LSNMs: an estimator based on (non-linear) feature maps, and one based on neural networks.	Alexander Immer; Christoph Schultheiss; Julia E Vogt; Bernhard Schölkopf; Peter Bühlmann; Alexander Marx;
1387	PWSHAP: A Path-Wise Explanation Model for Targeted Variables Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Path-Wise Shapley effects (PWSHAP), a framework for assessing the targeted effect of a binary (e.g. treatment) variable from a complex outcome model.	Lucile Ter-Minassian; Oscar Clivio; Karla DiazOrdaz; Robin J. Evans; Christopher C. Holmes;
1388	ModelDiff: A Framework for Comparing Learning Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the problem of (learning) algorithm comparison, where the goal is to find differences between models trained with two different learning algorithms.	Harshay Shah; Sung Min Park; Andrew Ilyas; Aleksander Madry;
1389	Shedding A PAC-Bayesian Light on Adaptive Sliced-Wasserstein Distances Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Yet, the literature on its statistical properties — or, more accurately, its generalization properties — with respect to the distribution of slices, beyond the uniform measure, is scarce. To bring new contributions to this line of research, we leverage the PAC-Bayesian theory and a central observation that SW may be interpreted as an average risk, the quantity PAC-Bayesian bounds have been designed to characterize.	Ruben Ohana; Kimia Nadjahi; Alain Rakotomamonjy; Liva Ralaivola;
1390	GAT: Guided Adversarial Training with Pareto-optimal Auxiliary Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To mitigate the costs, we propose Guided Adversarial Training (GAT), a novel adversarial training technique that exploits auxiliary tasks under a limited set of training data.	Salah Ghamizi; Jingfeng Zhang; Maxime Cordy; Mike Papadakis; Masashi Sugiyama; Yves Le Traon;
1391	Cold Analysis of Rao-Blackwellized Straight-Through Gumbel-Softmax Gradient Estimator Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The state of the art in this family, the Gumbel-Rao estimator, uses an extra internal sampling to reduce the variance, which may be costly. We analyze this estimator and show that it possesses a zero temperature limit with a surprisingly simple closed form.	Alexander Shekhovtsov;
1392	End-to-End Full-Atom Antibody Design Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: There are two major defects in current learning-based methods: 1) they tackle only a certain subtask of the whole antibody design pipeline, making them suboptimal or resource-intensive; 2) they omit either the framework regions or side chains, and are thus incapable of capturing the full-atom geometry. To address these pitfalls, we propose dynamic Multi-channel Equivariant grAph Network (dyMEAN), an end-to-end full-atom model for E(3)-equivariant antibody design given the epitope and the incomplete sequence of the antibody.	Xiangzhe Kong; Wenbing Huang; Yang Liu;
1393	Efficient Personalized Federated Learning Via Sparse Model-Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: The capacity and efficiency of personalized models are restricted by the lowest-resource clients, leading to sub-optimal performance and limited practicality of personalized FL. To overcome these challenges, we propose a novel approach named pFedGate for efficient personalized FL by adaptively and efficiently learning sparse local models.	Daoyuan Chen; Liuyi Yao; Dawei Gao; Bolin Ding; Yaliang Li;
1394	Disentangled Multiplex Graph Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we argue that effective and robust unsupervised multiplex graph representation learning (UMGRL) requires extracting complete and clean common information, as well as private information with more complementarity and less noise.	Yujie Mo; Yajie Lei; Jialie Shen; Xiaoshuang Shi; Heng Tao Shen; Xiaofeng Zhu;
1395	Byzantine-Robust Learning on Heterogeneous Data Via Gradient Splitting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we first reveal the root causes of performance degradation of current robust AGRs in non-IID settings: the curse of dimensionality and gradient heterogeneity. In order to address this issue, we propose GAS, a GrAdient Splitting approach that can successfully adapt existing robust AGRs to non-IID settings.	Yuchen Liu; Chen Chen; Lingjuan Lyu; Fangzhao Wu; Sai Wu; Gang Chen;
1396	Personalized Federated Learning with Inferred Collaboration Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address this issue, our core idea is to learn a collaboration graph, which models the benefits from each pairwise collaboration and allocates appropriate collaboration strengths. Based on this, we propose a novel personalized FL algorithm, pFedGraph, which consists of two key modules: (1) inferring the collaboration graph based on pairwise model similarity and dataset size at server to promote fine-grained collaboration and (2) optimizing local model with the assistance of aggregated model at client to promote personalization.	Rui Ye; Zhenyang Ni; Fangzhao Wu; Siheng Chen; Yanfeng Wang;
1397	Progressive Purification for Instance-Dependent Partial Label Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, a theoretically grounded and practically effective approach named POP, i.e. PrOgressive Purification for instance-dependent partial label learning, is proposed.	Ning Xu; Biao Liu; Jiaqi Lv; Congyu Qiao; Xin Geng;
1398	SNeRL: Semantic-aware Neural Radiance Fields for Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present Semantic-aware Neural Radiance Fields for Reinforcement Learning (SNeRL), which jointly optimizes semantic-aware neural radiance fields (NeRF) with a convolutional encoder to learn 3D-aware neural implicit representation from multi-view images.	Dongseok Shim; Seungjae Lee; H. Jin Kim;
1399	Constrained Efficient Global Optimization of Expensive Black-box Functions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose CONFIG (CONstrained efFIcient Global Optimization), a simple and effective algorithm to solve it.	Wenjie Xu; Yuning Jiang; Bratislav Svetozarevic; Colin Jones;
1400	How Powerful Are Shallow Neural Networks with Bandlimited Random Weights? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the well-known fact that a neural network is a universal approximator, in this study, we mathematically show that when hidden parameters are distributed in a bounded domain, the network may not achieve zero approximation error.	Ming Li; Sho Sonoda; Feilong Cao; Yu Guang Wang; Jiye Liang;
1401	FusionRetro: Molecule Representation Fusion Via In-Context Learning for Retrosynthetic Planning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a novel framework that utilizes context information for improved retrosynthetic planning.	Songtao Liu; Zhengkai Tu; Minkai Xu; Zuobai Zhang; Lu Lin; Zhitao Ying; Jian Tang; Peilin Zhao; Dinghao Wu;
1402	FedBR: Improving Federated Learning on Heterogeneous Data Via Local Learning Bias Reduction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, such a scheme is currently constrained by slow and unstable convergence due to the variety of data on different clients’ devices. In this work, we identify three under-explored phenomena of biased local learning that may explain these challenges caused by local updates in supervised FL.	Yongxin Guo; Xiaoying Tang; Tao Lin;
1403	The Implicit Regularization of Dynamical Stability in Stochastic Gradient Descent Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we study the implicit regularization of stochastic gradient descent (SGD) through the lens of dynamical stability (Wu et al., 2018).	Lei Wu; Weijie J Su;
1404	Conformal Inference Is (almost) Free for Neural Networks Trained with Early Stopping Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Models trained with early stopping often provide relatively accurate predictions, but they generally still lack precise statistical guarantees unless they are further calibrated using independent hold-out data. This paper addresses the above limitation with conformalized early stopping: a novel method that combines early stopping with conformal calibration while efficiently recycling the same hold-out data.	Ziyi Liang; Yanfei Zhou; Matteo Sesia;
1405	Fed-CBS: A Heterogeneity-Aware Client Sampling Mechanism for Federated Learning Via Class-Imbalance Reduction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Compared with engaging all the available clients, such a random-selection mechanism could lead to significant performance degradation on non-IID (not independent and identically distributed) data. In this paper, we present our key observation that the essential reason for such performance degradation is the class imbalance of the grouped data from randomly selected clients.	Jianyi Zhang; Ang Li; Minxue Tang; Jingwei Sun; Xiang Chen; Fan Zhang; Changyou Chen; Yiran Chen; Hai Li;
1406	Orthogonality-Enforced Latent Space in Autoencoders: An Approach to Learning Disentangled Representations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Noting the importance of factorizing (or disentangling) the latent space, we propose a novel, non-probabilistic disentangling framework for autoencoders, based on the principles of symmetry transformations that are independent of one another.	Jaehoon Cha; Jeyan Thiyagalingam;
1407	Building Neural Networks on Matrix Manifolds: A Gyrovector Space Approach Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, due to the lack of many concepts in gyrovector spaces for the considered manifolds, e.g., the inner product and gyroangles, techniques and mathematical tools provided by these works are still limited compared to those developed for studying hyperbolic geometry. In this paper, we generalize some notions in gyrovector spaces for SPD and Grassmann manifolds, and propose new models and layers for building neural networks on these manifolds.	Xuan Son Nguyen; Shuo Yang;
1408	When and How Does Known Class Help Discover Unknown Ones? Provable Understanding Through Spectral Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Tailored to the NCD problem, we introduce a graph-theoretic representation that can be learned by a novel NCD Spectral Contrastive Loss (NSCL).	Yiyou Sun; Zhenmei Shi; Yingyu Liang; Yixuan Li;
1409	Federated Adversarial Learning: A Framework with Convergence Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we formulate a general form of federated adversarial learning (FAL) that is adapted from adversarial learning in the centralized setting.	Xiaoxiao Li; Zhao Song; Jiaming Yang;
1410	Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a computationally efficient policy optimization algorithm for the challenging general setting of unknown dynamics and bandit feedback, featuring a combination of mirror-descent and least squares policy evaluation in an auxiliary MDP used to compute exploration bonuses.	Uri Sherman; Tomer Koren; Yishay Mansour;
1411	Bayesian Online Change Point Detection with Hilbert Space Approximate Student-t Process Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a variant of Bayesian online change point detection with a reduced-rank Student-t process (TP) and dependent Student-t noise, as a nonparametric time series model.	Jeremy Sellier; Petros Dellaportas;
1412	Graph Positional Encoding Via Random Feature Propagation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Surprisingly, however, there is still no clear understanding of the relation between these two augmentation schemes. Here we propose a novel family of positional encoding schemes which draws a link between the above two approaches and improves over both.	Moshe Eliasof; Fabrizio Frasca; Beatrice Bevilacqua; Eran Treister; Gal Chechik; Haggai Maron;
1413	Averaged Method of Multipliers for Bi-Level Optimization Without Lower-Level Strong Convexity Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, by averaging the upper and lower level objectives, we propose a single loop Bi-level Averaged Method of Multipliers (sl-BAMM) for BLO that is simple yet efficient for large-scale BLO and gets rid of the limited LLSC restriction.	Risheng Liu; Yaohua Liu; Wei Yao; Shangzhi Zeng; Jin Zhang;
1414	Adaptive Computation with Elastic Input Sequence Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a new approach called AdaTape, which allows for dynamic computation in neural networks through adaptive tape tokens.	Fuzhao Xue; Valerii Likhosherstov; Anurag Arnab; Neil Houlsby; Mostafa Dehghani; Yang You;
1415	Policy Contrastive Imitation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Since the AIL discriminator is trained via binary classification that does not necessarily discriminate the policy from the expert in a meaningful way, the resulting reward might not be meaningful either. We propose a new method called Policy Contrastive Imitation Learning (PCIL) to resolve this issue.	Jialei Huang; Zhao-Heng Yin; Yingdong Hu; Yang Gao;
1416	CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose Comprehensive Attention Benchmark (CAB) under a fine-grained attention taxonomy with four distinguishable attention patterns, namely, noncausal self, causal self, noncausal cross, and causal cross attentions.	Jun Zhang; Shuyang Jiang; Jiangtao Feng; Lin Zheng; Lingpeng Kong;
1417	Robust One-Class Classification with Signed Distance Function Using 1-Lipschitz Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a new method, dubbed One Class Signed Distance Function (OCSDF), to perform One Class Classification (OCC) by provably learning the Signed Distance Function (SDF) to the boundary of the support of any distribution.	Louis Béthune; Paul Novello; Guillaume Coiffier; Thibaut Boissin; Mathieu Serrurier; Quentin VINCENOT; Andres Troya-Galvis;
1418	Set-membership Belief State-based Reinforcement Learning for POMDPs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a novel algorithm called Set-membership Belief state-based Reinforcement Learning (SBRL), which consists of two parts: a Set-membership Belief state learning Model (SBM) for learning bounded belief state sets and an RL controller for making decisions based on SBM.	Wei Wei; Lijun Zhang; Lin Li; Huizhong Song; Jiye Liang;
1419	Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we consider contextual bandits with general function approximation and propose a computationally efficient algorithm to achieve a regret of $\tilde{\mathcal O}(\sqrt{T}+\zeta)$.	Chenlu Ye; Wei Xiong; Quanquan Gu; Tong Zhang;
1420	Accuracy on The Curve: On The Nonlinear Correlation of ML Performance Between Data Subpopulations Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Through rigorous experimentation and analysis across a variety of datasets, models, and training epochs, we demonstrate that OOD performance often has a nonlinear correlation with ID performance in subpopulation shifts.	Weixin Liang; Yining Mao; Yongchan Kwon; Xinyu Yang; James Zou;
1421	Robust Satisficing MDPs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes robust satisficing MDPs (RSMDPs), where the expected returns of feasible policies are softly-constrained to achieve a user-specified target under ambiguity.	Haolin Ruan; Siyu Zhou; Zhi Chen; Chin Pang Ho;
1422	Off-Policy Evaluation for Large Action Spaces Via Conjunct Effect Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study off-policy evaluation (OPE) of contextual bandit policies for large discrete action spaces where conventional importance-weighting approaches suffer from excessive variance. To circumvent this variance issue, we propose a new estimator, called OffCEM, that is based on the conjunct effect model (CEM), a novel decomposition of the causal effect into a cluster effect and a residual effect.	Yuta Saito; Qingyang Ren; Thorsten Joachims;
1423	Beyond Homophily: Reconstructing Structure for Graph-agnostic Clustering Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Hence, clustering on real-world graphs with various levels of homophily poses a new challenge to the graph research community. To fill this gap, we propose a novel graph clustering method, which contains three key components: graph reconstruction, a mixed filter, and a dual graph clustering network.	Erlin Pan; Zhao Kang;
1424	Poisoning Language Models During Instruction Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Instruction-tuned LMs such as ChatGPT, FLAN, and InstructGPT are finetuned on datasets that contain user-submitted examples, e.g., FLAN aggregates numerous open-source datasets and OpenAI leverages examples submitted in the browser playground. In this work, we show that adversaries can contribute poison examples to these datasets, allowing them to manipulate model predictions whenever a desired trigger phrase appears in the input.	Alexander Wan; Eric Wallace; Sheng Shen; Dan Klein;
1425	Learning to Optimize Differentiable Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing algorithms for solving games suffer from empirical instability, hence demanding heavy ad-hoc tuning in practice. To tackle these challenges, we resort to the emerging scheme of Learning to Optimize (L2O), which discovers problem-specific efficient optimization algorithms through data-driven training.	Xuxi Chen; Nelson Vadori; Tianlong Chen; Zhangyang Wang;
1426	Decentralized SGD and Average-direction SAM Are Asymptotically Equivalent Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we challenge the conventional belief and present a completely new perspective for understanding decentralized learning.	Tongtian Zhu; Fengxiang He; Kaixuan Chen; Mingli Song; Dacheng Tao;
1427	A Closer Look at The Intervention Procedure of Concept Bottleneck Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While such intervenability provides a powerful avenue of control, many aspects of the intervention procedure remain rather unexplored. In this work, we develop various ways of selecting intervening concepts to improve the intervention effectiveness and conduct an array of in-depth analyses as to how they evolve under different circumstances.	Sungbin Shin; Yohan Jo; Sungsoo Ahn; Namhoon Lee;
1428	Regression with Label Permutation in Generalized Linear Model Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we present a relatively complete analysis of label permutation problem for the generalized linear model with multivariate responses.	Guanhua Fang; Ping Li;
1429	A Coupled Flow Approach to Imitation Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we investigate applications of a normalizing flow based model for the aforementioned distributions.	Gideon Joseph Freund; Elad Sarafian; Sarit Kraus;
1430	Surrogate Model Extension (SME): A Fast and Accurate Weight Update Attack on Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a principled way to extend gradient inversion attacks to weight updates in FL, thereby better exposing weaknesses in the presumed privacy protection inherent in FL.	Junyi Zhu; Ruicong Yao; Matthew B. Blaschko;
1431	Reliable Measures of Spread in High Dimensional Latent Spaces Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We demonstrate that the commonly used measures of data spread, average cosine similarity and a partition function min/max ratio I(V), do not provide reliable metrics to compare the use of latent space across data distributions. We propose and examine six alternative measures of data spread, all of which improve over these current metrics when applied to seven synthetic data distributions.	Anna Marbut; Katy McKinney-Bock; Travis J Wheeler;
1432	Adaptive Annealed Importance Sampling with Constant Rate Progress Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we prove that the geometric annealing corresponds to the distribution path that minimizes the KL divergence between the current particle distribution and the desired target when the feasible change in the particle distribution is constrained.	Shirin Goshtasbpour; Victor Cohen; Fernando Perez-Cruz;
1433	Large Language Models Struggle to Learn Long-Tail Knowledge Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study the relationship between the knowledge memorized by large language models and the information in pre-training datasets scraped from the web.	Nikhil Kandpal; Haikang Deng; Adam Roberts; Eric Wallace; Colin Raffel;
1434	Neural Diffusion Processes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Neural network approaches for meta-learning distributions over functions have desirable properties such as increased flexibility and a reduced complexity of inference. Building on the successes of denoising diffusion models for generative modelling, we propose Neural Diffusion Processes (NDPs), a novel approach that learns to sample from a rich distribution over functions through its finite marginals.	Vincent Dutordoir; Alan Saul; Zoubin Ghahramani; Fergus Simpson;
1435	Diversity-enhancing Generative Network for Few-shot Hypothesis Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a diversity-enhancing generative network (DEG-Net) for the FHA problem, which can generate diverse unlabeled data with the help of a kernel independence measure: the Hilbert-Schmidt independence criterion (HSIC).	Ruijiang Dong; Feng Liu; Haoang Chi; Tongliang Liu; Mingming Gong; Gang Niu; Masashi Sugiyama; Bo Han;
1436	Target-Aware Generative Augmentations for Single-Shot Adaptation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we address the problem of adapting models from a source domain to a target domain, a task that has become increasingly important due to the brittle generalization of deep neural networks.	Kowshik Thopalli; Rakshith Subramanyam; Pavan K. Turaga; Jayaraman J. Thiagarajan;
1437	Constant Matters: Fine-grained Error Bound on Differentially Private Continual Observation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Our main insight is that the matrix mechanism when using lower-triangular matrices can be used in the continual observation model.	Hendrik Fichtenberger; Monika Henzinger; Jalaj Upadhyay;
1438	PixelAsParam: A Gradient View on Diffusion Sampling with Guidance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, the current guidance on denoising processes suffers from the trade-off between diversity, image quality, and conditional information. In this work, we propose to view this guidance sampling process from a gradient view, where image pixels are treated as parameters being optimized, and each mathematical term in the sampling process represents one update direction.	Anh-Dung Dinh; Daochang Liu; Chang Xu;
1439	Rigid Body Flows for Sampling Molecular Crystal Structures Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a new type of normalizing flow that is tailored for modeling positions and orientations of multiple objects in three-dimensional space, such as molecules in a crystal.	Jonas Köhler; Michele Invernizzi; Pim De Haan; Frank Noe;
1440	On The Power of Foundation Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This question cannot be answered by the existing theory of representation, optimization or generalization, because the issues they mainly investigate are assumed to be nonexistent here. In this paper, we show that category theory provides powerful machinery to answer this question.	Yang Yuan;
1441	Detecting Adversarial Data By Probing Multiple Perturbations Using Expected Perturbation Score Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a new statistic called expected perturbation score (EPS), which is essentially the expected score of a sample after various perturbations.	Shuhai Zhang; Feng Liu; Jiahao Yang; Yifan Yang; Changsheng Li; Bo Han; Mingkui Tan;
1442	CircuitNet: A Generic Neural Network to Realize Universal Circuit Motif Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a new type of neural network inspired by the architectures of neuronal circuits, namely Circuit Neural Network (CircuitNet).	Yansen Wang; XINYANG JIANG; Kan Ren; Caihua Shan; Xufang Luo; Dongqi Han; Kaitao Song; Yifei Shen; Dongsheng Li;
1443	Controlled Differential Equations on Long Sequences Via Non-standard Wavelets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For tasks where it is sensible to assume that the (long) sequences in the training data are a fixed length of temporal measurements — this assumption holds in most experiments tackled in the literature — we describe an efficient simplification.	Sourav Pal; Zhanpeng Zeng; Sathya N. Ravi; Vikas Singh;
1444	InfoOT: Information Maximizing Optimal Transport Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Yet, it ignores coherence structure in the data such as clusters, does not handle outliers well, and cannot integrate new data points. To address these drawbacks, we propose InfoOT, an information-theoretic extension of optimal transport that maximizes the mutual information between domains while minimizing geometric distances.	Ching-Yao Chuang; Stefanie Jegelka; David Alvarez-Melis;
1445	On The Connection Between MPNN and Graph Transformer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we study the inverse connection and show that MPNN with virtual node (VN), a commonly used heuristic with little theoretical understanding, is powerful enough to arbitrarily approximate the self-attention layer of GT.	Chen Cai; Truong Son Hy; Rose Yu; Yusu Wang;
1446	Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper provides a fine-grained analysis of the dynamics of GD for the matrix sensing problem, whose goal is to recover a low-rank ground-truth matrix from near-isotropic linear measurements.	Jikai Jin; Zhiyuan Li; Kaifeng Lyu; Simon Shaolei Du; Jason D. Lee;
1447	Interval Bound Interpolation for Few-shot Learning with Few Tasks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we introduce the notion of interval bounds from the provably robust training literature to few-shot learning.	Shounak Datta; Sankha Subhra Mullick; Anish Chakrabarty; Swagatam Das;
1448	NTK-approximating MLP Fusion for Efficient Language Model Fine-tuning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate the neural tangent kernel (NTK), which reveals the gradient descent dynamics of neural networks, of the multilayer perceptron (MLP) modules in a PLM and propose to coin a lightweight PLM through NTK-approximating MLP fusion.	Tianxin Wei; Zeming Guo; Yifan Chen; Jingrui He;
1449	Are Equivariant Equilibrium Approximators Beneficial? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we theoretically characterize the benefits and limitations of equivariant equilibrium approximators.	Zhijian Duan; Yunxuan Ma; Xiaotie Deng;
1450	Everyone’s Preference Changes Differently: A Weighted Multi-Interest Model For Retrieval Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose the Multi-Interest Preference (MIP) model, an approach that not only produces multi-interest for users by using the user’s sequential engagement more effectively but also automatically learns a set of weights to represent the preference over each embedding so that the candidates can be retrieved from each interest proportionally.	Hui Shi; Yupeng Gu; Yitong Zhou; Bo Zhao; Sicun Gao; Jishen Zhao;
1451	ChiPFormer: Transferable Chip Placement Via Offline Decision Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, such an RL-based approach suffers from long training time and low transfer ability in unseen chip circuits. To resolve these challenges, we cast the chip placement as an offline RL formulation and present ChiPFormer that enables learning a transferable placement policy from fixed offline data.	Yao Lai; Jinxin Liu; Zhentao Tang; Bin Wang; Jianye HAO; Ping Luo;
1452	MetricGAN-OKD: Multi-Metric Optimization of MetricGAN Via Online Knowledge Distillation for Speech Enhancement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose an effective multi-metric optimization method in MetricGAN via online knowledge distillation—MetricGAN-OKD.	Wooseok Shin; Byung Hoon Lee; Jin Sob Kim; Hyun Joon Park; Sung Won Han;
1453	Learning to Maximize Mutual Information for Dynamic Feature Selection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we consider the dynamic feature selection (DFS) problem where a model sequentially queries features based on the presently available information.	Ian Connick Covert; Wei Qiu; MingYu Lu; Na Yoon Kim; Nathan J White; Su-In Lee;
1454	Complementary Attention for Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Complementary Attention for Multi-Agent reinforcement learning (CAMA), which applies a divide-and-conquer strategy on input entities accompanied with the complementary attention of enhancement and replenishment.	Jianzhun Shao; Hongchang Zhang; Yun Qu; Chang Liu; Shuncheng He; Yuhang Jiang; Xiangyang Ji;
1455	2D-Shapley: A Framework for Fragmented Data Valuation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We start by presenting a method to calculate the counterfactual of removing a fragment from the aggregated data matrix. Based on the counterfactual calculation, we further propose 2D-Shapley, a theoretical framework for fragmented data valuation that uniquely satisfies some appealing axioms in the fragmented data context. 2D-Shapley empowers a range of new use cases, such as selecting useful data fragments, providing interpretation for sample-wise data values, and fine-grained data issue diagnosis.	Liu Zhihong; Hoang Anh Just; Xiangyu Chang; Xi Chen; Ruoxi Jia;
1456	Explaining The Effects of Non-convergent MCMC in The Training of Energy-Based Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we quantify the impact of using non-convergent Markov chains to train Energy-Based models (EBMs).	Elisabeth Agoritsas; Giovanni Catania; Aurélien Decelle; Beatriz Seoane;
1457	Tight Certification of Adversarially Trained Neural Networks Via Nonconvex Low-Rank Semidefinite Relaxations Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a nonconvex certification technique, based on a low-rank restriction of a semidefinite programming (SDP) relaxation.	Hong-Ming Chiu; Richard Y. Zhang;
1458	Interpolation for Robust Learning: Data Augmentation on Wasserstein Geodesics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose to study and promote the robustness of a model as per its performance on a continuous geodesic interpolation of subpopulations, e.g., a class of samples in a classification problem.	Jiacheng Zhu; Jielin Qiu; Aritra Guha; Zhuolin Yang; XuanLong Nguyen; Bo Li; Ding Zhao;
1459	Generative Causal Representation Learning for Out-of-Distribution Motion Forecasting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose Generative Causal Representation Learning (GCRL) which leverages causality to facilitate knowledge transfer under distribution shifts.	Shayan Shirahmad Gale Bagi; Zahra Gharaee; Oliver Schulte; Mark Crowley;
1460	Drug Discovery Under Covariate Shift with Domain-Informed Prior Distributions Over Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, real-world drug discovery tasks are often characterized by a scarcity of labeled data and significant covariate shift—a setting that poses a challenge to standard deep learning methods. In this paper, we present Q-SAVI, a probabilistic model able to address these challenges by encoding explicit prior knowledge of the data-generating process into a prior distribution over functions, presenting researchers with a transparent and probabilistically principled way to encode data-driven modeling preferences.	Leo Klarner; Tim G. J. Rudner; Michael Reutlinger; Torsten Schindler; Garrett M Morris; Charlotte Deane; Yee Whye Teh;
1461	Context-Aware Bayesian Network Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce a Bayesian network to inaugurate correlations between agents’ action selections in their joint policy.	Dingyang Chen; Qi Zhang;
1462	Sketching for First Order Method: Efficient Algorithm for Low-Bandwidth Channel and Vulnerability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: It enables runtime and memory saving via randomly compressing the original large problem into lower dimensions. In this paper, we propose a novel sketching scheme for the first order method in large-scale distributed learning setting, such that the communication costs between distributed agents are saved while the convergence of the algorithms is still guaranteed.	Zhao Song; Yitan Wang; Zheng Yu; Lichen Zhang;
1463	DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce DS-1000, a code generation benchmark with a thousand data science problems spanning seven Python libraries, such as Numpy and Pandas. We release our benchmark at https://ds1000-code-gen.github.io.	Yuhang Lai; Chengxi Li; Yiming Wang; Tianyi Zhang; Ruiqi Zhong; Luke Zettlemoyer; Wen-tau Yih; Daniel Fried; Sida Wang; Tao Yu;
1464	Optimal Goal-Reaching Reinforcement Learning Via Quasimetric Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper introduces Quasimetric Reinforcement Learning (QRL), a new RL method that utilizes quasimetric models to learn optimal value functions.	Tongzhou Wang; Antonio Torralba; Phillip Isola; Amy Zhang;
1465	MEWL: Few-shot Multimodal Word Learning with Referential Uncertainty Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite recent advancements in multimodal learning, a systematic and rigorous evaluation is still missing for human-like word learning in machines. To fill in this gap, we introduce the MachinE Word Learning (MEWL) benchmark to assess how machines learn word meaning in grounded visual scenes.	Guangyuan Jiang; Manjie Xu; Shiji Xin; Wei Liang; Yujia Peng; Chi Zhang; Yixin Zhu;
1466	A Kernel Stein Test of Goodness of Fit for Sequential Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a goodness-of-fit measure for probability densities modeling observations with varying dimensionality, such as text documents of differing lengths or variable-length sequences.	Jerome Baum; Heishiro Kanagawa; Arthur Gretton;
1467	Learning Functional Distributions with Private Labels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the problem of learning functional distributions in the presence of noise.	Changlong Wu; Yifan Wang; Ananth Grama; Wojciech Szpankowski;
1468	I$^2$SB: Image-to-Image Schrödinger Bridge Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose Image-to-Image Schrödinger Bridge (I$^2$SB), a new class of conditional diffusion models that directly learn the nonlinear diffusion processes between two given distributions.	Guan-Horng Liu; Arash Vahdat; De-An Huang; Evangelos Theodorou; Weili Nie; Anima Anandkumar;
1469	LookupFFN: Making Transformers Compute-lite for CPU Inference Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Specifically, we propose an alternative formulation (we call it LookupFFN) to GEMM based FFNs inspired by the recent studies of using Locality Sensitive Hashing (LSH) to approximate FFNs.	Zhanpeng Zeng; Michael Davies; Pranav Pulijala; Karthikeyan Sankaralingam; Vikas Singh;
1470	Learning to Learn from APIs: Black-Box Data-Free Meta-Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Existing DFML work can only meta-learn from (i) white-box and (ii) small-scale pre-trained models (iii) with the same architecture, neglecting the more practical setting where the users only have inference access to the APIs with arbitrary model architectures and model scale inside. To solve this issue, we propose a Bi-level Data-free Meta Knowledge Distillation (BiDf-MKD) framework to transfer more general meta knowledge from a collection of black-box APIs to one single meta model.	Zixuan Hu; Li Shen; Zhenyi Wang; Baoyuan Wu; Chun Yuan; Dacheng Tao;
1471	Continual Vision-Language Representation Learning with Off-Diagonal Information Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To alleviate SD, we propose a new continual vision-language representation learning framework Mod-X: Maintain off-diagonal information-matriX.	Zixuan Ni; Longhui Wei; Siliang Tang; Yueting Zhuang; Qi Tian;
1472	On The Convergence of The MLE As An Estimator of The Learning Rate in The Exp3 Algorithm Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our objective in this work is to show that the estimation of the learning rate cannot be efficient if the learning rate is constant in the classical Exp3 (Exponential weights for Exploration and Exploitation) algorithm.	Julien Aubert; Luc Lehéricy; Patricia Reynaud-Bouret;
1473	DugMatting: Decomposed-Uncertainty-Guided Matting Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a decomposed-uncertainty-guided matting (dugMatting) algorithm, which explores the explicitly decomposed uncertainties to efficiently and effectively improve the results.	Jiawei Wu; Changqing Zhang; Zuoyong Li; Huazhu Fu; Xi Peng; Joey Tianyi Zhou;
1474	Differentially Private Stochastic Convex Optimization Under A Quantile Loss Function Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study $(\varepsilon,\delta)$-differentially private (DP) stochastic convex optimization under an $r$-th quantile loss function taking the form $c(u) = ru^+ + (1-r)(-u)^+$.	Du Chen; Geoffrey A. Chua;
1475	Revisiting Discriminative Vs. Generative Classifiers: Theory and Implications Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To establish it, we present a multiclass $\mathcal{H}$-consistency bound framework and an explicit bound for logistic loss, which are of independent interests.	Chenyu Zheng; Guoqiang Wu; Fan Bao; Yue Cao; Chongxuan Li; Jun Zhu;
1476	Quantum Policy Gradient Algorithm with Optimized Action Decoding Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Focusing on applications in quantum reinforcement learning, we propose an action decoding procedure for a quantum policy gradient approach.	Nico Meyer; Daniel D. Scherer; Axel Plinge; Christopher Mutschler; Michael J. Hartmann;
1477	Self-Attention Amortized Distributional Projection Optimization for Sliced Wasserstein Point-Cloud Reconstruction Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Despite being efficient, Max-SW and its amortized version cannot guarantee metricity property due to the sub-optimality of the projected gradient ascent and the amortization gap. Therefore, we propose to replace Max-SW with distributional sliced Wasserstein distance with von Mises-Fisher (vMF) projecting distribution (v-DSW).	Khai Nguyen; Dang Nguyen; Nhat Ho;
1478	Learning Affinity with Hyperbolic Representation for Spatial Propagation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we demonstrate that the properties of hyperbolic geometry serve as a valuable alternative to learning hierarchical affinity for spatial propagation tasks.	Jin-Hwi Park; Jaesung Choe; Inhwan Bae; Hae-Gon Jeon;
1479	Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Its application to audio still lags behind for two main reasons: the lack of large-scale datasets with high-quality text-audio pairs, and the complexity of modeling long continuous audio data. In this work, we propose Make-An-Audio with a prompt-enhanced diffusion model that addresses these gaps by 1) introducing pseudo prompt enhancement with a distill-then-reprogram approach, it alleviates data scarcity with orders of magnitude concept compositions by using language-free audios; 2) leveraging spectrogram autoencoder to predict the self-supervised audio representation instead of waveforms.	Rongjie Huang; Jiawei Huang; Dongchao Yang; Yi Ren; Luping Liu; Mingze Li; Zhenhui Ye; Jinglin Liu; Xiang Yin; Zhou Zhao;
1480	Synthetic Data for Model Selection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In contrast to using synthetic data for training, in this work we explore whether synthetic data can be beneficial for model selection.	Alon Shoshan; Nadav Bhonker; Igor Kviatkovsky; Matan Fintz; Gerard Medioni;
1481	Learning Subpocket Prototypes for Generalizable Structure-based Drug Design Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel method DrugGPS for generalizable structure-based drug design.	ZAIXI ZHANG; Qi Liu;
1482	Quantum Lower Bounds for Finding Stationary Points of Nonconvex Functions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we conduct a systematic study of quantum lower bounds on finding $\epsilon$-approximate stationary points of nonconvex functions, and we consider the following two important settings: 1) having access to $p$-th order derivatives; or 2) having access to stochastic gradients.	Chenyi Zhang; Tongyang Li;
1483	Graph Neural Networks Can Recover The Hidden Features Solely from The Graph Structure Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we investigate whether GNNs can exploit the graph structure from the perspective of the expressive power of GNNs.	Ryoma Sato;
1484	For Pre-Trained Vision Models in Motor Control, Not All Policy Learning Methods Are Created Equal Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To bridge this gap in understanding, we conduct a comprehensive study on 14 pre-trained vision models using 3 distinct classes of policy learning methods, including reinforcement learning (RL), imitation learning through behavior cloning (BC), and imitation learning with a visual reward function (VRF). To facilitate more universal evaluations of pre-trained models and their policy learning methods in the future, we also release a benchmark of 21 tasks across 3 different environments alongside our work.	Yingdong Hu; Renhao Wang; Li Erran Li; Yang Gao;
1485	SurProGenes: Survival Risk-Ordered Representation of Cancer Patients and Genes for The Identification of Prognostic Genes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Furthermore, most cancer genomics studies lack appropriate low-risk groups against which to compare. To address these issues, we present a framework that identifies candidate prognostic genes by integrating representation learning and statistical analysis approaches.	Junetae Kim; Kyoungsuk Park; Hanseok Jeong; Youngwook KIM; Jeongseon Kim; Sun-Young Kim;
1486	Better Training of GFlowNets with Local Credit and Incomplete Trajectories Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we consider the case where the energy function can be applied not just to terminal states but also to intermediate states.	Ling Pan; Nikolay Malkin; Dinghuai Zhang; Yoshua Bengio;
1487	Understanding Int4 Quantization for Language Models: Latency Speedup, Composability, and Failure Cases Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we explore the feasibility of employing INT4 weight and activation (W4A4) quantization for language models.	Xiaoxia Wu; Cheng Li; Reza Yazdani Aminabadi; Zhewei Yao; Yuxiong He;
1488	Defects of Convolutional Decoder Networks in Frequency Representation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we prove the representation defects of a cascaded convolutional decoder network, considering the capacity of representing different frequency components of an input sample.	Ling Tang; Wen Shen; Zhanpeng Zhou; YueFeng Chen; Quanshi Zhang;
1489	Exploring Chemical Space with Score-based Out-of-distribution Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose Molecular Out-Of-distribution Diffusion (MOOD), a score-based diffusion scheme that incorporates out-of-distribution (OOD) control in the generative stochastic differential equation (SDE) with simple control of a hyperparameter, and thus requires no additional cost.	Seul Lee; Jaehyeong Jo; Sung Ju Hwang;
1490	Concept-based Explanations for Out-of-Distribution Detectors Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We first propose two new metrics for assessing the effectiveness of a particular set of concepts for explaining OOD detectors: 1) detection completeness, which quantifies the sufficiency of concepts for explaining an OOD-detector’s decisions, and 2) concept separability, which captures the distributional separation between in-distribution and OOD data in the concept space. Based on these metrics, we propose an unsupervised framework for learning a set of concepts that satisfy the desired properties of high detection completeness and concept separability, and demonstrate its effectiveness in providing concept-based explanations for diverse off-the-shelf OOD detectors.	Jihye Choi; Jayaram Raghuram; Ryan Feng; Jiefeng Chen; Somesh Jha; Atul Prakash;
1491	Optimal Shrinkage for Distributed Second-Order Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we address the problem of Hessian inversion bias in distributed second-order optimization algorithms.	Fangzhao Zhang; Mert Pilanci;
1492	Not All Strongly Rayleigh Distributions Have Small Probabilistic Generating Circuits Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: They raised the question whether every strongly Rayleigh distribution can be efficiently represented by such circuits. We prove that this question has a negative answer: there are strongly Rayleigh distributions that cannot be represented by polynomial-sized probabilistic generating circuits, assuming a widely accepted complexity theoretic conjecture.	Markus Bläser;
1493	Simplifying Momentum-based Positive-definite Submanifold Optimization with Applications to Deep Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we simplify such difficulties for a class of structured symmetric positive-definite matrices with the affine-invariant metric.	Wu Lin; Valentin Duruisseaux; Melvin Leok; Frank Nielsen; Mohammad Emtiyaz Khan; Mark Schmidt;
1494	Nugget: Neural Agglomerative Embeddings of Text Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a solution called Nugget, which encodes language into a representation based on a dynamically selected subset of input tokens.	Guanghui Qin; Benjamin Van Durme;
1495	Automatically Marginalized MCMC in Probabilistic Programming Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose to use automatic marginalization as part of the sampling process using HMC in a graphical model extracted from a PPL, which substantially improves sampling from real-world hierarchical models.	Jinlin Lai; Javier Burroni; Hui Guan; Daniel Sheldon;
1496	Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We demonstrate that least-squares regression weighted by the variance of an estimated optimal value function of the next state is crucial to achieving minimax optimality. Based on this observation, we present Variance-Weighted Least-Squares MDVI (VWLS-MDVI), the first theoretical algorithm that achieves nearly minimax optimal sample complexity for infinite-horizon linear MDPs.	Toshinori Kitamura; Tadashi Kozuno; Yunhao Tang; Nino Vieillard; Michal Valko; Wenhao Yang; Jincheng Mei; Pierre MENARD; Mohammad Gheshlaghi Azar; Remi Munos; Olivier Pietquin; Matthieu Geist; Csaba Szepesvari; Wataru Kumagai; Yutaka Matsuo;
1497	Neural Networks Trained with SGD Learn Distributions of Increasing Complexity Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we show that neural networks trained using stochastic gradient descent initially classify their inputs using lower-order input statistics, like mean and covariance, and exploit higher-order statistics only later during training.	Maria Refinetti; Alessandro Ingrosso; Sebastian Goldt;
1498	Regret Minimization and Convergence to Equilibria in General-sum Markov Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present the first (to our knowledge) algorithm for learning in general-sum Markov games that provides sublinear regret guarantees when executed by all agents.	Liad Erez; Tal Lancewicki; Uri Sherman; Tomer Koren; Yishay Mansour;
1499	Learning to Decouple Complex Systems Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This setting is fairly common in the real world but has been less considered. In this paper, we propose a sequential learning approach under this setting by decoupling a complex system for handling irregularly sampled and cluttered sequential observations.	Zihan Zhou; Tianshu Yu;
1500	Prompting Large Language Model for Machine Translation: A Case Study Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Research on prompting has shown excellent performance with little or even no supervised training across many tasks. However, prompting for machine translation is still under-explored in the literature. We fill this gap by offering a systematic study on prompting strategies for translation, examining various factors for prompt template and demonstration example selection.	Biao Zhang; Barry Haddow; Alexandra Birch;
1501	Non-autoregressive Conditional Diffusion Models for Time Series Prediction Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose TimeDiff, a non-autoregressive diffusion model that achieves high-quality time series prediction with the introduction of two novel conditioning mechanisms: future mixup and autoregressive initialization.	Lifeng Shen; James Kwok;
1502	FedDisco: Federated Learning with Discrepancy-Aware Collaboration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We thus propose a novel aggregation method, Federated Learning with Discrepancy-Aware Collaboration (FedDisco), whose aggregation weights not only involve both the dataset size and the discrepancy value, but also contribute to a tighter theoretical upper bound of the optimization error.	Rui Ye; Mingkai Xu; Jianyu Wang; Chenxin Xu; Siheng Chen; Yanfeng Wang;
1503	Enabling First-Order Gradient-Based Learning for Equilibrium Computation in Markets Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel smoothing technique that creates a surrogate market game, in which first-order methods can be applied.	Nils Kohring; Fabian Raoul Pieroth; Martin Bichler;
1504	SpeedDETR: Speed-aware Transformers for End-to-end Object Detection Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The main issue is that the current literature solely concentrates on building algorithms with minimal computation, oblivious that the practical latency can also be affected by the memory access cost and the degree of parallelism. Therefore, we propose SpeedDETR, a novel speed-aware transformer for end-to-end object detectors, achieving high-speed inference on multiple devices.	Peiyan Dong; Zhenglun Kong; Xin Meng; PENG ZHANG; Hao Tang; Yanzhi Wang; Chih-Hsien Chou;
1505	CLIPood: Generalizing CLIP to Out-of-Distributions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on the unseen test data.	Yang Shu; Xingzhuo Guo; Jialong Wu; Ximei Wang; Jianmin Wang; Mingsheng Long;
1506	Randomized Gaussian Process Upper Confidence Bound with Tighter Bayesian Regret Bounds Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This study first generalizes the regret analysis of RGP-UCB to a wider class of distributions, including the Gamma distribution. Furthermore, we propose improved RGP-UCB (IRGP-UCB) based on a two-parameter exponential distribution, which achieves tighter Bayesian regret bounds.	Shion Takeno; Yu Inatsu; Masayuki Karasuyama;
1507	Scaling of Class-wise Training Losses for Post-hoc Calibration Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To resolve the issue, we propose a new calibration method to synchronize the class-wise training losses.	Seungjin Jung; Seungmo Seo; Yonghyun Jeong; Jongwon Choi;
1508	Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Inspired by the recent success of chain-of-thought prompting, we propose ChainCoder, a program synthesis language model that generates Python code progressively, i.e. from coarse to fine in multiple passes.	Wenqing Zheng; S P Sharan; AJAY KUMAR JAISWAL; Kevin Wang; Yihan Xi; Dejia Xu; Zhangyang Wang;
1509	Two-Scale Gradient Descent Ascent Dynamics Finds Mixed Nash Equilibria of Continuous Games: A Mean-Field Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: More precisely we show that for each finite temperature (or regularization parameter), the two-scale Mean-Field GDA with a suitable finite scale ratio converges exponentially to the unique MNE without assuming the convexity or concavity of the interaction potential.	Yulong Lu;
1510	Learning Antidote Data to Individual Unfairness Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, such adversarial perturbations along a direction covering sensitive information used in DRO do not consider the inherent feature correlations or innate data constraints, therefore could mislead the model to optimize at off-manifold and unrealistic samples. In light of this drawback, in this paper, we propose to learn and generate antidote data that approximately follows the data distribution to remedy individual unfairness.	Peizhao Li; Ethan Xia; Hongfu Liu;
1511	A Law of Robustness Beyond Isoperimetry Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We study the robust interpolation problem of arbitrary data distributions supported on a bounded space and propose a two-fold law of robustness.	Yihan Wu; Heng Huang; Hongyang Zhang;
1512	Tight and Fast Generalization Error Bound of Graph Embedding in Metric Space Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper provides a novel upper bound of graph embedding’s generalization error by evaluating the local Rademacher complexity of the model as a function set of the distances of representation couples.	Atsushi Suzuki; Atsushi Nitanda; Taiji Suzuki; jing wang; Feng Tian; Kenji Yamanishi;
1513	ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we present a novel zero-shot object navigation method, Exploration with Soft Commonsense constraints (ESC), that transfers commonsense knowledge in pre-trained models to open-world object navigation without any navigation experience nor any other training on the visual environments.	Kaiwen Zhou; Kaizhi Zheng; Connor Pryor; Yilin Shen; Hongxia Jin; Lise Getoor; Xin Eric Wang;
1514	Stratified Adversarial Robustness with Rejection Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We theoretically analyze the stratified rejection setting and propose a novel defense method — Adversarial Training with Consistent Prediction-based Rejection (CPR) — for building a robust selective classifier.	Jiefeng Chen; Jayaram Raghuram; Jihye Choi; Xi Wu; Yingyu Liang; Somesh Jha;
1515	No One Idles: Efficient Heterogeneous Federated Learning with Parallel Edge and Server Computation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By swapping the computational order of aggregation and broadcasting, we propose a novel and efficient parallel federated learning (PFL) framework that unlocks the edge nodes during global computation and the central server during local computation.	Feilong Zhang; Xianming Liu; Shiyi Lin; Gang Wu; Xiong Zhou; Junjun Jiang; Xiangyang Ji;
1516	Hierarchical Programmatic Reinforcement Learning Via Learning to Compose Programs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite the encouraging results, the program policies that LEAPS can produce are limited by the distribution of the program dataset. Furthermore, during searching, LEAPS evaluates each candidate program solely based on its return, failing to precisely reward correct parts of programs and penalize incorrect parts. To address these issues, we propose to learn a meta-policy that composes a series of programs sampled from the learned program embedding space.	Guan-Ting Liu; En-Pei Hu; Pu-Jen Cheng; Hung-yi Lee; Shao-Hua Sun;
1517	Generalization Analysis for Contrastive Representation Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we establish novel generalization bounds for contrastive learning which do not depend on $k$, up to logarithmic terms.	Yunwen Lei; Tianbao Yang; Yiming Ying; Ding-Xuan Zhou;
1518	Fundamental Tradeoffs in Learning with Prior Information Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a general reduction-based approach for extending classical minimax lower-bound techniques in order to lower bound the prioritized risk for statistical estimation problems.	Anirudha Majumdar;
1519	Understanding Backdoor Attacks Through The Adaptability Hypothesis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose the adaptability hypothesis to understand when and why a backdoor attack works for general learning models, including deep neural networks, based on the theoretical investigation of classical kernel-based learning models.	Xun Xian; Ganghua Wang; Jayanth Srinivasa; Ashish Kundu; Xuan Bi; Mingyi Hong; Jie Ding;
1520	From Hypergraph Energy Functions to Hypergraph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Somewhat differently, in this paper we begin by presenting an expressive family of parameterized, hypergraph-regularized energy functions.	Yuxin Wang; Quan Gan; Xipeng Qiu; Xuanjing Huang; David Wipf;
1521	Continual Learners Are Incremental Model Generalizers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Motivated by the efficiency and rapid convergence of pre-trained models for solving downstream tasks, this paper extensively studies the impact of Continual Learning (CL) models as pre-trainers.	Jaehong Yoon; Sung Ju Hwang; Yue Cao;
1522	Does A Neural Network Really Encode Symbolic Concepts? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, strictly speaking, there still lacks a solid guarantee whether such interactions indeed represent meaningful concepts. Therefore, in this paper, we examine the trustworthiness of interaction concepts from four perspectives.	Mingjie Li; Quanshi Zhang;
1523	Towards Unbiased Training in Federated Open-world Semi-supervised Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel Federated open-world Semi-Supervised Learning (FedoSSL) framework, which can solve the key challenge in distributed and open-world settings, i.e., the biased training process for heterogeneously distributed unseen classes.	Jie ZHANG; Xiaosong Ma; Song Guo; Wenchao Xu;
1524	LinSATNet: The Positive Linear Satisfiability Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose the first differentiable satisfiability layer based on an extension of the classic Sinkhorn algorithm for jointly encoding multiple sets of marginal distributions.	Runzhong Wang; Yunhao Zhang; Ziao Guo; Tianyi Chen; Xiaokang Yang; Junchi Yan;
1525	Semiparametrically Efficient Off-Policy Evaluation in Linear Markov Decision Processes Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We characterize the variance lower bound for regular estimators in the linear MDP setting and propose an efficient estimator whose variance achieves that lower bound.	Chuhan Xie; Wenhao Yang; Zhihua Zhang;
1526	Optimal Arms Identification with Knapsacks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a novel OAK algorithm and prove the upper bound of our algorithm by exploring the relationship between selecting optimal actions and the structure of the feasible region.	Shaoang Li; Lan Zhang; Yingqi Yu; Xiangyang Li;
1527	DIVISION: Memory Efficient Training Via Dual Activation Precision Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To this end, we propose a simple and effective method to compress DNN training.	Guanchu Wang; Zirui Liu; Zhimeng Jiang; Ninghao Liu; Na Zou; Xia Hu;
1528	Open-VCLIP: Transforming CLIP to An Open-vocabulary Video Model Via Interpolated Weight Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We introduce Open-VCLIP, a simple yet effective approach that transforms CLIP into a strong zero-shot video classifier that can recognize unseen actions and events at test time.	Zejia Weng; Xitong Yang; Ang Li; Zuxuan Wu; Yu-Gang Jiang;
1529	Learning to Boost Training By Periodic Nowcasting Near Future Weights Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on our observations on 1) high correlation between past weights and future weights, 2) conditions for beneficial weight prediction, and 3) feasibility of weight prediction, we propose a more general framework by intermittently skipping a handful of epochs by periodically forecasting near future weights, i.e., a Weight Nowcaster Network (WNN).	Jinhyeok Jang; Woo-han Yun; Won Hwa Kim; Youngwoo Yoon; Jaehong Kim; Jaeyeon Lee; ByungOk Han;
1530	Efficient and Degree-Guided Graph Generation Via Discrete Diffusion Modeling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose EDGE, a new diffusion-based generative graph model that addresses generative tasks with large graphs.	Xiaohui Chen; Jiaxing He; Xu Han; Liping Liu;
1531	Scalable Set Encoding with Universal Mini-Batch Consistency and Unbiased Full Set Gradient Approximation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Additionally, prior work has not addressed how to deal with large sets during training when the full set gradient is required. To address these issues, we propose a Universally MBC (UMBC) class of set functions which can be used in conjunction with arbitrary non-MBC components while still satisfying MBC, enabling a wider range of function classes to be used in MBC settings.	Jeffrey Willette; Seanie Lee; Bruno Andreis; Kenji Kawaguchi; Juho Lee; Sung Ju Hwang;
1532	RLEG: Vision-Language Representation Learning with Diffusion-based Embedding Generation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: By virtue of generative capability in this paper, we propose a novel vision-language Representation Learning method with diffusion-based Embedding Generation (RLEG), which exploits diffusion models to generate feature embedding online for learning effective vision-language representation.	Liming Zhao; Kecheng Zheng; Yun Zheng; Deli Zhao; Jingren Zhou;
1533	The Ideal Continual Learner: An Agent That Never Forgets Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, a rigorous theoretical understanding of these methods remains elusive. This paper aims to bridge this gap between theory and practice by proposing a new continual learning framework called Ideal Continual Learner (ICL), which is guaranteed to avoid catastrophic forgetting by construction.	Liangzu Peng; Paris Giampouras; Rene Vidal;
1534	TIPS: Topologically Important Path Sampling for Anytime Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To address the limitations of existing hand-crafted approaches, we first model the training process of AnytimeNNs as a discrete-time Markov chain (DTMC) and use it to identify the paths that contribute the most to the training of AnytimeNNs. Based on this new DTMC-based analysis, we further propose TIPS, a framework to automatically design AnytimeNNs under various hardware constraints.	Guihong Li; Kartikeya Bhardwaj; Yuedong Yang; Radu Marculescu;
1535	Unleashing Mask: Explore The Intrinsic Out-of-Distribution Detection Capability Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we generally discover the existence of an intermediate stage of a model trained on in-distribution (ID) data having higher OOD detection performance than that of its final stage across different settings, and further identify one critical data-level attribution to be learning with the atypical samples. Based on such insights, we propose a novel method, Unleashing Mask, which aims to restore the OOD discriminative capabilities of the well-trained model with ID data.	Jianing Zhu; Hengzhuang Li; Jiangchao Yao; Tongliang Liu; Jianliang Xu; Bo Han;
1536	Exploring Model Dynamics for Accumulative Poisoning Discovery Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we dive into the perspective of model dynamics and propose a novel information measure, namely, Memorization Discrepancy, to explore the defense via the model-level information.	Jianing Zhu; Xiawei Guo; Jiangchao Yao; Chao Du; Li He; Shuo Yuan; Tongliang Liu; Liang Wang; Bo Han;
1537	NerfDiff: Single-image View Synthesis with NeRF-guided Distillation from 3D-aware Diffusion Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, under severe occlusion, this projection fails to resolve uncertainty, resulting in blurry renderings that lack details. In this work, we propose NerfDiff, which addresses this issue by distilling the knowledge of a 3D-aware conditional diffusion model (CDM) into NeRF through synthesizing and refining a set of virtual views at test-time.	Jiatao Gu; Alex Trevithick; Kai-En Lin; Joshua M. Susskind; Christian Theobalt; Lingjie Liu; Ravi Ramamoorthi;
1538	Men Also Do Laundry: Multi-Attribute Bias Amplification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We demonstrate models can learn to exploit correlations with respect to multiple attributes, which are not accounted for by current metrics. Moreover, we show that current metrics can give the erroneous impression that little to no bias amplification has occurred as they aggregate positive and negative bias scores. Further, these metrics lack an ideal value, making them difficult to interpret. To address these shortcomings, we propose a new metric: $\textit{Multi-Attribute Bias Amplification}$.	Dora Zhao; Jerone Andrews; Alice Xiang;
1539	Muse: Text-To-Image Generation Via Masked Generative Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse learns to predict randomly masked image tokens.	Huiwen Chang; Han Zhang; Jarred Barber; Aaron Maschinot; Jose Lezama; Lu Jiang; Ming-Hsuan Yang; Kevin Patrick Murphy; William T. Freeman; Michael Rubinstein; Yuanzhen Li; Dilip Krishnan;
1540	Learning Noisy OR Bayesian Networks with Max-Product Belief Propagation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose parallel max-product as an alternative algorithm for learning noisy-OR BNs with complex latent structures and we derive a fast stochastic training scheme that scales to large datasets.	Antoine Dedieu; Guangyao Zhou; Dileep George; Miguel Lazaro-Gredilla;
1541	Learning Preconditioners for Conjugate Gradient PDE Solvers Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present a new method that leverages a learning-based approach to obtain an approximate factorization of the system matrix, to be used as a preconditioner in the context of PCG solvers.	Yichen Li; Peter Yichen Chen; Tao Du; Wojciech Matusik;
1542	The Unintended Consequences of Discount Regularization: Improving Regularization in Certainty Equivalence Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we reveal an alternate view of discount regularization that exposes unintended consequences.	Sarah Rathnam; Sonali Parbhoo; Weiwei Pan; Susan Murphy; Finale Doshi-Velez;
1543	FaDIn: Fast Discretized Inference for Hawkes Processes with General Parametric Kernels Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work aims to offer an efficient solution to TPP inference using general parametric kernels with finite support.	Guillaume Staerman; Cédric Allain; Alexandre Gramfort; Thomas Moreau;
1544	Is Learning Summary Statistics Necessary for Likelihood-free Inference? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A longstanding question in LFI has been how to design or learn good summary statistics of data, but this might now seem unnecessary due to the advent of recent end-to-end (i.e. neural network-based) LFI methods. In this work, we rethink this question with a new method for learning summary statistics.	Yanzhi Chen; Michael U. Gutmann; Adrian Weller;
1545	FLEX: An Adaptive Exploration Algorithm for Nonlinear Systems Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we introduce FLEX, an exploration algorithm for nonlinear dynamics based on optimal experimental design.	Matthieu Blanke; Marc Lelarge;
1546	Bayesian Progressive Deep Topic Model with Knowledge Informed Textual Data Coarsening Process Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, in this paper, we propose a novel progressive deep topic model that consists of a knowledge-informed textual data coarsening process and a corresponding progressive generative model.	Zhibin Duan; Xinyang Liu; Yudi Su; Yi.shi Xu; Bo Chen; Mingyuan Zhou;
1547	ClusterFuG: Clustering Fully Connected Graphs By Multicut Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a graph clustering formulation based on multicut (a.k.a. weighted correlation clustering) on the complete graph.	Ahmed Abbas; Paul Swoboda;
1548	A Unified Optimization Framework of ANN-SNN Conversion: Towards Optimal Mapping from Activation Values to Firing Rates Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a unified optimization framework for ANN-SNN conversion that considers both performance loss and conversion error.	Haiyan Jiang; Srinivas Anumasa; Giulia De Masi; Huan Xiong; Bin Gu;
1549	PFNs4BO: In-Context Learning for Bayesian Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we use Prior-data Fitted Networks (PFNs) as a flexible surrogate for Bayesian Optimization (BO).	Samuel Müller; Matthias Feurer; Noah Hollmann; Frank Hutter;
1550	Meta-SAGE: Scale Meta-Learning Scheduled Adaptation with Guided Exploration for Mitigating Scale Shift on Combinatorial Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes Meta-SAGE, a novel approach for improving the scalability of deep reinforcement learning models for combinatorial optimization (CO) tasks.	Jiwoo Son; Minsu Kim; Hyeonah Kim; Jinkyoo Park;
1551	Improving The Model Consistency of Decentralized Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing DFL suffers from high inconsistency among local clients, which results in severe distribution shift and inferior performance compared with centralized FL (CFL), especially on heterogeneous data or sparse communication topologies. To alleviate this issue, we propose two DFL algorithms named DFedSAM and DFedSAM-MGS to improve the performance of DFL.	Yifan Shi; Li Shen; Kang Wei; Yan Sun; Bo Yuan; Xueqian Wang; Dacheng Tao;
1552	Dink-Net: Neural Clustering on Large Graphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, the existing methods fail to scale to large graphs with millions of nodes. To solve this problem, a scalable deep graph clustering method (Dink-Net) is proposed with the idea of dilation and shrink.	Yue Liu; Ke Liang; Jun Xia; Sihang Zhou; Xihong Yang; Xinwang Liu; Stan Z. Li;
1553	Weak Proxies Are Sufficient and Preferable for Fairness with Missing Sensitive Attributes Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Our theoretical analyses show that directly using proxy models can give a false sense of (un)fairness. Second, we develop an algorithm that is able to measure fairness (provably) accurately with only three properly identified proxies.	Zhaowei Zhu; Yuanshun Yao; Jiankai Sun; Hang Li; Yang Liu;
1554	Improving Visual Prompt Tuning for Self-supervised Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Unfortunately, identifying the optimal blocks for prompts within each self-supervised ViT for diverse future scenarios is a costly process. To mitigate this problem, we propose a simple yet effective method that learns a gate for each ViT block to adjust its intervention into the prompt tokens.	Seungryong Yoo; Eunji Kim; Dahuin Jung; Jungbeom Lee; Sungroh Yoon;
1555	Nearly Optimal Algorithms with Sublinear Computational Complexity for Online Kernel Regression Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose two new algorithms, AOGD-ALD and NONS-ALD, which can keep nearly optimal regret bounds at a sublinear computational complexity, and give sufficient conditions under which our algorithms work.	Junfan Li; Shizhong Liao;
1556	Online Prototype Alignment for Few-shot Policy Transfer Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Besides, they often rely on visual clues to learn the mapping function and may fail when the source domain looks quite different from the target domain. To address these problems, in this paper, we propose a novel framework, Online Prototype Alignment (OPA), which learns the mapping function based on the functional similarity of elements and is able to achieve few-shot policy transfer within only several episodes.	Qi Yi; Rui Zhang; Shaohui Peng; Jiaming Guo; Yunkai Gao; Kaizhao Yuan; Ruizhi Chen; Siming Lan; Xing Hu; Zidong Du; Xishan Zhang; Qi Guo; Yunji Chen;
1557	Submodular Order Functions and Assortment Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We give fast algorithms with strong approximation guarantees for maximizing submodular order functions under a variety of constraints.	Rajan Udwani;
1558	On The Impact of Knowledge Distillation for Model Interpretability Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this study, we have attempted to show that KD enhances the interpretability as well as the accuracy of models.	Hyeongrok Han; Siwon Kim; Hyun-Soo Choi; Sungroh Yoon;
1559	Probabilistic Concept Bottleneck Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we address the ambiguity issue that can harm reliability.	Eunji Kim; Dahuin Jung; Sangha Park; Siwon Kim; Sungroh Yoon;
1560	A General Representation Learning Framework with Generalization Performance Guarantees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we prove that generalization error of representation learning function can be estimated effectively by solving two convex optimization problems.	Junbiao Cui; Jianqing Liang; Qin Yue; Jiye Liang;
1561	Moderately Distributional Exploration for Domain Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: It is because a large uncertainty set could introduce domains containing semantically different factors from training domains. To address this issue, we propose to perform a $\textit{mo}$derately $\textit{d}$istributional $\textit{e}$xploration (MODE) for domain generalization.	Rui Dai; Yonggang Zhang; Zhen Fang; Bo Han; Xinmei Tian;
1562	Multiple Thinking Achieving Meta-Ability Decoupling for Object Navigation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a meta-ability decoupling (MAD) paradigm, which brings together various object navigation methods in an architecture system, allowing them to mutually enhance each other and evolve together.	Ronghao Dang; Lu Chen; Liuyi Wang; Zongtao He; Chengju Liu; Qijun Chen;
1563	Learning to Acquire Novel Cognitive Tasks with Evolution, Plasticity and Meta-meta-learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here we evolve neural networks, endowed with plastic connections and neuromodulation, over a sizable set of simple cognitive tasks adapted from a computational neuroscience framework.	Thomas Miconi;
1564	Model-Free Robust Average-Reward Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we focus on the robust average-reward MDPs under the model-free setting. We provide several widely used uncertainty sets as examples, including those defined by the contamination model, total variation, Chi-squared divergence, Kullback-Leibler (KL) divergence, and Wasserstein distance.	Yue Wang; Alvaro Velasquez; George K. Atia; Ashley Prater-Bennette; Shaofeng Zou;
1565	Discover and Cure: Concept-aware Mitigation of Spurious Correlation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose an interpretable framework, Discover and Cure (DISC), to tackle the issue.	Shirley Wu; Mert Yuksekgonul; Linjun Zhang; James Zou;
1566	Geometric Autoencoders – What You See Is What You Decode Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, such expressive networks can achieve low reconstruction error even when the latent representation is distorted. To avoid such misleading visualizations, we propose first a differential geometric perspective on the decoder, leading to insightful diagnostics for an embedding’s distortion, and second a new regularizer mitigating such distortion.	Philipp Nazari; Sebastian Damrich; Fred A Hamprecht;
1567	Robust Camera Pose Refinement for Multi-Resolution Hash Encoding Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a joint optimization algorithm to calibrate the camera pose and learn a geometric representation using efficient multi-resolution hash encoding.	Hwan Heo; Taekyung Kim; Jiyoung Lee; Jaewon Lee; Soohyun Kim; Hyunwoo J. Kim; Jin-Hwa Kim;
1568	Faster Gradient-Free Algorithms for Nonsmooth Nonconvex Stochastic Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper proposes a more efficient algorithm using stochastic recursive gradient estimators, which improves the complexity to $\mathcal{O}(L^3 d^{3/2} \epsilon^{-3}+ \Delta L^2 d^{3/2} \delta^{-1} \epsilon^{-3})$.	Lesi Chen; Jing Xu; Luo Luo;
1569	DADAO: Decoupled Accelerated Decentralized Asynchronous Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work introduces DADAO: the first decentralized, accelerated, asynchronous, primal, first-order algorithm to minimize a sum of $L$-smooth and $\mu$-strongly convex functions distributed over a given network of size $n$.	Adel Nabli; Edouard Oyallon;
1570	Competing for Shareable Arms in Multi-Player Multi-Armed Bandits Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Subsequently, we propose a novel Selfish MPMAB with Averaging Allocation (SMAA) approach based on the equilibrium.	Renzhe Xu; Haotian Wang; Xingxuan Zhang; Bo Li; Peng Cui;
1571	Contrast with Reconstruct: Contrastive 3D Representation Learning Guided By Generative Pretraining Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This motivates us to learn 3D representations by sharing the merits of both paradigms, which is non-trivial due to the pattern difference between the two paradigms. In this paper, we propose contrast with reconstruct (ReCon) that unifies these two paradigms.	Zekun Qi; Runpei Dong; Guofan Fan; Zheng Ge; Xiangyu Zhang; Kaisheng Ma; Li Yi;
1572	Estimating Possible Causal Effects with Latent Variables Via Adjustment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Since many causal graphs can correspond to one PAG, they are possibly associated with different causal effects. The aim of this paper is to estimate these possible causal effects via covariate adjustment given a PAG.	Tian-Zuo Wang; Tian Qin; Zhi-Hua Zhou;
1573	Bit Allocation Using Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we consider the problem of bit allocation in Neural Video Compression (NVC).	Tongda Xu; Han Gao; Chenjian Gao; Yuanyuan Wang; Dailan He; Jinyong Pi; Jixiang Luo; Ziyu Zhu; Mao Ye; Hongwei Qin; Yan Wang; Jingjing Liu; Ya-Qin Zhang;
1574	Causal Strategic Classification: A Tale of Two Shifts Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Focusing on accuracy as our primary objective, we show how strategic behavior and causal effects underlie two complementing forms of distribution shift. We characterize these shifts, and propose a learning algorithm that balances between these two forces and over time, and permits end-to-end training.	Guy Horowitz; Nir Rosenfeld;
1575	Modeling Dynamic Environments with Scene Graph Memory Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Embodied AI agents that search for objects in large environments such as households often need to make efficient decisions by predicting object locations based on partial information. We pose this as a new type of link prediction problem: link prediction on partially observable dynamic graphs. Our graph is a representation of a scene in which rooms and objects are nodes, and their relationships are encoded in the edges; only parts of the changing graph are known to the agent at each timestep.	Andrey Kurenkov; Michael Lingelbach; Tanmay Agarwal; Emily Jin; Chengshu Li; Ruohan Zhang; Li Fei-Fei; Jiajun Wu; Silvio Savarese; Roberto Martín-Martín;
1576	Solving High-Dimensional PDEs with Latent Spectral Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present Latent Spectral Models (LSM) toward an efficient and precise solver for high-dimensional PDEs.	Haixu Wu; Tengge Hu; Huakun Luo; Jianmin Wang; Mingsheng Long;
1577	Oscillation-free Quantization for Low-bit Vision Transformers Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose three techniques correspondingly: statistical weight quantization ($\rm StatsQ$) to improve quantization robustness compared to the prevalent learnable-scale-based method; confidence-guided annealing ($\rm CGA$) that freezes the weights with $\textit{high confidence}$ and calms the oscillating weights; and $\textit{query}$-$\textit{key}$ reparameterization ($\rm QKR$) to resolve the query-key intertwined oscillation and mitigate the resulting gradient misestimation.	Shih-yang Liu; Zechun Liu; Kwang-Ting Cheng;
1578	A Study on Transformer Configuration and Training Objective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: For example, we usually set the base model with hidden size (i.e. model width) to be 768 and the number of transformer layers (i.e. model depth) to be 12. In this paper, we revisit these conventional configurations by studying the relationship between transformer configuration and training objective.	Fuzhao Xue; Jianghai Chen; Aixin Sun; Xiaozhe Ren; Zangwei Zheng; Xiaoxin He; Yongming Chen; Xin Jiang; Yang You;
1579	Quantifying The Knowledge in GNNs for Reliable Distillation Into MLPs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To achieve reliable distillation, we propose an effective approach, namely Knowledge-inspired Reliable Distillation (KRD), that models the probability of each node being an informative and reliable knowledge point, based on which we sample a set of additional reliable knowledge points as supervision for training student MLPs.	Lirong Wu; Haitao Lin; Yufei Huang; Stan Z. Li;
1580	Composer: Creative and Controllable Image Synthesis with Composable Conditions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This work offers a new generation paradigm that allows flexible control of the output image, such as spatial layout and palette, while maintaining the synthesis quality and model creativity.	Lianghua Huang; Di Chen; Yu Liu; Yujun Shen; Deli Zhao; Jingren Zhou;
1581	Sketching Meets Differential Privacy: Fast Algorithm for Dynamic Kronecker Projection Maintenance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Given a constraint matrix ${\sf A}$ and a positive semi-definite matrix $W\in \mathbb{R}^{n\times n}$ with a sparse eigenbasis, we consider the task of maintaining the projection in the form of ${\sf B}^\top({\sf B}{\sf B}^\top)^{-1}{\sf B}$, where ${\sf B}={\sf A}(W\otimes I)$ or ${\sf B}={\sf A}(W^{1/2}\otimes W^{1/2})$.	Zhao Song; Xin Yang; Yuanyuan Yang; Lichen Zhang;
1582	Robust Perception Through Equivariance Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we introduce a framework that uses the dense intrinsic constraints in natural images to robustify inference.	Chengzhi Mao; Lingyu Zhang; Abhishek Vaibhav Joshi; Junfeng Yang; Hao Wang; Carl Vondrick;
1583	Understanding and Defending Patched-based Adversarial Attacks for Vision Transformer Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Based on understanding adversarial patch attacks, we propose a simple but efficient defense that correctly detects more than 95% of adversarial patches.	Liang Liu; Yanan Guo; Youtao Zhang; Jun Yang;
1584	Robust Weight Signatures: Gaining Robustness As Easy As Patching Weights? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We start by drawing several key observations: (i) assuming that we train the same model architecture on both a clean dataset and its corrupted version, a comparison between the two resultant models shows their weights to mostly differ in shallow layers; (ii) the weight difference after projection, which we call Robust Weight Signature (RWS), appears to be discriminative and indicative of different corruption types; (iii) perhaps most strikingly, for the same corruption type, the RWSs obtained by one model architecture are highly consistent and transferable across different datasets. Based on those RWS observations, we propose a minimalistic model robustness patching framework that carries a model trained on clean data together with its pre-extracted RWSs.	Ruisi Cai; Zhenyu Zhang; Zhangyang Wang;
1585	Estimating Heterogeneous Treatment Effects: Mutual Information Bounds and Learning Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, existing works on modeling selection bias and corresponding algorithms do not naturally generalize to non-binary treatment spaces. To address this limitation, we propose to use mutual information to describe selection bias in estimating HTE and derive a novel error bound using the mutual information between the covariates and the treatments, which is the first error bound to cover general treatment schemes including multinoulli or continuous spaces.	Xingzhuo Guo; Yuchen Zhang; Jianmin Wang; Mingsheng Long;
1586	On The Optimality of Misspecified Kernel Ridge Regression Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show that KRR is minimax optimal for any $s\in (0,1)$ when the $\mathcal{H}$ is a Sobolev RKHS.	Haobo Zhang; Yicheng Li; Weihao Lu; Qian Lin;
1587	Multi-View Masked World Models for Visual Robotic Manipulation Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: How else can we utilize the richness of multi-view data? In this paper, we investigate how to learn good representations with multi-view data and utilize them for visual robotic manipulation.	Younggyo Seo; Junsu Kim; Stephen James; Kimin Lee; Jinwoo Shin; Pieter Abbeel;
1588	Performative Recommendation: Diversifying Content Via Strategic Incentives Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To remedy this, conventional approaches such as re-ranking improve diversity by presenting more diverse items. Here we argue that to promote inherent and prolonged diversity, the system must encourage its creation.	Itay Eilat; Nir Rosenfeld;
1589	How Does Information Bottleneck Help Deep Learning? Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we provide the first rigorous learning theory for justifying the benefit of information bottleneck in deep learning by mathematically relating information bottleneck to generalization errors.	Kenji Kawaguchi; Zhun Deng; Xu Ji; Jiaoyang Huang;
1590	Momentum Ensures Convergence of SIGNSGD Under Weaker Assumptions Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper revisits the convergence of signSGD and proves that momentum can remedy signSGD under weaker assumptions than previous techniques; in particular, our convergence theory does not require the assumption of bounded stochastic gradient or increased batch size.	Tao Sun; Qingsong Wang; Dongsheng Li; Bao Wang;
1591	Lowering The Pre-training Tax for Gradient-based Subset Training: A Lightweight Distributed Pre-Training Toolkit Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: As a result, the pre-training phase itself incurs significant time/resource overhead, and prior work has not gone beyond hyperparameter search to reduce pre-training time. Our work explicitly aims to reduce this $\textbf{pre-training tax}$ in gradient-based subset training.	Yeonju Ro; Zhangyang Wang; Vijay Chidambaram; Aditya Akella;
1592	Eliminating Adversarial Noise Via Information Discard and Robust Representation Restoration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we start from the latent inherent properties of adversarial samples to break the limitations.	Dawei Zhou; Yukun Chen; Nannan Wang; Decheng Liu; Xinbo Gao; Tongliang Liu;
1593	BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: This paper proposes BLIP-2, a generic and efficient pre-training strategy that bootstraps vision-language pre-training from off-the-shelf frozen pre-trained image encoders and frozen large language models.	Junnan Li; Dongxu Li; Silvio Savarese; Steven Hoi;
1594	Gradient Descent Finds The Global Optima of Two-Layer Physics-Informed Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The main aim of this paper is to conduct the convergence analysis of the gradient descent for two-layer physics-informed neural networks (PINNs).	Yihang Gao; Yiqi Gu; Michael Ng;
1595	A Nearly-Optimal Bound for Fast Regression with $\ell_\infty$ Guarantee Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: One popular approach for solving $\ell_2$ regression problem is via sketching: picking a structured random matrix $S\in \mathbb{R}^{m\times n}$ with $m\ll n$ and $SA$ can be quickly computed, solve the “sketched” regression problem $x'=\mathrm{argmin} \|SAx-Sb\|_2$. In this paper, we show that in order to obtain such $\ell_\infty$ guarantee for $\ell_2$ regression, one has to use sketching matrices that are dense.	Zhao Song; Mingquan Ye; Junze Yin; Lichen Zhang;
1596	Transcendental Idealism of Planner: Evaluating Perception from Planning Perspective for Autonomous Driving Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose a principled framework that provides a coherent and systematic understanding of the impact an error in the perception module imposes on an autonomous agent’s planning that actually controls the vehicle.	Weixin Li; Xiaodong Yang;
1597	Attribute-Efficient PAC Learning of Low-Degree Polynomial Threshold Functions with Nasty Noise Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: The concept class of low-degree polynomial threshold functions (PTFs) plays a fundamental role in machine learning. In this paper, we study PAC learning of $K$-sparse degree-$d$ PTFs on $\mathbb{R}^n$, where any such concept depends only on $K$ out of $n$ attributes of the input.	Shiwei Zeng; Jie Shen;
1598	Consistency of Multiple Kernel Clustering Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, in multiple kernel clustering (MKC), the consistency of kernel weights has not been sufficiently investigated. In this work, we fill this gap with a non-asymptotic analysis on the consistency of kernel weights of a novel method termed SimpleMKKM.	Weixuan Liang; Xinwang Liu; Yong Liu; Chuan Ma; Yunping Zhao; Zhe Liu; En Zhu;
1599	Towards Controlled Data Augmentations for Active Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we thoroughly study the coupling of data augmentation and active learning, thereby proposing Controllable Augmentation ManiPulator for Active Learning.	Jianan Yang; Haobo Wang; Sai Wu; Gang Chen; Junbo Zhao;
1600	Bi-directional Masks for Efficient N:M Sparse Training Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We focus on addressing the dense backward propagation issue for training efficiency of N:M fine-grained sparsity that preserves at most N out of M consecutive weights and achieves practical speedups supported by the N:M sparse tensor core. Therefore, we present a novel method of Bi-directional Masks (Bi-Mask) with its two central innovations in: 1) Separate sparse masks in the two directions of forward and backward propagation to obtain training acceleration.	Yuxin Zhang; Yiting Luo; Mingbao Lin; Yunshan Zhong; JingJing Xie; Fei Chao; Rongrong Ji;
1601	The Virtues of Laziness in Model-based RL: A Unified Objective and Algorithms Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose a novel approach to addressing two fundamental challenges in Model-based Reinforcement Learning (MBRL): the computational expense of repeatedly finding a good policy in the learned model, and the objective mismatch between model fitting and policy computation.	Anirudh Vemula; Yuda Song; Aarti Singh; Drew Bagnell; Sanjiban Choudhury;
1602	Fair and Robust Estimation of Heterogeneous Treatment Effects for Policy Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a simple and general framework for nonparametric estimation of heterogeneous treatment effects under fairness constraints.	Kwangho Kim; Jose R Zubizarreta;
1603	Controlled Text Generation with Natural Language Instructions Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Nevertheless, it is notoriously difficult to control their generation in such a way that it satisfies user-specified constraints. In this paper, we present InstructCTG, a simple controlled text generation framework that incorporates different constraints by verbalizing them as natural language instructions.	Wangchunshu Zhou; Yuchen Eleanor Jiang; Ethan Wilcox; Ryan Cotterell; Mrinmaya Sachan;
1604	Margin-based Neural Network Watermarking Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose a novel margin-based DNN watermarking approach that is robust to the functionality stealing attacks based on model extraction and distillation.	Byungjoo Kim; Suyoung Lee; Seanie Lee; Sooel Son; Sung Ju Hwang;
1605	Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we propose a multistep operator that unifies per-decision and trajectory-aware methods.	Brett Daley; Martha White; Christopher Amato; Marlos C. Machado;
1606	Great Models Think Alike: Improving Model Reliability Via Inter-Model Latent Agreement Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: A fundamental challenge is that models are often unreliable due to overconfidence. In this paper, we estimate a model’s reliability by measuring the agreement between its latent space, and the latent space of a foundation model.	Ailin Deng; Miao Xiong; Bryan Hooi;
1607	Opponent-Limited Online Search for Imperfect Information Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To make Safe-1-KLSS applicable to even larger games, we propose Opponent-Limited Subgame Solving (OLSS) to limit how the opponent reaches a subgame and how it acts in the subgame.	Weiming Liu; Haobo Fu; Qiang Fu; Yang Wei;
1608	Pruning Via Sparsity-indexed ODE: A Continuous Sparsity Viewpoint Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel pruning framework, coined Sparsity-indexed ODE (SpODE), that provides explicit guidance on how to best preserve model performance while ensuring an infinitesimal increase in model sparsity.	Zhanfeng Mo; Haosen Shi; Sinno Pan;
1609	Nesterov Meets Optimism: Rate-Optimal Separable Minimax Optimization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a new first-order optimization algorithm, AcceleratedGradient-OptimisticGradient (AG-OG) Descent Ascent, for separable convex-concave minimax optimization.	Chris Junchi Li; Angela Yuan; Gauthier Gidel; Quanquan Gu; Michael Jordan;
1610	Lazy Agents: A New Perspective on Solving Sparse Reward Problem in Multi-agent Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Sparse reward remains a valuable and challenging problem in multi-agent reinforcement learning (MARL). This paper addresses this issue from a new perspective, i.e., lazy agents.	Boyin Liu; Zhiqiang Pu; Yi Pan; Jianqiang Yi; Yanyan Liang; Du Zhang;
1611	Long Horizon Temperature Scaling Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, autoregressive models rely on myopic temperature scaling that greedily optimizes the next token. To address this, we propose Long Horizon Temperature Scaling (LHTS), a novel approach for sampling from temperature-scaled joint distributions.	Andy Shih; Dorsa Sadigh; Stefano Ermon;
1612	Using Perturbation to Improve Goodness-of-Fit Tests Based on Kernelized Stein Discrepancy Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose to perturb the observed sample via Markov transition kernels, with respect to which the target distribution is invariant.	Xing Liu; Andrew Duncan; Axel Gandy;
1613	Near-Optimal $\Phi$-Regret Learning in Extensive-Form Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we establish efficient and uncoupled learning dynamics so that, when employed by all players in multiplayer perfect-recall imperfect-information extensive-form games, the trigger regret of each player grows as $O(\log T)$ after $T$ repetitions of play.	Ioannis Anagnostides; Gabriele Farina; Tuomas Sandholm;
1614	Neural Stochastic Differential Games for Time-series Analysis Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Despite such progress, most existing methods still face challenges in providing a general framework for analyzing time series. To tackle this, we adopt stochastic differential games to suggest a new philosophy of utilizing interacting collective intelligence in time series analysis.	Sungwoo Park; Byoungwoo Park; Moontae Lee; Changhee Lee;
1615	Which Tricks Are Important for Learning to Rank? Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Recently, several other GBDT-based ranking algorithms were proposed. In this paper, we thoroughly analyze these methods in a unified setup.	Ivan Lyzhin; Aleksei Ustimenko; Andrey Gulin; Liudmila Prokhorenkova;
1616	A Complete Expressiveness Hierarchy for Subgraph GNNs Via Subgraph Weisfeiler-Lehman Tests Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: While numerous architectures have been proposed, so far there is still a limited understanding of how various design paradigms differ in terms of expressive power, nor is it clear what design principle achieves maximal expressiveness with minimal architectural complexity. To address these fundamental questions, this paper conducts a systematic study of general node-based subgraph GNNs through the lens of Subgraph Weisfeiler-Lehman Tests (SWL).	Bohang Zhang; Guhao Feng; Yiheng Du; Di He; Liwei Wang;
1617	The SSL Interplay: Augmentations, Inductive Bias, and Generalization Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Yet in practice, engineers face issues such as instability in tuning optimizers and collapse of representations during training. Such challenges motivate the need for a theory to shed light on the complex interplay between the choice of data augmentation, network architecture, and training algorithm. We study such an interplay with a precise analysis of generalization performance on both pretraining and downstream tasks in kernel regimes, and highlight several insights for SSL practitioners that arise from our theory.	Vivien Cabannes; Bobak Kiani; Randall Balestriero; Yann LeCun; Alberto Bietti;
1618	Policy Mirror Ascent for Efficient and Independent Learning in Mean Field Games Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: Instead, we show that $N$ agents running policy mirror ascent converge to the Nash equilibrium of the regularized game within $\widetilde{\mathcal{O}}(\varepsilon^{-2})$ samples from a single sample trajectory without a population generative model, up to a standard $\mathcal{O}(\frac{1}{\sqrt{N}})$ error due to the mean field.	Batuhan Yardim; Semih Cayci; Matthieu Geist; Niao He;
1619	High-dimensional Clustering Onto Hamiltonian Cycle Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, most of the clustering methods merely generate pseudo labels and thus are unable to simultaneously present the similarities between different clusters and outliers. This paper proposes a new framework called High-dimensional Clustering onto Hamiltonian Cycle (HCHC) to solve the above problems.	Tianyi Huang; Shenghui Cheng; Stan Z. Li; Zhengjun Zhang;
1620	Personalized Subgraph Federated Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To this end, we introduce a new subgraph FL problem, personalized subgraph FL, which focuses on the joint improvement of the interrelated local GNNs rather than learning a single global model, and propose a novel framework, FEDerated Personalized sUBgraph learning (FED-PUB), to tackle it.	Jinheon Baek; Wonyong Jeong; Jiongdao Jin; Jaehong Yoon; Sung Ju Hwang;
1621	Principled Reinforcement Learning with Human Feedback from Pairwise or K-wise Comparisons Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We provide a theoretical framework for Reinforcement Learning with Human Feedback (RLHF).	Banghua Zhu; Michael Jordan; Jiantao Jiao;
1622	On Investigating The Conservative Property of Score-Based Generative Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we demonstrate that the architectural constraints of CSBMs may limit their modeling ability.	Chen-Hao Chao; Wei-Fang Sun; Bo-Wun Cheng; Chun-Yi Lee;
1623	Curriculum Co-disentangled Representation Learning Across Multiple Environments for Social Recommendation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: However, in the literature there has been no work on social recommendation capable of disentangling user representations across consuming and social environments. To solve this problem, we study co-disentangled representation learning across different environments by proposing the curriculum co-disentangled representation learning (CurCoDis) model to disentangle the hidden factors for users across both consuming and social environments.	Xin Wang; Zirui Pan; Yuwei Zhou; Hong Chen; Chendi Ge; Wenwu Zhu;
1624	Rethinking Explaining Graph Neural Networks Via Non-parametric Subgraph Matching Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, based on the observation that graphs typically share some common motif patterns, we propose a novel non-parametric subgraph matching framework, dubbed MatchExplainer, to explore explanatory subgraphs.	Fang Wu; Siyuan Li; Xurui Jin; Yinghui Jiang; Dragomir Radev; Zhangming Niu; Stan Z. Li;
1625	UPSCALE: Unconstrained Channel Pruning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In response, our insight is to reorder channels at export time, (1) reducing latency by reducing memory copies and (2) improving accuracy by removing constraints.	Alvin Wan; Hanxiang Hao; Kaushik Patnaik; Yueyang Xu; Omer Hadad; David Güera; Zhile Ren; Qi Shan;
1626	A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, given the interconnected nature of source localization, separation, and recognition, independent models are likely to yield suboptimal performance as they fail to capture the interdependence between these tasks. To address this problem, we propose a unified audio-visual learning framework (dubbed OneAVM) that integrates audio and visual cues for joint localization, separation, and recognition.	Shentong Mo; Pedro Morgado;
1627	Fair and Optimal Classification Via Post-Processing Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: To mitigate the bias exhibited by machine learning models, fairness criteria can be integrated into the training process to ensure fair treatment across all demographics, but it often comes at the expense of model performance. Understanding such tradeoffs, therefore, underlies the design of fair algorithms. To this end, this paper provides a complete characterization of the inherent tradeoff of demographic parity on classification problems, under the most general multi-group, multi-class, and noisy setting.	Ruicheng Xian; Lang Yin; Han Zhao;
1628	A Theory of Continuous Generative Flow Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We present a theory for generalized GFlowNets, which encompasses both existing discrete GFlowNets and ones with continuous or hybrid state spaces, and perform experiments with two goals in mind.	Salem Lahlou; Tristan Deleu; Pablo Lemos; Dinghuai Zhang; Alexandra Volokhova; Alex Hernández-García; Lena Nehale Ezzine; Yoshua Bengio; Nikolay Malkin;
1629	Do Not Train It: A Linear Neural Architecture Search of Graph Neural Networks Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: With the randomly-initialized weights, we can then seek the optimal architecture parameters via the sparse coding objective and derive a novel NAS-GNNs method, namely neural architecture coding (NAC).	Peng Xu; Lin Zhang; Xuanzhou Liu; Jiaqi Sun; Yue Zhao; Haiqin Yang; Bei Yu;
1630	All in A Row: Compressed Convolution Networks for Graphs Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To bridge the gap between Euclidean space and graph space, we propose a differentiable method for regularization on graphs that applies permutations to the input graphs.	Junshu Sun; Shuhui Wang; Xinzhe Han; Zhe Xue; Qingming Huang;
1631	Blossom: An Anytime Algorithm for Computing Optimal Decision Trees Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a simple algorithm to learn optimal decision trees of bounded depth.	Emir Demirović; Emmanuel Hebrard; Louis Jean;
1632	Functional Neural Networks: Shift Invariant Models for Functional Data with Applications to EEG Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We introduce a new class of neural networks that are shift invariant and preserve smoothness of the data: functional neural networks (FNNs).	Florian Heinrichs; Mavin Heim; Corinna Weber;
1633	Optimizing Mode Connectivity for Class Incremental Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To dodge the ridge, we propose parameter-saving OPtimizing Connectivity (OPC) based on Fourier series and gradient projection for finding the low-loss path between minima.	Haitao Wen; Haoyang Cheng; Heqian Qiu; Lanxiao Wang; Lili Pan; Hongliang Li;
1634	Never Mind The Metrics—what About The Uncertainty? Visualising Binary Confusion Matrix Metric Distributions to Put Performance in Perspective Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We develop novel interactive visualisations of performance metric contours within (and beyond) ROC space, showing the discrete probability mass functions of true and false positive rates and how these relate to performance metric distributions. We aim to raise awareness of the substantial uncertainty in performance metric estimates that can arise when classifiers are evaluated on empirical datasets and benchmarks, and that performance claims should be tempered by this understanding.	David Lovell; Dimity Miller; Jaiden Capra; Andrew P. Bradley;
1635	Rethink DARTS Search Space and Renovate A New Benchmark Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we first propose and orchestrate a suite of improvements to frame a larger and harder DSS, termed LHD, while retaining high efficiency in search.	Jiuling Zhang; Zhiming Ding;
1636	SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We propose SmoothQuant, a training-free, accuracy-preserving, and general-purpose post-training quantization (PTQ) solution to enable 8-bit weight, 8-bit activation (W8A8) quantization for LLMs.	Guangxuan Xiao; Ji Lin; Mickael Seznec; Hao Wu; Julien Demouth; Song Han;
1637	Provable Dynamic Fusion for Low-Quality Multimodal Data Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Can we design a provably robust multimodal fusion method? This paper provides theoretical understanding to answer this question under one of the most popular multimodal fusion frameworks from the generalization perspective.	Qingyang Zhang; Haitao Wu; Changqing Zhang; Qinghua Hu; Huazhu Fu; Joey Tianyi Zhou; Xi Peng;
1638	Which Invariance Should We Transfer? A Causal Minimax Learning Approach Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: However, a key question remains: which subset of this whole stable information should the model transfer, in order to achieve optimal generalization ability? To answer this question, we present a comprehensive minimax analysis from a causal perspective.	Mingzhou Liu; Xiangyu Zheng; Xinwei Sun; Fang Fang; Yizhou Wang;
1639	CLUSTSEG: Clustering for Universal Segmentation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We present CLUSTSEG, a general, transformer-based framework that tackles different image segmentation tasks (i.e., superpixel, semantic, instance, and panoptic) through a unified, neural clustering scheme.	James Chenhao Liang; Tianfei Zhou; Dongfang Liu; Wenguan Wang;
1640	Optimal LP Rounding and Linear-Time Approximation Algorithms for Clustering Edge-Colored Hypergraphs Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: We study the approximability of an existing framework for clustering edge-colored hypergraphs, which is closely related to chromatic correlation clustering and is motivated by machine learning and data mining applications where the goal is to cluster a set of objects based on multiway interactions of different categories or types. We present improved approximation guarantees based on linear programming, and show they are tight by proving a matching integrality gap.	Nate Veldt;
1641	GFlowNet-EM for Learning Compositional Latent Variable Models Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: For algorithms based on expectation-maximization (EM), the E-step is often intractable without restrictive approximations to the posterior. We propose the use of GFlowNets, algorithms for sampling from an unnormalized density by learning a stochastic policy for sequential construction of samples, for this intractable E-step.	Edward J Hu; Nikolay Malkin; Moksh Jain; Katie E Everett; Alexandros Graikos; Yoshua Bengio;
1642	Statistical Inference and A/B Testing for First-Price Pacing Equilibria Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a statistical framework for the FPPE model, in which a limit FPPE with a continuum of items models the long-run steady-state behavior of the auction platform, and an observable FPPE consisting of a finite number of items provides the data to estimate primitives of the limit FPPE, such as revenue, Nash social welfare (a fair metric of efficiency), and other parameters of interest.	Luofeng Liao; Christian Kroer;
1643	SDDM: Score-Decomposed Diffusion Models on Manifolds for Unpaired Image-to-Image Translation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This work presents a new score-decomposed diffusion model (SDDM) on manifolds to explicitly optimize the tangled distributions during image generation.	Shikun Sun; Longhui Wei; Junliang Xing; Jia Jia; Qi Tian;
1644	Stable Estimation of Heterogeneous Treatment Effects Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To improve the estimation accuracy of HTE for underrepresented populations, we propose a novel Stable CounterFactual Regression (StableCFR) to smooth the population distribution and upsample the underrepresented subpopulations, while balancing confounders between treatment and control groups.	Anpeng Wu; Kun Kuang; Ruoxuan Xiong; Bo Li; Fei Wu;
1645	CoCo: A Coupled Contrastive Framework for Unsupervised Domain Adaptive Graph Classification Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose Coupled Contrastive Graph Representation Learning (CoCo), which extracts the topological information from coupled learning branches and reduces the domain discrepancy with coupled contrastive learning.	Nan Yin; Li Shen; Mengzhu Wang; Long Lan; Zeyu Ma; Chong Chen; Xian-Sheng Hua; Xiao Luo;
1646	Answering Complex Logical Queries on Knowledge Graphs Via Query Computation Tree Optimization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this work, we propose QTO (Query Computation Tree Optimization) that can efficiently find the exact optimal solution.	Yushi Bai; Xin Lv; Juanzi Li; Lei Hou;
1647	FAIRER: Fairness As Decision Rationale Alignment Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we investigate fairness from the perspective of decision rationale and define the parameter parity score to characterize the fair decision process of networks by analyzing neuron influence in various subgroups.	Tianlin Li; Qing Guo; Aishan Liu; Mengnan Du; Zhiming Li; Yang Liu;
1648	Social Learning Spontaneously Emerges By Searching Optimal Heuristics with Deep Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Here, we address both problems by employing a deep reinforcement learning model to optimize the social learning strategies (SLSs) of agents in a cooperative game in a multi-dimensional landscape.	Seungwoong Ha; Hawoong Jeong;
1649	Crafting Training Degradation Distribution for The Accuracy-Generalization Trade-off in Real-World Super-Resolution Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce a novel approach to craft training degradation distributions using a small set of reference images.	Ruofan Zhang; Jinjin Gu; Haoyu Chen; Chao Dong; Yulun Zhang; Wenming Yang;
1650	SEGA: Structural Entropy Guided Anchor View for Graph Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: An anchor view that maintains the essential information of input graphs for contrastive learning has been hardly investigated. In this paper, based on the theory of graph information bottleneck, we deduce the definition of this anchor view; put differently, the anchor view with essential information of input graph is supposed to have the minimal structural uncertainty.	Junran Wu; Xueyuan Chen; Bowen Shi; Shangzhe Li; Ke Xu;
1651	Learning Neural Constitutive Laws from Motion Observations for Generalizable PDE Dynamics Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a hybrid neural network (NN) and PDE approach for learning generalizable PDE dynamics from motion observations.	Pingchuan Ma; Peter Yichen Chen; Bolei Deng; Joshua B. Tenenbaum; Tao Du; Chuang Gan; Wojciech Matusik;
1652	Model-Bellman Inconsistency for Model-based Offline Reinforcement Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we propose MOdel-Bellman Inconsistency penalized offLinE Policy Optimization (MOBILE), a novel uncertainty-driven offline RL algorithm.	Yihao Sun; Jiaji Zhang; Chengxing Jia; Haoxin Lin; Junyin Ye; Yang Yu;
1653	Evolving Semantic Prototype Improves Generative Zero-Shot Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: So the synthesized visual sample features do not faithfully represent the real sample features, limiting the classifier training and existing ZSL performance. In this paper, we formulate this mismatch phenomenon as the visual-semantic domain shift problem.	Shiming Chen; Wenjin Hou; Ziming Hong; Xiaohan Ding; Yibing Song; Xinge You; Tongliang Liu; Kun Zhang;
1654	Random Shuffle Transformer for Image Restoration Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we show that a local window Transformer can also model non-local interactions.	Jie Xiao; Xueyang Fu; Man Zhou; Hongjian Liu; Zheng-Jun Zha;
1655	Retrieval-Augmented Multimodal Language Modeling Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: To integrate knowledge in a more scalable and modular way, we propose a retrieval-augmented multimodal model, which enables a base multimodal model (generator) to refer to relevant text and images fetched by a retriever from external memory (e.g., documents on the web).	Michihiro Yasunaga; Armen Aghajanyan; Weijia Shi; Richard James; Jure Leskovec; Percy Liang; Mike Lewis; Luke Zettlemoyer; Wen-tau Yih;
1656	A Closer Look at Few-shot Classification Again Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we empirically prove that the training algorithm and the adaptation algorithm can be completely disentangled, which allows algorithm analysis and design to be done individually for each phase.	Xu Luo; Hao Wu; Ji Zhang; Lianli Gao; Jing Xu; Jingkuan Song;
1657	Boosting Offline Reinforcement Learning with Action Preference Query Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this work, we introduce an interaction-free training scheme dubbed Offline-with-Action-Preferences (OAP).	Qisen Yang; Shenzhi Wang; Matthieu Gaetan Lin; Shiji Song; Gao Huang;
1658	Patch-level Contrastive Learning Via Positional Query for Visual Pre-training Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose a plug-in method PQCL (Positional Query for patch-level Contrastive Learning), which allows performing patch-level contrasts between two views with exact patch correspondence.	Shaofeng Zhang; Qiang Zhou; Zhibin Wang; Fan Wang; Junchi Yan;
1659	Minimum Width of Leaky-ReLU Neural Networks for Uniform Universal Approximation Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: This paper examines a uniform UAP for the function class $C(\mathcal{K},\mathbb{R}^{d_y})$ and gives the exact minimum width of the leaky-ReLU NN as $w_{\min}=\max(d_x+1,d_y)+1_{d_y=d_x+1}$, which involves the effects of the output dimensions. To obtain this result, we propose a novel lift-flow-discretization approach that shows that the uniform UAP has a deep connection with topological theory.	Li’ang Li; Yifei Duan; Guanghua Ji; Yongqiang Cai;
1660	Optimizing DDPM Sampling with Shortcut Fine-Tuning Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this study, we propose Shortcut Fine-Tuning (SFT), a new approach for addressing the challenge of fast sampling of pretrained Denoising Diffusion Probabilistic Models (DDPMs).	Ying Fan; Kangwook Lee;
1661	Learning Physical Models That Can Respect Conservation Laws Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Delivering on the promise of SciML requires seamlessly incorporating both types of problems into the learning process. To address this issue, we propose ProbConserv, a framework for incorporating constraints into a generic SciML architecture.	Derek Hansen; Danielle C. Maddix; Shima Alizadeh; Gaurav Gupta; Michael W. Mahoney;
1662	Searching Large Neighborhoods for Integer Linear Programs with Contrastive Learning Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: In this paper, we propose a novel approach, CL-LNS, that delivers state-of-the-art anytime performance on several ILP benchmarks measured by metrics including the primal gap, the primal integral, survival rates and the best performing rate.	Taoan Huang; Aaron M Ferber; Yuandong Tian; Bistra Dilkina; Benoit Steiner;
1663	BiBench: Benchmarking and Analyzing Network Binarization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: Common challenges of binarization, such as accuracy degradation and efficiency limitation, suggest that its attributes are not fully understood. To close this gap, we present BiBench, a rigorously designed benchmark with in-depth analysis for network binarization.	Haotong Qin; Mingyuan Zhang; Yifu Ding; Aoyu Li; Zhongang Cai; Ziwei Liu; Fisher Yu; Xianglong Liu;
1664	Leveraging Demonstrations to Improve Online Learning: Quality Matters Related Papers Related Patents Related Grants Related Venues Related Experts View Highlight: We propose an informed TS algorithm that utilizes the demonstration data in a coherent way through Bayes’ rule and derive a prior-dependent Bayesian regret bound.	Botao Hao; Rahul Jain; Tor Lattimore; Benjamin Van Roy; Zheng Wen;
1665	Model Ratatouille: Recycling Diverse Models for Out-of-Distribution Generalization Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: In this paper, we thus propose model ratatouille, a new strategy to recycle the multiple fine-tunings of the same foundation model on diverse auxiliary tasks.	Alexandre Rame; Kartik Ahuja; Jianyu Zhang; Matthieu Cord; Leon Bottou; David Lopez-Paz;
1666	Optimizing NOTEARS Objectives Via Topological Swaps Related Papers Related Patents Related Grants Related Venues Related Experts Related Code View Highlight: These problems involve minimizing a given loss or score function, subject to a non-convex continuous constraint that penalizes the presence of cycles in a graph. In this work, we delve into the optimality challenges associated with this class of non-convex programs.	Chang Deng; Kevin Bello; Bryon Aragam; Pradeep Kumar Ravikumar;
1667	Dynamical Linear Bandits Highlight: In this paper, we introduce a novel setting, the Dynamical Linear Bandits (DLB), an extension of the linear bandits characterized by a hidden state.	Marco Mussi; Alberto Maria Metelli; Marcello Restelli;
1668	Unifying Molecular and Textual Representations Via Multi-task Language Modelling Highlight: We propose the first multi-domain, multi-task language model that can solve a wide range of tasks in both the chemical and natural language domains.	Dimitrios Christofidellis; Giorgio Giannone; Jannis Born; Ole Winther; Teodoro Laino; Matteo Manica;
1669	Probabilistic Contrastive Learning Recovers The Correct Aleatoric Uncertainty of Ambiguous Inputs Highlight: Ambiguous inputs make the true posterior for the latent vector probabilistic with heteroscedastic uncertainty. In this setup, we extend the common InfoNCE objective and encoders to predict latent distributions instead of points.	Michael Kirchhof; Enkelejda Kasneci; Seong Joon Oh;
1670	Towards Omni-generalizable Neural Methods for Vehicle Routing Problems Highlight: We propose a generic meta-learning framework, which enables effective training of an initialized model with the capability of fast adaptation to new tasks during inference.	Jianan Zhou; Yaoxin Wu; Wen Song; Zhiguang Cao; Jie Zhang;
1671	Model-agnostic Measure of Generalization Difficulty Highlight: We propose what is, to our knowledge, the first model-agnostic measure of the inherent generalization difficulty of tasks.	Akhilan Boopathy; Kevin Liu; Jaedong Hwang; Shu Ge; Asaad Mohammedsaleh; Ila R Fiete;
1672	Long-Term Rhythmic Video Soundtracker Highlight: We present Long-Term Rhythmic Video Soundtracker (LORIS), a novel framework to synthesize long-term conditional waveforms.	Jiashuo Yu; Yaohui Wang; Xinyuan Chen; Xiao Sun; Yu Qiao;
1673	Rethinking Warm-Starts with Predictions: Learning Predictions Close to Sets of Optimal Solutions for Faster $\text{L}$-/$\text{L}^\natural$-Convex Function Minimization Highlight: The dual problem of bipartite matching and, more generally, $\text{L}$-/$\text{L}^\natural$-convex function minimization have arbitrarily many optimal solutions, making prediction-dependent bounds arbitrarily large. To resolve this theoretically critical issue, we present a new warm-start-with-prediction framework for $\text{L}$-/$\text{L}^\natural$-convex function minimization.	Shinsaku Sakaue; Taihei Oki;