# Paper Digest: AISTATS 2015 Highlights

Readers can also read this highlight article on our console, which lets users filter papers by keyword and find related papers.

The International Conference on Artificial Intelligence and Statistics (AISTATS) is an interdisciplinary gathering of researchers at the intersection of computer science, artificial intelligence, machine learning, statistics, and related areas.

To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights/summaries to quickly get the main idea of each paper.

If you do not want to miss any interesting academic paper, you are welcome to **sign up for our free daily paper digest service** to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.

Paper Digest Team

team@paperdigest.org

#### TABLE 1: AISTATS 2015 Papers

No. | Title | Authors | Highlight |
---|---|---|---|

1 | Nonparametric Bayesian Factor Analysis for Dynamic Count Matrices | Ayan Acharya, Joydeep Ghosh, Mingyuan Zhou | We apply the model to text and music analysis, with state-of-the-art results. |

2 | Parameter Estimation of Generalized Linear Models without Assuming their Link Function | Sreangsu Acharyya, Joydeep Ghosh | We propose a parameter-recovery facilitating, jointly-convex, regularized loss functional that is optimized globally over the vector as well as the link function, with best rates possible under a first order oracle model. |

3 | Spectral Gap Error Bounds for Improving CUR Matrix Decomposition and the Nyström Method | David Anderson, Simon Du, Michael Mahoney, Christopher Melgaard, Kunming Wu, Ming Gu | Here, we introduce novel spectral gap error bounds that judiciously exploit the potentially rapid spectrum decay in the input matrix, a most common occurrence in machine learning and data analysis. |

4 | Global Multi-armed Bandits with Hölder Continuity | Onur Atan, Cem Tekin, Mihaela van der Schaar | In this paper, we formalize a new class of multi-armed bandit methods, Global Multi-armed Bandit (GMAB), in which arms are globally informative through a global parameter, i.e., choosing an arm reveals information about all the arms. |

5 | Efficient Sparse Clustering of High-Dimensional Non-spherical Gaussian Mixtures | Martin Azizyan, Aarti Singh, Larry Wasserman | The method we propose is a combination of a recent approach for learning parameters of a Gaussian mixture model and sparse linear discriminant analysis (LDA). |

6 | Unifying Local Consistency and MAX SAT Relaxations for Scalable Inference with Rounding Guarantees | Stephen Bach, Bert Huang, Lise Getoor | We prove the equivalence of first-order local consistency relaxations and the MAX SAT relaxation of Goemans and Williamson (1994) for a class of MRFs we refer to as logical MRFs. |

7 | Near-optimal max-affine estimators for convex regression | Gabor Balazs, András György, Csaba Szepesvari | This paper considers least squares estimators for regression problems over convex, uniformly bounded, uniformly Lipschitz function classes minimizing the empirical risk over max-affine functions (the maximum of finitely many affine functions). |

8 | Convex Multi-Task Learning by Clustering | Aviad Barzilai, Koby Crammer | We propose a scalable optimization algorithm for finding the optimal solution. |

9 | Gaussian Processes for Bayesian hypothesis tests on regression functions | Alessio Benavoli, Francesca Mangili | In this paper we show that they can also be employed as a universal tool for developing a large variety of Bayesian statistical hypothesis tests for regression functions. |

10 | Sparse Solutions to Nonnegative Linear Systems and Applications | Aditya Bhaskara, Ananda Suresh, Morteza Zadimoghaddam | We give an efficient algorithm for finding sparse approximate solutions to linear systems of equations with nonnegative coefficients. |

11 | Generalized Linear Models for Aggregated Data | Avradeep Bhowmik, Joydeep Ghosh, Oluwasanmi Koyejo | Based on this relationship, we propose a simple algorithm to estimate the model parameters and individual level inferences via alternating imputation and standard generalized linear model fitting. |

12 | Accurate and conservative estimates of MRF log-likelihood using reverse annealing | Yuri Burda, Roger Grosse, Ruslan Salakhutdinov | We present the Reverse AIS Estimator (RAISE), a stochastic lower bound on the log-likelihood of an approximation to the original MRF model. |

13 | Stochastic Spectral Descent for Restricted Boltzmann Machines | David Carlson, Volkan Cevher, Lawrence Carin | We introduce a new method called “Stochastic Spectral Descent” that updates parameters in the normed space. |

14 | Implementable confidence sets in high dimensional regression | Alexandra Carpentier | We focus on the problem of constructing adaptive and honest confidence sets for the sparse parameter θ, i.e., we want to construct a confidence set for θ that contains θ with high probability and is as small as possible. |

15 | Online Ranking with Top-1 Feedback | Sougata Chaudhuri, Ambuj Tewari | We consider a novel top-1 feedback model: at the end of each round, the relevance score for only the top ranked object is revealed. We provide a comprehensive set of results regarding learnability under this challenging setting. |

16 | One-bit Compressed Sensing with the k-Support Norm | Sheng Chen, Arindam Banerjee | In this paper, we investigate 1-bit CS problems for sparse signals using the recently proposed k-support norm. |

17 | Efficient Second-Order Gradient Boosting for Conditional Random Fields | Tianqi Chen, Sameer Singh, Ben Taskar, Carlos Guestrin | We incorporate second-order information by deriving a Markov Chain mixing rate bound to quantify the dependencies, and introduce a gradient boosting algorithm that iteratively optimizes an adaptive upper bound of the objective function. |

18 | Filtered Search for Submodular Maximization with Controllable Approximation Bounds | Wenlin Chen, Yixin Chen, Kilian Weinberger | In this paper, we propose a filtered search (FS) framework that allows the user to set an arbitrary approximation bound guarantee with a “tunable knob”, from 0 (arbitrarily bad) to 1 (globally optimal). |

19 | Predictive Inverse Optimal Control for Linear-Quadratic-Gaussian Systems | Xiangli Chen, Brian Ziebart | In this work, we extend predictive inverse optimal control to the linear-quadratic-Gaussian control setting. |

20 | Exact Bayesian Learning of Ancestor Relations in Bayesian Networks | Yetian Chen, Lingjian Meng, Jin Tian | In this paper, we develop dynamic programming (DP) algorithms to compute the exact posterior probabilities of ancestor relations in Bayesian networks. |

21 | Model Selection for Topic Models via Spectral Decomposition | Dehua Cheng, Xinran He, Yan Liu | Topic models have achieved significant successes in analyzing large-scale text corpus. |

22 | The Loss Surfaces of Multilayer Networks | Anna Choromanska, Mikael Henaff, Michael Mathieu, Gerard Ben Arous, Yann LeCun | We study the connection between the highly non-convex loss function of a simple model of the fully-connected feed-forward neural network and the Hamiltonian of the spherical spin-glass model under the assumptions of: i) variable independence, ii) redundancy in network parametrization, and iii) uniformity. |

23 | Averaged Least-Mean-Squares: Bias-Variance Trade-offs and Optimal Sampling Distributions | Alexandre Defossez, Francis Bach | We consider the least-squares regression problem and provide a detailed asymptotic analysis of the performance of averaged constant-step-size stochastic gradient descent. |

24 | A Topic Modeling Approach to Ranking | Weicong Ding, Prakash Ishwar, Venkatesh Saligrama | We propose a topic modeling approach to the prediction of preferences in pairwise comparisons. |

25 | A totally unimodular view of structured sparsity | Marwa El Halabi, Volkan Cevher | This paper describes a simple framework for structured sparse recovery based on convex optimization. |

26 | Back to the Past: Source Identification in Diffusion Networks from Partially Observed Cascades | Mehrdad Farajtabar, Manuel Gomez Rodriguez, Mohammad Zamani, Nan Du, Hongyuan Zha, Le Song | In this paper, we tackle this problem by developing a two-stage framework, which first learns a continuous-time diffusion network based on historical diffusion traces and then identifies the source of an incomplete diffusion trace by maximizing the likelihood of the trace under the learned model. |

27 | Graph Approximation and Clustering on a Budget | Ethan Fetaya, Ohad Shamir, Shimon Ullman | We consider the problem of learning from a similarity matrix (such as spectral clustering and low-dimensional embedding), when computing pairwise similarities is costly and only a limited number of entries can be observed. |

28 | A Sufficient Statistics Construction of Exponential Family Lévy Measure Densities for Nonparametric Conjugate Models | Robert Finn, Brian Kulis | We seek to address the problem of obtaining a general construction of prior distributions over infinite dimensional spaces possessing distributional properties amenable to conjugacy. |

29 | Computational Complexity of Linear Large Margin Classification With Ramp Loss | Søren Frejstrup Maibing, Christian Igel | This article addresses the fundamental question about the computational complexity of determining whether there is a hypothesis class with a hypothesis such that the upper bound on the generalization error is below a certain value. |

30 | Learning Deep Sigmoid Belief Networks with Data Augmentation | Zhe Gan, Ricardo Henao, David Carlson, Lawrence Carin | Deep directed generative models are developed. |

31 | Efficient Estimation of Mutual Information for Strongly Dependent Variables | Shuyang Gao, Greg Ver Steeg, Aram Galstyan | We introduce a new estimator that is robust to local non-uniformity, works well with limited data, and is able to capture relationship strengths over many orders of magnitude. |

32 | On Anomaly Ranking and Excess-Mass Curves | Nicolas Goix, Anne Sabourin, Stéphan Clémençon | Extensions to the multivariate setting are far from straightforward and it is precisely the main purpose of this paper to introduce a novel and convenient (functional) criterion for measuring the performance of a scoring function regarding the anomaly ranking task, referred to as the Excess-Mass curve (EM-curve). |

33 | Modeling Skill Acquisition Over Time with Sequence and Topic Modeling | José González-Brenes | We propose three novel data-driven methods that bridge sequence modeling with topic models to infer students’ time-varying knowledge. |

34 | Consistent Collective Matrix Completion under Joint Low Rank Structure | Suriya Gunasekar, Makoto Yamada, Dawei Yin, Yi Chang | The sample complexity requirements derived in the paper are optimal up to logarithmic factors, and significantly improve upon the requirements obtained by trivial extensions of standard matrix completion. |

35 | The Bayesian Echo Chamber: Modeling Social Influence via Linguistic Accommodation | Fangjian Guo, Charles Blundell, Hanna Wallach, Katherine Heller | We present the Bayesian Echo Chamber, a new Bayesian generative model for social interaction data. |

36 | Preserving Privacy of Continuous High-dimensional Data with Minimax Filters | Jihun Hamm | Minimax filters that achieve the optimal privacy-utility trade-off from broad families of filters and loss/classifiers are defined, and algorithms for learning the filters in batch or distributed settings are presented. |

37 | A Consistent Method for Graph Based Anomaly Localization | Satoshi Hara, Tetsuro Morimura, Toshihiro Takahashi, Hiroki Yanagisawa, Taiji Suzuki | In this paper, we propose an anomaly localization algorithm with a consistency guarantee on its results. |

38 | Metric recovery from directed unweighted graphs | Tatsunori Hashimoto, Yi Sun, Tommi Jaakkola | We analyze directed, unweighted graphs obtained from x_i ∈ R^d by connecting vertex i to j iff \|x_i – x_j\| < ε(x_i). |

39 | Scalable Variational Gaussian Process Classification | James Hensman, Alexander Matthews, Zoubin Ghahramani | We show how to scale the model within a variational inducing point framework, out-performing the state of the art on benchmark datasets. |

40 | Stochastic Structured Variational Inference | Matthew Hoffman, David Blei | We show how to relax the mean-field approximation to allow arbitrary dependencies between global parameters and local hidden variables, producing better parameter estimates by reducing bias, sensitivity to local optima, and sensitivity to hyperparameters. |

41 | Reliable and Scalable Variational Inference for the Hierarchical Dirichlet Process | Michael Hughes, Dae Il Kim, Erik Sudderth | We introduce a new variational inference objective for hierarchical Dirichlet process admixture models. |

42 | Cross-domain recommendation without shared users or items by sharing latent vector distributions | Tomoharu Iwata, Takeuchi Koh | We propose a cross-domain recommendation method for predicting the ratings of items in different domains, where neither users nor items are shared across domains. |

43 | Submodular Point Processes with Applications to Machine learning | Rishabh Iyer, Jeffrey Bilmes | In this paper, we analyze the computational complexity of probabilistic inference in SPPs. |

44 | Online Optimization: Competing with Dynamic Comparators | Ali Jadbabaie, Alexander Rakhlin, Shahin Shahrampour, Karthik Sridharan | In this paper, we address these two directions together. |

45 | Estimating the accuracies of multiple classifiers without labeled data | Ariel Jaffe, Boaz Nadler, Yuval Kluger | In this paper, focusing on the binary case, we present simple, computationally efficient algorithms to solve these questions. |

46 | Sparse Dueling Bandits | Kevin Jamieson, Sumeet Katariya, Atul Deshpande, Robert Nowak | This paper focuses on a new approach for finding the best arm according to the Borda criterion using noisy comparisons. |

47 | Consensus Message Passing for Layered Graphical Models | Varun Jampani, S. M. Ali Eslami, Daniel Tarlow, Pushmeet Kohli, John Winn | With these models in mind, we introduce a modification to message passing that learns to exploit their layered structure by passing ’consensus’ messages that guide inference towards good solutions. |

48 | Robust Cost Sensitive Support Vector Machine | Shuichi Katsumata, Akiko Takeda | In this paper we consider robust classifications and show equivalence between the regularized classifications. |

49 | On Approximate Non-submodular Minimization via Tree-Structured Supermodularity | Yoshinobu Kawahara, Rishabh Iyer, Jeffrey Bilmes | We address the problem of minimizing non-submodular functions where the supermodularity is restricted to tree-structured pairwise terms. |

50 | Sparse Submodular Probabilistic PCA | Rajiv Khanna, Joydeep Ghosh, Russell Poldrack, Oluwasanmi Koyejo | We propose a novel approach for sparse probabilistic principal component analysis, that combines a low rank representation for the latent factors and loadings with a novel sparse variational inference approach for estimating distributions of latent variables subject to sparse support constraints. |

51 | Latent feature regression for multivariate count data | Arto Klami, Abhishek Tripathi, Johannes Sirola, Lauri Väre, Frederic Roulland | We consider the problem of regression on multivariate count data and present a Gibbs sampler for a latent feature regression model suitable for both under- and overdispersed response variables. |

52 | Dimensionality estimation without distances | Matthäus Kleindessner, Ulrike von Luxburg | We provide two estimators for this situation, a naive one and a more elaborate one. |

53 | A Bayes consistent 1-NN classifier | Aryeh Kontorovich, Roi Weiss | We show that a simple modification of the 1-nearest neighbor classifier yields a strongly Bayes consistent learner. |

54 | DART: Dropouts meet Multiple Additive Regression Trees | Rashmi Korlakai Vinayak, Ran Gilad-Bachrach | In this work, we explore a different approach to address the problem, that of employing dropouts, a tool that has been recently proposed in the context of learning deep neural networks. |

55 | On Estimating L_2^2 Divergence | Akshay Krishnamurthy, Kirthevasan Kandasamy, Barnabas Poczos, Larry Wasserman | We give a comprehensive theoretical characterization of a nonparametric estimator for the L_2^2 divergence between two continuous distributions. |

56 | Tensor Factorization via Matrix Factorization | Volodymyr Kuleshov, Arun Chaganty, Percy Liang | In this paper, we propose a new algorithm for CP tensor factorization that uses random projections to reduce the problem to simultaneous matrix diagonalization. |

57 | Low-Rank Spectral Learning with Weighted Loss Functions | Alex Kulesza, Nan Jiang, Satinder Singh | In this paper we prove that when learning predictive state representations those problematic cases disappear if we introduce a particular weighted loss function and learn using sufficiently large sets of statistics; our main result is a bound on the loss of the learned low-rank model in terms of the singular values that are discarded. |

58 | Symmetric Iterative Proportional Fitting | Sven Kurras | Since IPF inherently generates non-symmetric matrices, we introduce two symmetrized variants of IPF. |

59 | Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits | Branislav Kveton, Zheng Wen, Azin Ashkan, Csaba Szepesvari | In this paper, we close the problem of computationally and sample efficient learning in stochastic combinatorial semi-bandits. |

60 | Sequential Kernel Herding: Frank-Wolfe Optimization for Particle Filtering | Simon Lacoste-Julien, Fredrik Lindsten, Francis Bach | In this paper, we propose to replace the random sampling step in a particle filter by Frank-Wolfe optimization. |

61 | Particle Gibbs for Bayesian Additive Regression Trees | Balaji Lakshminarayanan, Daniel Roy, Yee Whye Teh | We present a novel sampler for BART based on the Particle Gibbs (PG) algorithm (Andrieu et al., 2010) and a top-down particle filtering algorithm for Bayesian decision trees (Lakshminarayanan et al., 2013). |

62 | Deeply-Supervised Nets | Chen-Yu Lee, Saining Xie, Patrick Gallagher, Zhengyou Zhang, Zhuowen Tu | We propose deeply-supervised nets (DSN), a method that simultaneously minimizes classification error and improves the directness and transparency of the hidden layer learning process. |

63 | Preferential Attachment in Graphs with Affinities | Jay Lee, Manzil Zaheer, Stephan Günnemann, Alex Smola | We propose a random graph model based on both node attributes and preferential attachment. |

64 | Bayesian Hierarchical Clustering with Exponential Family: Small-Variance Asymptotics and Reducibility | Juho Lee, Seungjin Choi | In this paper we relax BHC into a non-probabilistic formulation, exploring small-variance asymptotics in conjugate-exponential models. |

65 | Modelling Policies in MDPs in Reproducing Kernel Hilbert Space | Guy Lever, Ronnie Stafford | We present a framework for performing gradient-based policy optimization in the RKHS, deriving the functional gradient of the return for our policy, which has a simple form and can be estimated efficiently. |

66 | Scalable Optimization of Randomized Operational Decisions in Adversarial Classification Settings | Bo Li, Yevgeniy Vorobeychik | To overcome scalability limitations, we introduce a novel method for estimating a compact parity basis representation for the operational decision function. |

67 | Toward Minimax Off-policy Value Estimation | Lihong Li, Remi Munos, Csaba Szepesvari | This paper studies the off-policy evaluation problem, where one aims to estimate the value of a target policy based on a sample of observations collected by another policy. |

68 | Compressed Sensing with Very Sparse Gaussian Random Projections | Ping Li, Cun-Hui Zhang | In this paper, following a well-known experimental setup, we show that, at the same number of measurements, the recovery accuracies of our proposed method are similar to the standard L1 decoding. |

69 | Max-Margin Zero-Shot Learning for Multi-class Classification | Xin Li, Yuhong Guo | In this paper, we propose a semi-supervised max-margin learning framework that integrates the semi-supervised classification problem over observed classes and the unsupervised clustering problem over unseen classes together to tackle zero-shot multi-class classification. |

70 | Conditional Restricted Boltzmann Machines for Multi-label Learning with Incomplete Labels | Xin Li, Feipeng Zhao, Yuhong Guo | In this paper, we develop a novel conditional restricted Boltzmann machine model to address multi-label learning with incomplete labels. |

71 | Sparsistency of ℓ_1-Regularized M-Estimators | Yen-Huan Li, Jonathan Scarlett, Pradeep Ravikumar, Volkan Cevher | For this purpose, we propose the local structured smoothness condition (LSSC) on the loss function. |

72 | Similarity Learning for High-Dimensional Sparse Data | Kuan Liu, Aurélien Bellet, Fei Sha | In this paper, we propose a method that can efficiently learn a similarity measure from high-dimensional sparse data. |

73 | Tradeoffs for Space, Time, Data and Risk in Unsupervised Learning | Mario Lucic, Mesrob Ohannessian, Amin Karbasi, Andreas Krause | Using k-means clustering as a prototypical unsupervised learning problem, we show how we can strategically summarize the data (control space) in order to trade off risk and time when data is generated by a probabilistic model. |

74 | Active Pointillistic Pattern Search | Yifei Ma, Dougal Sutherland, Roman Garnett, Jeff Schneider | We introduce the problem of active pointillistic pattern search (APPS), which seeks to discover regions of a domain exhibiting desired behavior with limited observations. |

75 | The Security of Latent Dirichlet Allocation | Shike Mei, Xiaojin Zhu | We present an efficient solution (up to local optima) using descent method and implicit functions. |

76 | A Spectral Algorithm for Inference in Hidden semi-Markov Models | Igor Melnyk, Arindam Banerjee | In this paper, we introduce a novel spectral algorithm to perform inference in HSMMs. |

77 | Efficient Training of Structured SVMs via Soft Constraints | Ofer Meshi, Nathan Srebro, Tamir Hazan | In this work we observe that relaxing these agreement constraints and replacing them with soft constraints yields a much easier optimization problem. |

78 | Variance Reduction via Antithetic Markov Chains | James Neufeld, Dale Schuurmans, Michael Bowling | We present a Monte Carlo integration method, antithetic Markov chain sampling (AMCS), that incorporates local Markov transitions in an underlying importance sampler. |

79 | Fast Function to Function Regression | Junier Oliva, William Neiswanger, Barnabas Poczos, Eric Xing, Hy Trac, Shirley Ho, Jeff Schneider | We analyze the problem of regression when both input covariates and output responses are functions from a nonparametric function class. |

80 | Reactive bandits with attitude | Pedro Ortega, Kee-Eung Kim, Daniel Lee | When the underlying stochastic distribution is Gaussian, we derive an analytic solution for the long run optimal player strategy for different regimes of the bandit. |

81 | Feature Selection for Linear SVM with Provable Guarantees | Saurabh Paul, Malik Magdon-Ismail, Petros Drineas | We give two provably accurate feature-selection techniques for the linear SVM. |

82 | On Theoretical Properties of Sum-Product Networks | Robert Peharz, Sebastian Tschiatschek, Franz Pernkopf, Pedro Domingos | In this paper we fill some gaps in the theoretic foundation of SPNs. |

83 | Robust sketching for multiple square-root LASSO problems | Vu Pham, Laurent El Ghaoui | We introduce a robust framework for solving multiple square-root LASSO problems, based on a sketch of the learning data that uses low-rank approximations. |

84 | Deep Exponential Families | Rajesh Ranganath, Linpeng Tang, Laurent Charlin, David Blei | We describe deep exponential families (DEFs), a class of latent variable models that are inspired by the hidden structures used in deep neural networks. |

85 | On the High Dimensional Power of a Linear-Time Two Sample Test under Mean-shift Alternatives | Sashank Reddi, Aaditya Ramdas, Barnabas Poczos, Aarti Singh, Larry Wasserman | The main contribution of this paper is to explicitly characterize the power of a popular nonparametric two sample test, designed for general alternatives, under a mean-shift alternative in the high-dimensional setting. |

86 | A Scalable Algorithm for Structured Kernel Feature Selection | Shaogang Ren, Shuai Huang, John Onofrey, Xenios Papademetris, Xiaoning Qian | In this paper we propose a stochastic optimization algorithm that can efficiently address this computational problem on account of the redundant kernel representations of the given data. |

87 | Learning Efficient Anomaly Detectors from K-NN Graphs | Jonathan Root, Jing Qian, Venkatesh Saligrama | We propose a non-parametric anomaly detection algorithm for high dimensional data. |

88 | Gamma Processes, Stick-Breaking, and Variational Inference | Anirban Roychowdhury, Brian Kulis | In this paper, we present a variational inference framework for models involving gamma process priors. |

89 | Direct Density-Derivative Estimation and Its Application in KL-Divergence Approximation | Hiroaki Sasaki, Yung-Kyun Noh, Masashi Sugiyama | In this paper, we give a direct method to approximate the density derivative without estimating the density itself. |

90 | Non-Uniform Stochastic Average Gradient Method for Training Conditional Random Fields | Mark Schmidt, Reza Babanezhad, Mohamed Ahmed, Aaron Defazio, Ann Clifton, Anoop Sarkar | We describe a practical implementation that uses structure in the CRF gradient to reduce the memory requirement of this linearly-convergent stochastic gradient method, propose a non-uniform sampling scheme that substantially improves practical performance, and analyze the rate of convergence of the SAGA variant under non-uniform sampling. |

91 | Sensor Selection for Crowdsensing Dynamical Systems | Francois Schnitzler, Jia Yuan Yu, Shie Mannor | To achieve low estimation error, we propose a Thompson sampling approach combining submodular optimization and a scalable online variational inference algorithm to maintain the posterior distribution over the variance. |

92 | A Rate of Convergence for Mixture Proportion Estimation, with Application to Learning from Noisy Labels | Clayton Scott | In this work we establish a rate of convergence for mixture proportion estimation under an appropriate distributional assumption, and argue that this rate of convergence is useful for analyzing weakly supervised learning algorithms that build on MPE. |

93 | Inference of Cause and Effect with Unsupervised Inverse Regression | Eleni Sgouritsa, Dominik Janzing, Philipp Hennig, Bernhard Schölkopf | To this end, we propose a method for estimating a conditional from samples of the corresponding marginal, which we call unsupervised inverse GP regression. |

94 | Estimation from Pairwise Comparisons: Sharp Minimax Bounds with Topology Dependence | Nihar Shah, Sivaraman Balakrishnan, Joseph Bradley, Abhay Parekh, Kannan Ramchandran, Martin Wainwright | The Bradley-Terry-Luce (BTL) and Thurstone models are the most widely used parametric models for such pairwise comparison data. |

95 | Exploiting Symmetries to Construct Efficient MCMC Algorithms With an Application to SLAM | Roshan Shariff, András György, Csaba Szepesvari | In this paper we propose a variation of the MH algorithm based on group moves, where the next state is obtained by first choosing a random transformation of the state space and then applying this transformation to the current state. |

96 | Learning Where to Sample in Structured Prediction | Tianlin Shi, Jacob Steinhardt, Percy Liang | In this paper, we propose a heterogeneous approach that dynamically allocates computation to the different parts. |

97 | State Space Methods for Efficient Inference in Student-t Process Regression | Arno Solin, Simo Särkkä | We show how a large class of temporal TP regression models can be reformulated as state space models, and how a forward filtering and backward smoothing recursion can be derived for solving the inference analytically in linear time complexity. |

98 | Learning from Data with Heterogeneous Noise using SGD | Shuang Song, Kamalika Chaudhuri, Anand Sarwate | In this paper, we adopt instead a model in which data is observed through heterogeneous noise, where the noise level reflects the quality of the data source. |

99 | Data modeling with the elliptical gamma distribution | Suvrit Sra, Reshad Hosseini, Lucas Theis, Matthias Bethge | We study mixture modeling using the elliptical gamma (EG) distribution, a non-Gaussian distribution that allows heavy and light tail and peak behaviors. |

100 | WASP: Scalable Bayes via barycenters of subset posteriors | Sanvesh Srivastava, Volkan Cevher, Quoc Dinh, David Dunson | We propose a simple, general, and highly efficient approach, which first runs a posterior sampling algorithm in parallel on different machines for subsets of a large data set. |

101 | Calibration of conditional composite likelihood for Bayesian inference on Gibbs random fields | Julien Stoehr, Nial Friel | This paper provides a means to calibrate the posterior distribution resulting from using a composite likelihood and illustrates its performance in several examples. |

102 | A Dirichlet Process Mixture Model for Spherical Data | Julian Straub, Jason Chang, Oren Freifeld, John Fisher III | For this purpose we propose a Dirichlet process mixture model of Gaussian distributions in distinct tangent spaces (DP-TGMM) to the sphere. |

103 | Inferring Block Structure of Graphical Models in Exponential Families | Siqi Sun, Hai Wang, Jinbo Xu | In this paper, we propose a novel generative model for describing the block structure in general exponential families, and optimize it by an Expectation-Maximization (EM) algorithm with variational Bayes. |

104 | Two-stage sampled learning theory on distributions | Zoltan Szabo, Arthur Gretton, Barnabas Poczos, Bharath Sriperumbudur | In this paper, we provide theoretical guarantees for a remarkably simple algorithmic alternative to solve the distribution regression problem: embed the distributions to a reproducing kernel Hilbert space, and learn a ridge regressor from the embeddings to the outputs. |

105 | Predicting Preference Reversals via Gaussian Process Uncertainty Aversion | Rikiya Takahashi, Tetsuro Morimura | In order to accurately predict choice decisions involving preference reversals, which existing econometric methods have failed to incorporate, the authors introduce a new cognitive choice model whose parameters are efficiently fitted with a global convex optimization algorithm. |

106 | Streaming Variational Inference for Bayesian Nonparametric Mixture Models | Alex Tank, Nicholas Foti, Emily Fox | We work within this general framework and present a streaming variational inference algorithm for NRM mixture models based on assumed density filtering. |

107 | Missing at Random in Graphical Models | Jin Tian | In this paper, we assume the missing data model is represented as a directed acyclic graph that not only encodes the dependencies among the variables but also explicitly portrays the causal mechanisms responsible for the missingness process. |

108 | Particle Gibbs with Ancestor Sampling for Probabilistic Programs | Jan-Willem van de Meent, Hongseok Yang, Vikash Mansinghka, Frank Wood | We present empirical results that demonstrate nontrivial performance gains. |

109 | Learning of Non-Parametric Control Policies with High-Dimensional State Features | Herke Van Hoof, Jan Peters, Gerhard Neumann | In this paper, we develop a policy search algorithm that integrates robust policy updates and kernel embeddings. |

110 | Maximally Informative Hierarchical Representations of High-Dimensional Data | Greg Ver Steeg, Aram Galstyan | We present bounds on how informative a representation is about input data. |

111 | Falling Rule Lists | Fulton Wang, Cynthia Rudin | We provide a Bayesian framework for learning falling rule lists that does not rely on traditional greedy decision tree learning methods. |

112 | Multi-Manifold Modeling in Non-Euclidean spaces | Xu Wang, Konstantinos Slavakis, Gilad Lerman | This paper advocates a novel framework for segmenting a dataset on a Riemannian manifold M into clusters lying around low-dimensional submanifolds of M. Important examples of M, for which the proposed algorithm is computationally efficient, include the sphere, the set of positive definite matrices, and the Grassmannian. |

113 | Column Subset Selection with Missing Data via Active Sampling | Yining Wang, Aarti Singh | In this paper, we propose and analyze two sampling based algorithms for column subset selection without access to the complete input matrix. |

114 | Trend Filtering on Graphs | Yu-Xiang Wang, James Sharpnack, Alex Smola, Ryan Tibshirani | We introduce a family of adaptive estimators on graphs, based on penalizing the $\ell_1$ norm of discrete graph differences. |

115 | A Greedy Homotopy Method for Regression with Nonconvex Constraints | Fabian Wauthier, Peter Donnelly | The goal of this paper is to estimate sparse linear regression models, where for a given partition $\mathcal{G}$ of input variables, the selected variables are chosen from a *diverse* set of groups in $\mathcal{G}$. |

116 | Revisiting the Limits of MAP Inference by MWSS on Perfect Graphs | Adrian Weller | Here we introduce novel techniques and consider all cases, demonstrating that this greatly expands the set of tractable models. |

117 | Understanding and Evaluating Sparse Linear Discriminant Analysis | Yi Wu, David Wipf, Jeong-Min Yun | Linear discriminant analysis (LDA) represents a simple yet powerful technique for partitioning a p-dimensional feature vector into one of K classes based on a linear projection learned from N labeled observations. |

118 | Stochastic Block Transition Models for Dynamic Networks | Kevin Xu | In this paper, I propose a stochastic block transition model (SBTM) for dynamic networks that is inspired by the well-known stochastic block model (SBM) for static networks and previous dynamic extensions of the SBM. |

119 | Majorization-Minimization for Manifold Embedding | Zhirong Yang, Jaakko Peltonen, Samuel Kaski | We propose a new MM procedure that yields fast MM algorithms for a wide variety of manifold embedding problems. |

120 | A la Carte – Learning Fast Kernels | Zichao Yang, Andrew Wilson, Alex Smola, Le Song | We introduce a family of fast, flexible, general purpose, and lightly parametrized kernel learning methods, derived from Fastfood basis function expansions. |

121 | Minimizing Nonconvex Non-Separable Functions | Yaoliang Yu, Xun Zheng, Micol Marchetti-Bowick, Eric Xing | To address this issue, we propose a new proximal gradient meta-algorithm by rigorously extending the proximal average to the nonconvex setting. |

122 | A Simple Homotopy Algorithm for Compressive Sensing | Lijun Zhang, Tianbao Yang, Rong Jin, Zhi-Hua Zhou | In this paper, we consider the problem of recovering the s largest elements of an arbitrary vector from noisy measurements. |

123 | Scalable Nonparametric Multiway Data Analysis | Shandian Zhe, Zenglin Xu, Xinqi Chu, Yuan Qi, Youngja Park | To address these issues, we propose a scalable nonparametric tensor decomposition model. |

124 | Infinite Edge Partition Models for Overlapping Community Detection and Link Prediction | Mingyuan Zhou | A hierarchical gamma process infinite edge partition model is proposed to factorize the binary adjacency matrix of an unweighted undirected relational network under a Bernoulli-Poisson link. |

125 | Power-Law Graph Cuts | Xiangyang Zhou, Jiaxin Zhang, Brian Kulis | To achieve our goals, we treat the Pitman-Yor exchangeable partition probability function (EPPF) as a regularizer to graph cut objectives. |

126 | The Log-Shift Penalty for Adaptive Estimation of Multiple Gaussian Graphical Models | Yuancheng Zhu, Rina Foygel Barber | To estimate multiple related Gaussian graphical models on the same set of variables, we formulate a hierarchical model, which leads to an optimization problem with a nonconvex log-shift penalty function. |