# Paper Digest: AISTATS 2014 Highlights

Readers can also read this highlight article on our console, which allows users to filter papers by keyword and find related papers.

The International Conference on Artificial Intelligence and Statistics (AISTATS) is an interdisciplinary gathering of researchers at the intersection of computer science, artificial intelligence, machine learning, statistics, and related areas.

To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights and summaries to quickly get the main idea of each paper.

If you do not want to miss any interesting academic paper, you are welcome to **sign up for our free daily paper digest service** to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.

Paper Digest Team

team@paperdigest.org

#### TABLE 1: AISTATS 2014 Papers

No. | Title | Authors | Highlight |
---|---|---|---|

1 | Preface | Samuel Kaski, Jukka Corander | Preface |

2 | Decontamination of Mutually Contaminated Models | Gilles Blanchard, Clayton Scott | This work focuses on the problem of classification with multiclass label noise, in a general setting where the noise proportions are unknown and the true class distributions are nonseparable and potentially quite complex. |

3 | Distributed optimization of deeply nested systems | Miguel Carreira-Perpinan, Weiran Wang | We describe a general strategy to learn the parameters and, to some extent, the architecture of nested systems, which we call the method of auxiliary coordinates (MAC). |

4 | Analysis of Empirical MAP and Empirical Partially Bayes: Can They be Alternatives to Variational Bayes? | Shinichi Nakajima, Masashi Sugiyama | In this paper, we theoretically investigate the behavior of the MAP and the PB solutions of matrix factorization. |

5 | Improved Bounds for Online Learning Over the Permutahedron and Other Ranking Polytopes | Nir Ailon | We present an algorithm of expected regret $O(n\sqrt{\mathrm{OPT}} + n^2)$, where OPT is the loss of the best (single) ranking in hindsight. |

6 | Information-Theoretic Characterization of Sparse Recovery | Cem Aksoylar, Venkatesh Saligrama | We formulate sparse support recovery as a salient set identification problem and use information-theoretic analyses to characterize the recovery performance and sample complexity. |

7 | Hybrid Discriminative-Generative Approach with Gaussian Processes | Ricardo Andrade Pacheco, James Hensman, Max Zwiessele, Neil D. Lawrence | Here, we present a model based on a hybrid approach that breaks down some of the barriers between the discriminative and generative points of view, allowing continuous dimensionality reduction of hybrid discrete-continuous data, discriminative classification with missing inputs and manifold learning informed by class labels. |

8 | Average Case Analysis of High-Dimensional Block-Sparse Recovery and Regression for Arbitrary Designs | Waheed Bajwa, Marco Duarte, Robert Calderbank | Average Case Analysis of High-Dimensional Block-Sparse Recovery and Regression for Arbitrary Designs |

9 | A New Perspective on Learning Linear Separators with Large $L_q L_p$ Margins | Maria-Florina Balcan, Christopher Berlind | We give theoretical and empirical results that provide new insights into large margin learning. |

10 | A Non-parametric Conditional Factor Regression Model for Multi-Dimensional Input and Response | Ava Bargi, Richard Yi Xu, Zoubin Ghahramani, Massimo Piccardi | In this paper, we propose a non-parametric conditional factor regression (NCFR) model for domains with multi-dimensional input and response. |

11 | Learning Optimal Bounded Treewidth Bayesian Networks via Maximum Satisfiability | Jeremias Berg, Matti Järvisalo, Brandon Malone | In this work, we develop a novel score-based approach to BTW-BNSL, based on casting BTW-BNSL as weighted partial Maximum satisfiability. |

12 | Online Passive-Aggressive Algorithms for Non-Negative Matrix Factorization and Completion | Mathieu Blondel, Yotaro Kubo, Ueda Naonori | In this paper, we present non-negative passive-aggressive (NN-PA), a family of online algorithms for non-negative matrix factorization (NMF). |

13 | PAC-Bayesian Theory for Transductive Learning | Luc Bégin, Pascal Germain, François Laviolette, Jean-Francis Roy | We propose a PAC-Bayesian analysis of the transductive learning setting, introduced by Vapnik [2008], by proposing a family of new bounds on the generalization error. |

14 | Random Bayesian networks with bounded indegree | Eunice Yuh-Jie Chen, Judea Pearl | In this paper, we propose a simple model for large random BNs with bounded indegree, that is, large directed acyclic graphs (DAG) where the edges appear at random and each node has at most a given number of parents. |

15 | Efficient Low-Rank Stochastic Gradient Descent Methods for Solving Semidefinite Programs | Jianhui Chen, Tianbao Yang, Shenghuo Zhu | We propose a low-rank stochastic gradient descent (LR-SGD) method for solving a class of semidefinite programming (SDP) problems. |

16 | Characterizing EVOI-Sufficient k-Response Query Sets in Decision Problems | Robert Cohn, Satinder Singh, Edmund Durfee | When the only constraint on what queries can be asked is that they have exactly k possible responses (with $k \ge 2$), we show that the set of k-response decision queries (which ask the user to select his/her preferred decision given a choice of k decisions) is EVOI-Sufficient, meaning that no single k-response query can have higher EVOI than the best single k-response decision query for any decision problem. |

17 | Doubly Aggressive Selective Sampling Algorithms for Classification | Koby Crammer | We introduce two stochastic linear algorithms and analyze them in the worst-case mistake-bound framework. |

18 | Sparse Bayesian Variable Selection for the Identification of Antigenic Variability in the Foot-and-Mouth Disease Virus | Vinny Davies, Richard Reeve, William Harvey, Francois Maree, Dirk Husmeier | Here we describe a novel sparse Bayesian variable selection model using spike and slab priors which is able to predict antigenic variability and identify sites which are important for the neutralisation of the virus. |

19 | Sparsity and the Truncated $l^2$-norm | Lee Dicker | In this paper, we study an alternative measure of sparsity, the truncated $l^2$-norm, which is related to other $l^p$-norms, but appears to have some unique and useful properties. |

20 | Efficient Distributed Topic Modeling with Provable Guarantees | Weicong Ding, Mohammad Rohban, Prakash Ishwar, Venkatesh Saligrama | We consider topic modeling under the separability assumption and develop novel computationally efficient methods that provably achieve the statistical performance of the state-of-the-art centralized approaches while requiring insignificant communication between the distributed document collections. |

21 | Pan-sharpening with a Bayesian nonparametric dictionary learning model | Xinghao Ding, Yiyong Jiang, Yue Huang, John Paisley | We present a new pan-sharpening algorithm that uses a Bayesian nonparametric dictionary learning model to give an underlying sparse representation for image reconstruction. |

22 | Approximate Slice Sampling for Bayesian Posterior Inference | Christopher DuBois, Anoop Korattikara, Max Welling, Padhraic Smyth | In this paper, we advance the theory of large scale Bayesian posterior inference by introducing a new approximate slice sampler that uses only small mini-batches of data in every iteration. |

23 | Bayesian Logistic Gaussian Process Models for Dynamic Networks | Daniele Durante, David Dunson | Motivated by an application to studying dynamic networks among sports teams, we propose a Bayesian nonparametric model. |

24 | Avoiding pathologies in very deep networks | David Duvenaud, Oren Rippel, Ryan Adams, Zoubin Ghahramani | We propose an alternate network architecture which does not suffer from this pathology. |

25 | Efficient Inference for Complex Queries on Complex Distributions | Lili Dworkin, Michael Kearns, Lirong Xia | We consider problems of approximate inference in which the query of interest is given by a complex formula (such as a formula in disjunctive normal form (DNF)) over a joint distribution given by a graphical model. |

26 | Bayesian Switching Interaction Analysis Under Uncertainty | Zoran Dzunic, John Fisher III | We introduce a Bayesian discrete-time framework for switching-interaction analysis under uncertainty, in which latent interactions, switching pattern and signal states and dynamics are inferred from noisy (and possibly missing) observations of these signals. |

27 | Robust learning of inhomogeneous PMMs | Ralf Eggeling, Teemu Roos, Petri Myllymäki, Ivo Grosse | In this work, we empirically investigate the performance of robust alternatives for structure and parameter learning that extend the practical applicability of inhomogeneous parsimonious Markov models to more complex settings than before. |

28 | Fully-Automatic Bayesian Piecewise Sparse Linear Models | Riki Eto, Ryohei Fujimaki, Satoshi Morinaga, Hiroshi Tamano | Our contributions are mainly three-fold. |

29 | Learning with Maximum A-Posteriori Perturbation Models | Andreea Gane, Tamir Hazan, Tommi Jaakkola | In this paper, we analyze, extend and seek to estimate such dependencies from data. |

30 | Sketching the Support of a Probability Measure | Joachim Giesen, Soeren Laue, Lars Kuehne | Here we propose to sketch the support of the probability measure (that does not need to be a manifold) by some gradient flow complex, or more precisely by its Hasse diagram. |

31 | Robust Stochastic Principal Component Analysis | John Goes, Teng Zhang, Raman Arora, Gilad Lerman | We introduce three novel stochastic approximation algorithms for robust PCA that are extensions of standard algorithms for PCA – the stochastic power method, incremental PCA and online PCA using matrix-exponentiated-gradient (MEG) updates. |

32 | Bayesian Nonparametric Poisson Factorization for Recommendation Systems | Prem Gopalan, Francisco J. Ruiz, Rajesh Ranganath, David Blei | We develop a Bayesian nonparametric Poisson factorization model for recommendation systems. |

33 | Efficiently Enforcing Diversity in Multi-Output Structured Prediction | Abner Guzman-Rivera, Pushmeet Kohli, Dhruv Batra, Rob Rutenbar | This paper proposes a novel method for efficiently generating multiple diverse predictions for structured prediction problems. |

34 | Learning and Evaluation in Presence of Non-i.i.d. Label Noise | Nico Görnitz, Anne Porbadnigk, Alexander Binder, Claudia Sannelli, Mikio Braun, Klaus-Robert Mueller, Marius Kloft | In this paper, we present a novel methodology for learning and evaluation in presence of systematic label noise. |

35 | Analytic Long-Term Forecasting with Periodic Gaussian Processes | Nooshin HajiGhassemi, Marc Deisenroth | Gaussian processes are a state-of-the-art method for learning models from data. |

36 | On Estimating Causal Effects based on Supplemental Variables | Takahiro Hayashi, Manabu Kuroki | In this paper, we consider the situation where a treatment is associated with a response through a set of supplementary variables in both linear and discrete models. |

37 | Non-Asymptotic Analysis of Relational Learning with One Network | Peng He, Changshui Zhang | We propose a novel combinational approach to analyze complex dependencies of relational data, which is crucial to our non-asymptotic analysis. |

38 | Exploiting the Limits of Structure Learning via Inherent Symmetry | Peng He, Changshui Zhang | This theoretical paper is concerned with the structure learning limit for Gaussian Markov random fields from i.i.d. samples. |

39 | A Statistical Model for Event Sequence Data | Kevin Heins, Hal Stern | In this paper, we consider a general probabilistic framework for identifying such patterns, by distinguishing between events that belong to a pattern and events that occur as part of background processes. |

40 | Probabilistic Solutions to Differential Equations and their Application to Riemannian Statistics | Philipp Hennig, Søren Hauberg | We study a probabilistic numerical method for the solution of both boundary and initial value problems that returns a joint Gaussian process posterior over the solution. |

41 | Tilted Variational Bayes | James Hensman, Max Zwiessele, Neil Lawrence | We present a novel method for approximate inference. |

42 | On correlation and budget constraints in model-based bandit optimization with application to automatic machine learning | Matthew Hoffman, Bobak Shahriari, Nando Freitas | We introduce a Bayesian approach for this problem and show that it empirically outperforms both the existing frequentist counterpart and other Bayesian optimization methods. |

43 | Optimality of Thompson Sampling for Gaussian Bandits Depends on Priors | Junya Honda, Akimichi Takemura | In this paper we discuss the optimality of TS for the model of normal distributions with unknown means and variances as one of the most fundamental examples of multiparameter models. |

44 | Tight Bounds for the Expected Risk of Linear Classifiers and PAC-Bayes Finite-Sample Guarantees | Jean Honorio, Tommi Jaakkola | We analyze the expected risk of linear classifiers for a fixed weight vector in the “minimax” setting. |

45 | Latent Gaussian Models for Topic Modeling | Changwei Hu, Eunsu Ryu, David Carlson, Yingjian Wang, Lawrence Carin | A new approach is proposed for topic modeling, in which the latent matrix factorization employs Gaussian priors, rather than the Dirichlet-class priors widely used in such models. |

46 | A Finite-Sample Generalization Bound for Semiparametric Regression: Partially Linear Models | Ruitong Huang, Csaba Szepesvari | In this paper we provide generalization bounds for semiparametric regression with the so-called partially linear models where the regression function is written as the sum of a linear parametric and a nonlinear, nonparametric function, the latter taken from some set $\mathcal{H}$ with finite entropy-integral. |

47 | Global Optimization Methods for Extended Fisher Discriminant Analysis | Satoru Iwata, Yuji Nakatsukasa, Akiko Takeda | A parametrized extension, which we call the extended FDA, has been introduced from the viewpoint of robust optimization. |

48 | High-Dimensional Density Ratio Estimation with Extensions to Approximate Likelihood Computation | Rafael Izbicki, Ann Lee, Chad Schafer | Here we propose a simple-to-implement, fully nonparametric density ratio estimator that expands the ratio in terms of the eigenfunctions of a kernel-based operator; these functions reflect the underlying geometry of the data (e.g., submanifold structure), often leading to better estimates without an explicit dimension reduction step. |

49 | Near Optimal Bayesian Active Learning for Decision Making | Shervin Javdani, Yuxin Chen, Amin Karbasi, Andreas Krause, Drew Bagnell, Siddhartha Srinivasa | Our goal is to drive uncertainty into a single decision region as quickly as possible. |

50 | A Level-set Hit-and-run Sampler for Quasi-Concave Distributions | Shane Jensen, Dean Foster | We develop a new sampling strategy that uses the hit-and-run algorithm within level sets of a target density. |

51 | New Bounds on Compressive Linear Least Squares Regression | Ata Kaban | In this paper we provide a new analysis of compressive least squares regression that removes a spurious log N factor from previous bounds, where N is the number of training points. |

52 | Recovering Distributions from Gaussian RKHS Embeddings | Motonobu Kanagawa, Kenji Fukumizu | In this paper, we consider the recovery of the information of a distribution from an estimate of the kernel mean, when a Gaussian kernel is used. |

53 | Collaborative Ranking for Local Preferences | Berk Kapicioglu, David Rosenberg, Robert Schapire, Tony Jebara | To address this, we introduce a matrix factorization framework called Collaborative Local Ranking (CLR). |

54 | Scalable Collaborative Bayesian Preference Learning | Mohammad Emtiyaz Khan, Young Jun Ko, Matthias Seeger | To simplify the difficulty, we present a novel expectation maximization algorithm, driven by expectation propagation approximate inference, which scales to very large datasets without requiring strong factorization assumptions. |

55 | A Gaussian Latent Variable Model for Large Margin Classification of Labeled and Unlabeled Data | Do-kyum Kim, Matthew Der, Lawrence Saul | We investigate a Gaussian latent variable model for semi-supervised learning of linear large margin classifiers. |

56 | Scalable Variational Bayesian Matrix Factorization with Side Information | Yong-Deok Kim, Seungjin Choi | In this paper, we present a scalable inference for VBMF with side information, the complexity of which is linear in the rank K of factor matrices. |

57 | Algebraic Reconstruction Bounds and Explicit Inversion for Phase Retrieval at the Identifiability Threshold | Franz Király, Martin Ehler | We study phase retrieval from magnitude measurements of an unknown signal as an algebraic estimation problem. |

58 | Visual Boundary Prediction: A Deep Neural Prediction Network and Quality Dissection | Jyri Kivinen, Chris Williams, Nicolas Heess | This paper investigates visual boundary detection, i.e. prediction of the presence of a boundary at a given image location. |

59 | Low-Rank Spectral Learning | Alex Kulesza, N. Raj Rao, Satinder Singh | Spectral learning methods have recently been proposed as alternatives to slow, non-convex optimization algorithms like EM for a variety of probabilistic models in which hidden information must be inferred by the learner. |

60 | Fugue: Slow-Worker-Agnostic Distributed Learning for Big Models on Big Data | Abhimanu Kumar, Alex Beutel, Qirong Ho, Eric Xing | We present a scheme for fast, distributed learning on big (i.e. high-dimensional) models applied to big datasets. |

61 | Computational Education using Latent Structured Prediction | Tanja Käser, Alexander Schwing, Tamir Hazan, Markus Gross | For interpretability we propose to constrain the parameter space a-priori by leveraging domain knowledge. |

62 | Towards building a Crowd-Sourced Sky Map | Dustin Lang, David Hogg, Bernhard Schölkopf | We describe a system that builds a high dynamic-range and wide-angle image of the night sky by combining a large set of input images. |

63 | Incremental Tree-Based Inference with Dependent Normalized Random Measures | Juho Lee, Seungjin Choi | In this paper, we present a tree-based inference method for MNRM mixture models, extending Bayesian hierarchical clustering (BHC) which was originally developed as a deterministic approximate inference for Dirichlet process mixture (DPM) models. |

64 | Jointly Informative Feature Selection | Leonidas Lefakis, Francois Fleuret | We propose several novel criteria for the selection of groups of jointly informative continuous features in the context of classification. |

65 | Learning Heterogeneous Hidden Markov Random Fields | Jie Liu, Chunming Zhang, Elizabeth Burnside, David Page | We formally define heterogeneous HMRFs and propose an EM algorithm whose M-step combines a contrastive divergence learner with a kernel smoothing step to incorporate the background knowledge. |

66 | PAC-Bayesian Collective Stability | Ben London, Bert Huang, Ben Taskar, Lise Getoor | We investigate whether weaker definitions of collective stability suffice. |

67 | Active Area Search via Bayesian Quadrature | Yifei Ma, Roman Garnett, Jeff Schneider | In this paper we combine these two concepts and consider the problem of choosing data collection locations when the goal is to identify regions whose set of collected data would be labeled positively by a set classifier. |

68 | Active Boundary Annotation using Random MAP Perturbations | Subhransu Maji, Tamir Hazan, Tommi Jaakkola | As an example of our framework we propose a boundary refinement task which can be used to obtain pixel-accurate image boundaries much faster than traditional tools by focussing on parts of the image for refinement in a multi-scale manner. |

69 | Interpretable Sparse High-Order Boltzmann Machines | Martin Renqiang Min, Xia Ning, Chao Cheng, Mark Gerstein | In this paper, we propose an efficient approach for learning a fully observable high-order Boltzmann Machine based on sparse learning and contrastive divergence, resulting in an interpretable Sparse High-order Boltzmann Machine, denoted as SHBM. |

70 | Efficient Lifting of MAP LP Relaxations Using k-Locality | Martin Mladenov, Kristian Kersting, Amir Globerson | Such models often exhibit considerable symmetry, and it is a challenge to devise algorithms that exploit this symmetry to speed up inference. |

71 | A Geometric Algorithm for Scalable Multiple Kernel Learning | John Moeller, Parasaran Raman, Suresh Venkatasubramanian, Avishek Saha | We present a geometric formulation of the Multiple Kernel Learning (MKL) problem. |

72 | On the Testability of Models with Missing Data | Karthika Mohan, Judea Pearl | We present sufficient conditions for testability in missing data applications and note the impediments for testability when data are contaminated by missing entries. |

73 | Selective Sampling with Drift | Edward Moroshko, Koby Crammer | We develop a novel selective sampling algorithm for the drifting setting, analyze it under no assumptions on the mechanism generating the sequence of instances, and derive new mistake bounds that depend on the amount of drift in the problem. |

74 | The Dependent Dirichlet Process Mixture of Objects for Detection-free Tracking and Object Modeling | Willie Neiswanger, Frank Wood, Eric Xing | We present a model that localizes objects via unsupervised tracking while learning a representation of each object, avoiding the need for pre-built detectors. |

75 | Bias Reduction and Metric Learning for Nearest-Neighbor Estimation of Kullback-Leibler Divergence | Yung-Kyun Noh, Masashi Sugiyama, Song Liu, Marthinus C. Plessis, Frank Chongwoo Park, Daniel D. Lee | In this paper, we show that this non-local bias can be mitigated by changing the distance metric, and we propose a method for learning an optimal Mahalanobis-type metric based on global information provided by approximate parametric models of the underlying densities. |

76 | Robust Forward Algorithms via PAC-Bayes and Laplace Distributions | Asaf Noy, Koby Crammer | We introduce new learning algorithms that minimize objectives derived directly from PAC-Bayes bounds, incorporating Laplace distributions. |

77 | Joint Structure Learning of Multiple Non-Exchangeable Networks | Chris Oates, Sach Mukherjee | Here we present a novel Bayesian formulation that generalises joint structure learning beyond the exchangeable case. |

78 | Scaling Nonparametric Bayesian Inference via Subsample-Annealing | Fritz Obermeyer, Jonathan Glidden, Eric Jonas | We describe an adaptation of the simulated annealing algorithm to nonparametric clustering and related probabilistic models. |

79 | Fast Distribution To Real Regression | Junier Oliva, Willie Neiswanger, Barnabas Poczos, Jeff Schneider, Eric Xing | We study the problem of distribution to real regression, where one aims to regress a mapping $f$ that takes in a distribution input covariate $P \in \mathcal{I}$ (for a non-parametric family of distributions $\mathcal{I}$) and outputs a real-valued response $Y = f(P) + \epsilon$. |

80 | FuSSO: Functional Shrinkage and Selection Operator | Junier Oliva, Barnabas Poczos, Timothy Verstynen, Aarti Singh, Jeff Schneider, Fang-Cheng Yeh, Wen-Yih Tseng | We present the FuSSO, a functional analogue to the LASSO, that efficiently finds a sparse set of functional input covariates to regress a real-valued response against. |

81 | To go deep or wide in learning? | Gaurav Pandey, Ambedkar Dukkipati | In this paper, we propose an approach called wide learning based on arc-cosine kernels, that learns a single layer of infinite width. |

82 | LAMORE: A Stable, Scalable Approach to Latent Vector Autoregressive Modeling of Categorical Time Series | Yubin Park, Carlos Carvalho, Joydeep Ghosh | This paper proposes two auxiliary techniques that help stabilize and calibrate the estimated parameters. |

83 | Spoofing Large Probability Mass Functions to Improve Sampling Times and Reduce Memory Costs | Jon Parker, Hans Engler | This paper presents a novel lossy compression method intended for large (O(10^5)) dense PMFs that speeds up the sampling process and guarantees high fidelity sampling. |

84 | Learning Bounded Tree-width Bayesian Networks using Integer Linear Programming | Pekka Parviainen, Hossein Shahrabi Farahani, Jens Lagergren | Since the inference problem is common in many application areas, we provide a practical algorithm for learning bounded tree-width Bayesian networks. |

85 | An Efficient Algorithm for Large Scale Compressive Feature Learning | Hristo Paskov, John Mitchell, Trevor Hastie | This paper focuses on large-scale unsupervised feature selection from text. |

86 | Expectation Propagation for Likelihoods Depending on an Inner Product of Two Multivariate Random Variables | Tomi Peltola, Pasi Jylänki, Aki Vehtari | We describe how a deterministic Gaussian posterior approximation can be constructed using expectation propagation (EP) for models, where the likelihood function depends on an inner product of two multivariate random variables. |

87 | An inclusion optimal algorithm for chain graph structure learning | Jose Peña, Dag Sonntag, Jens Nielsen | This paper presents and proves an extension of Meek’s conjecture to chain graphs under the Lauritzen-Wermuth-Frydenberg interpretation. |

88 | A Stepwise uncertainty reduction approach to constrained global optimization | Victor Picheny | We propose here a new optimization strategy based on the stepwise uncertainty reduction paradigm, which offers an efficient trade-off between exploration and local search near the boundaries. |

89 | Connected Sub-graph Detection | Jing Qian, Venkatesh Saligrama, Yuting Chen | For concreteness we consider the connected sub-graph detection problem that arises in a number of applications including network intrusion, disease outbreaks, and video surveillance. |

90 | An Analysis of Active Learning with Uniform Feature Noise | Aaditya Ramdas, Barnabas Poczos, Aarti Singh, Larry Wasserman | In this paper, we consider the effect of feature noise in active learning, which could arise either because X itself is being measured, or it is corrupted in transmission to the oracle, or the oracle returns the label of a noisy version of the query point. |

91 | Black Box Variational Inference | Rajesh Ranganath, Sean Gerrish, David Blei | In this paper, we present a “black box” variational inference algorithm, one that can be quickly applied to many models with little additional derivation. |

92 | Cluster Canonical Correlation Analysis | Nikhil Rasiwasia, Dhruv Mahajan, Vijay Mahadevan, Gaurav Aggarwal | In this paper we present cluster canonical correlation analysis (cluster-CCA) for joint dimensionality reduction of two sets of data points. |

93 | Sequential crowdsourced labeling as an epsilon-greedy exploration in a Markov Decision Process | Vikas Raykar, Priyanka Agrawal | With the goal of reducing the labeling cost, we introduce the notion of sequential crowdsourced labeling, where instead of asking for all the labels in one shot we acquire labels from annotators sequentially one at a time. |

94 | Learning Structured Models with the AUC Loss and Its Generalizations | Nir Rosenfeld, Ofer Meshi, Danny Tarlow, Amir Globerson | In this work, we propose a representation and learning formulation for optimizing structured models over the AUC loss, show how our approach generalizes the unstructured case, and provide algorithms for solving the resulting inference and learning problems. |

95 | Class Proportion Estimation with Application to Multiclass Anomaly Rejection | Tyler Sanderson, Clayton Scott | This work addresses two classification problems that fall under the heading of domain adaptation, wherein the distributions of training and testing examples differ. |

96 | Lifted MAP Inference for Markov Logic Networks | Somdeb Sarkhel, Deepak Venugopal, Parag Singla, Vibhav Gogate | In this paper, we present a new approach for lifted MAP inference in Markov Logic Networks (MLNs). |

97 | Estimating Dependency Structures for non-Gaussian Components with Linear and Energy Correlations | Hiroaki Sasaki, Michael Gutmann, Hayaru Shouno, Aapo Hyvarinen | In this paper, we propose a probabilistic model of non-Gaussian components which are allowed to have both linear and energy correlations. |

98 | Student-t Processes as Alternatives to Gaussian Processes | Amar Shah, Andrew Wilson, Zoubin Ghahramani | We investigate the Student-t process as an alternative to the Gaussian process as a nonparametric prior over functions. |

99 | In Defense of Minhash over Simhash | Anshumali Shrivastava, Ping Li | In this study, we provide a theoretical answer (validated by experiments) that MinHash virtually always outperforms SimHash when the data are binary, as common in practice such as search. |

100 | Loopy Belief Propagation in the Presence of Determinism | David Smith, Vibhav Gogate | In this paper, we propose a new method for remedying this problem. |

101 | Explicit Link Between Periodic Covariance Functions and State Space Models | Arno Solin, Simo Särkkä | This paper shows how periodic covariance functions in Gaussian process regression can be reformulated as state space models, which can be solved with classical Kalman filtering theory. |

102 | Bat Call Identification with Gaussian Process Multinomial Probit Regression and a Dynamic Time Warping Kernel | Vassilios Stathopoulos, Veronica Zamora-Gutierrez, Kate Jones, Mark Girolami | We study the problem of identifying bat species from echolocation calls in order to build automated bioacoustic monitoring algorithms. |

103 | SMERED: A Bayesian Approach to Graphical Record Linkage and De-duplication | Rebecca Steorts, Rob Hall, Stephen Fienberg | We propose a novel unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files. |

104 | Adaptive Variable Clustering in Gaussian Graphical Models | Siqi Sun, Yuancheng Zhu, Jinbo Xu | We present a novel nonparametric Bayesian generative model for such a block-structured GGM and an efficient inference algorithm to find the clustering of variables in this GGM by combining a Gibbs sampler and a split-merge Metropolis-Hastings algorithm. |

105 | Scaling Graph-based Semi Supervised Learning to Large Number of Labels Using Count-Min Sketch | Partha Talukdar, William Cohen | In this paper, we propose MAD-Sketch, a novel graph-based SSL algorithm which compactly stores label distribution on each node using Count-min Sketch, a randomized data structure. |

106 | Path Thresholding: Asymptotically Tuning-Free High-Dimensional Sparse Regression | Divyanshu Vats, Richard Baraniuk | In this paper, we address the challenging problem of selecting tuning parameters for high-dimensional sparse regression. |

107 | Active Learning for Undirected Graphical Model Selection | Divyanshu Vats, Robert Nowak, Richard Baraniuk | We propose an active learning algorithm that uses junction tree representations to adapt future measurements based on the information gathered from prior measurements. |

108 | Linear-time training of nonlinear low-dimensional embeddings | Max Vladymyrov, Miguel Carreira-Perpinan | We address this bottleneck by formulating the optimization as an N-body problem and using fast multipole methods (FMMs) to approximate the gradient in linear time. |

109 | Gaussian Copula Precision Estimation with Missing Values | Huahua Wang, Farideh Fazayeli, Soumyadeep Chatterjee, Arindam Banerjee | In this paper, we propose double plugin Gaussian (DoPinG) copula estimators to estimate the sparse precision matrix corresponding to non-paranormal distributions. |

110 | An LP for Sequential Learning Under Budgets | Joseph Wang, Kirill Trapeznikov, Venkatesh Saligrama | We present a convex framework to learn sequential decisions and apply this to the problem of learning under a budget. |

111 | Efficient Algorithms and Error Analysis for the Modified Nystrom Method | Shusen Wang, Zhihua Zhang | In this paper, we propose two algorithms that make the modified Nyström method practical. |

112 | Bayesian Multi-Scale Optimistic Optimization | Ziyu Wang, Babak Shakibi, Lin Jin, Nando Freitas | In this paper, we introduce a new technique for efficient global optimization that combines Gaussian process confidence bounds and treed simultaneous optimistic optimization to eliminate the need for auxiliary optimization of acquisition functions. |

113 | Accelerating ABC methods using Gaussian processes | Richard Wilkinson | We introduce Gaussian process (GP) accelerated ABC, which we show can significantly reduce the number of simulations required. |

114 | A New Approach to Probabilistic Programming Inference | Frank Wood, Jan Willem Meent, Vikash Mansinghka | We introduce and demonstrate a new approach to inference in expressive probabilistic programming languages based on particle Markov chain Monte Carlo. |

115 | Dynamic Resource Allocation for Optimizing Population Diffusion | Shan Xue, Alan Fern, Daniel Sheldon | The main contribution of this paper is to design and evaluate an online planner for this problem based on Hindsight Optimization (HOP), a technique that has shown promise in other stochastic planning problems. |

116 | Mixed Graphical Models via Exponential Families | Eunho Yang, Yulia Baker, Pradeep Ravikumar, Genevera Allen, Zhandong Liu | We study several instances of our model, and propose scalable M-estimators for recovering the underlying network structure. |

117 | Context Aware Group Nearest Shrunken Centroids in Large-Scale Genomic Studies | Juemin Yang, Fang Han, Rafael Irizarry, Han Liu | We have devised an approach to phenotype classification from gene expression profiling. |

118 | Nonparametric estimation and testing of exchangeable graph models | Justin Yang, Christina Han, Edoardo Airoldi | We propose a 3-step procedure to estimate the canonical graphon of any ExGM that satisfies these conditions. |

119 | Generating Efficient MCMC Kernels from Probabilistic Programs | Lingfeng Yang, Patrick Hanrahan, Noah Goodman | We present a technique that recovers hand-coded levels of performance from a universal probabilistic language, for the Metropolis-Hastings (MH) MCMC inference algorithm. |

120 | Efficient Transfer Learning Method for Automatic Hyperparameter Tuning | Dani Yogatama, Gideon Mann | We propose a fast and effective algorithm for automatic hyperparameter tuning that can generalize across datasets. |

121 | Accelerated Stochastic Gradient Method for Composite Regularization | Wenliang Zhong, James Kwok | In this paper, we propose a novel extension with accelerated gradient method for stochastic optimization. |

122 | Heterogeneous Domain Adaptation for Multiple Classes | Joey Tianyi Zhou, Ivor W. Tsang, Sinno Jialin Pan, Mingkui Tan | In this paper, we present an efficient Multi-class Heterogeneous Domain Adaptation (HDA) method, where data from the source and target domains are represented by heterogeneous features with different dimensions. |