Paper Digest: WWW 2015 Highlights
The Web Conference (WWW) is one of the top internet conferences in the world.
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.
If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to get updates on new conference digests.
Paper Digest Team
team@paperdigest.org
TABLE 1: WWW 2015 Papers
No. | Title | Authors | Highlight |
---|---|---|---|
1 | Optimizing Display Advertising in Online Social Networks | Zeinab Abbassi, Aditya Bhaskara, Vishal Misra | In this work, we propose formal probabilistic models to capture this phenomenon, and study the algorithmic problem that then arises. |
2 | Frankenplace: Interactive Thematic Mapping for Ad Hoc Exploratory Search | Benjamin Adams, Grant McKenzie, Mark Gahegan | In this paper we describe the architecture of an interactive thematic map search engine, Frankenplace, designed to facilitate document exploration at the intersection of theme and place. |
3 | Towards Reconciling SPARQL and Certain Answers | Shqiponja Ahmetaj, Wolfgang Fischl, Reinhard Pichler, Mantas Šimkus, Sebastian Skritek | For OWL 2 QL entailment, we present algorithms for the evaluation of an interesting fragment of SPARQL (the so-called well-designed SPARQL). |
4 | Donor Retention in Online Crowdfunding Communities: A Case Study of DonorsChoose.org | Tim Althoff, Jure Leskovec | We present a large-scale study of millions of donors and donations on DonorsChoose.org, a crowdfunding platform for education projects. |
5 | Budget-Constrained Item Cold-Start Handling in Collaborative Filtering Recommenders via Optimal Design | Oren Anava, Shahar Golan, Nadav Golbandi, Zohar Karnin, Ronny Lempel, Oleg Rokhlenko, Oren Somekh | We formalize this problem as an optimization problem: given a new item, a pool of available users, and a budget constraint, select which users to assign with the task of rating the new item in order to minimize the prediction error of our model. |
6 | Improved Theoretical and Practical Guarantees for Chromatic Correlation Clustering | Yael Anava, Noa Avigdor-Elgrabli, Iftah Gamzu | Our main contribution is a fast and easy-to-implement constant approximation framework for the problem, which builds on a novel reduction of the problem to that of correlation clustering. |
7 | Global Diffusion via Cascading Invitations: Structure, Growth, and Homophily | Ashton Anderson, Daniel Huttenlocher, Jon Kleinberg, Jure Leskovec, Mitul Tiwari | In this paper, we study the diffusion of LinkedIn, an online professional network comprising over 332 million members, a large fraction of whom joined the site as part of a signup cascade. |
8 | Recommendation Subgraphs for Web Discovery | Arda Antikacioglu, R. Ravi, Srinath Sridhar | We formalize the concept of recommendations used for discovery as a natural graph optimization problem on a bipartite graph and propose three methods for solving the problem in increasing order of sophistication: a local random sampling algorithm, a greedy algorithm and a more involved partitioning based algorithm. |
9 | Is Sniping A Problem For Online Auction Markets? | Matt Backus, Thomas Blake, Dimitriy V. Masterov, Steven Tadelis | We show the effect to be causal using a carefully selected subset of auctions from eBay.com and instrumental variables estimation strategy. |
10 | Essential Web Pages Are Easy to Find | Ricardo Baeza-Yates, Paolo Boldi, Flavio Chierichetti | In this paper we address the problem of estimating the index size needed by web search engines to answer as many queries as possible by exploiting the marked difference between query and click frequencies. |
11 | Design and Analysis of Benchmarking Experiments for Distributed Internet Services | Eytan Bakshy, Eitan Frachtenberg | We develop statistical models of distributed Internet service performance based on data from Perflab, a production system used at Facebook which vets thousands of changes to the company’s codebase each day. |
12 | ACCAMS: Additive Co-Clustering to Approximate Matrices Succinctly | Alex Beutel, Amr Ahmed, Alexander J. Smola | Instead of using low-rank factorization we take a drastically different approach, based on the simple insight that an additive model of co-clusterings allows one to approximate matrices efficiently. |
13 | Who, What, When, and Where: Multi-Dimensional Collaborative Recommendations Using Tensor Factorization on Sparse User-Generated Data | Preeti Bhargava, Thomas Phan, Jiayu Zhou, Juhan Lee | In this paper, we present a system and an approach for performing multi-dimensional collaborative recommendations for Who (User), What (Activity), When (Time) and Where (Location), using tensor factorization on sparse user-generated data. |
14 | Secrets, Lies, and Account Recovery: Lessons from the Use of Personal Knowledge Questions at Google | Joseph Bonneau, Elie Bursztein, Ilan Caron, Rob Jackson, Mike Williamson | We examine the first large real-world data set on personal knowledge questions’ security and memorability from their deployment at Google. |
15 | Supporting Ethical Web Research: A New Research Ethics Review | Anne Bowser, Janice Y. Tsai | We describe the creation of a new ethics framework and a research ethics submission system (RESS) within Microsoft Research (MSR). |
16 | Sequential Hypothesis Tests for Adaptive Locality Sensitive Hashing | Aniket Chakrabarti, Srinivasan Parthasarathy | In this work we revisit the LSH problem from a Frequentist setting and formulate sequential tests for composite hypothesis (similarity greater than or less than threshold) that can be leveraged by such LSH algorithms for adaptively pruning candidates aggressively. |
17 | Opinion Spam Detection in Web Forum: A Real Case Study | Yu-Ren Chen, Hsin-Hsi Chen | In this paper, we conduct a real case study based on a set of internal records of opinion spams leaked from a shady marketing campaign. |
18 | Summarizing Entity Descriptions for Effective and Efficient Human-centered Entity Linking | Gong Cheng, Danyun Xu, Yuzhong Qu | To avoid overloading human users with too much information and help them more efficiently choose an entity from candidates, we aim to substitute entire entity descriptions with compact, equally effective structured summaries that are automatically generated. |
19 | Semantic Tagging of Mathematical Expressions | Pao-Yu Chien, Pu-Jen Cheng | In this work, we propose a novel STME approach that relies on neither text along with expressions, nor labelled training data. To evaluate our system, we build large-scale training and test datasets automatically from a public math forum. |
20 | Collaborative Ranking with a Push at the Top | Konstantina Christakopoulou, Arindam Banerjee | We consider three specific formulations, based on collaborative p-norm push, infinite push, and reverse-height push, and propose efficient optimization methods for learning these models. |
21 | Parallel Streaming Signature EM-tree: A Clustering Algorithm for Web Scale Applications | Christopher Michael De Vries, Lance De Vine, Shlomo Geva, Richi Nayak | We introduce a scalable algorithm that clusters hundreds of millions of web pages into hundreds of thousands of clusters. |
22 | Network-based Origin Confusion Attacks against HTTPS Virtual Hosting | Antoine Delignat-Lavaud, Karthikeyan Bhargavan | We present evidence that such vulnerable virtual host configurations are widespread, even on the most popular and security-scrutinized websites, thus allowing a network adversary to hijack pages, or steal secure cookies and single sign-on tokens. |
23 | The Dynamics of Micro-Task Crowdsourcing: The Case of Amazon MTurk | Djellel Eddine Difallah, Michele Catasta, Gianluca Demartini, Panagiotis G. Ipeirotis, Philippe Cudré-Mauroux | In this paper, we adopt a data-driven approach to (A) perform a long-term analysis of a popular micro-task crowdsourcing platform and understand the evolution of its main actors (workers, requesters, and platform). |
24 | Hierarchical Neural Language Models for Joint Representation of Streaming Documents and their Content | Nemanja Djuric, Hao Wu, Vladan Radosavljevic, Mihajlo Grbovic, Narayan Bhamidipati | We consider the problem of learning distributed representations for documents in data streams. |
25 | Future User Engagement Prediction and Its Application to Improve the Sensitivity of Online Experiments | Alexey Drutsa, Gleb Gusev, Pavel Serdyukov | We propose a novel approach to improve the sensitivity of user engagement metrics (that are widely used in A/B tests) by utilizing prediction of the future behavior of an individual user. |
26 | Enriching Structured Knowledge with Open Information | Arnab Dutta, Christian Meilicke, Heiner Stuckenschmidt | We propose an approach for semantifying web extracted facts. |
27 | A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems | Ali Mamdouh Elkahky, Yang Song, Xiaodong He | In this work, we propose a content-based recommendation system to address both the recommendation quality and the system scalability. |
28 | Cookies That Give You Away: The Surveillance Implications of Web Tracking | Steven Englehardt, Dillon Reisman, Christian Eubank, Peter Zimmerman, Jonathan Mayer, Arvind Narayanan, Edward W. Felten | To evaluate the effectiveness of our attack, we introduce a methodology that combines web measurement and network measurement. |
29 | Efficient Densest Subgraph Computation in Evolving Graphs | Alessandro Epasto, Silvio Lattanzi, Mauro Sozio | We study the densest subgraph problem in the dynamic graph model, for which we present the first scalable algorithm with provable guarantees. |
30 | A Practical Framework for Privacy-Preserving Data Analytics | Liyue Fan, Hongxia Jin | In this paper, we propose a practical framework for data analytics, while providing differential privacy guarantees to individual data contributors. |
31 | Compressed Indexes for String Searching in Labeled Graphs | Paolo Ferragina, Francesco Piccinno, Rossano Venturini | This paper takes inspiration from the Facebook Unicorn’s platform and proposes some compressed-indexing schemes for large graphs whose nodes are labeled with strings of variable length – i.e., node’s attributes such as user’s (nick-)name – that support sophisticated search operations which involve both the linked structure of the graph and the string content of its nodes. |
32 | Improving Paid Microtasks through Gamification and Adaptive Furtherance Incentives | Oluwaseyi Feyisetan, Elena Simperl, Max Van Kleek, Nigel Shadbolt | Following these initial insights, we define a predictive model for estimating the most appropriate incentives for individual workers, based on their previous contributions. |
33 | Tagging Personal Photos with Transfer Deep Learning | Jianlong Fu, Tao Mei, Kuiyuan Yang, Hanqing Lu, Yong Rui | To deal with these challenges, in this paper, we present a novel transfer deep learning approach to tag personal photos. |
34 | MobInsight: On Improving The Performance of Mobile Apps in Cellular Networks | Vijay Gabale, Dilip Krishnaswamy | In this work, we perform a systematic measurement study of more than 50 popular apps and 2 cellular networks, and discover that while cellular networks have predictable latency, it is the path between exit points of cellular networks (e.g., GGSN) and cloud-servers that degrades apps performance. |
35 | Rethinking Security of Web-Based System Applications | Martin Georgiev, Suman Jana, Vitaly Shmatikov | We show that the access-control models of these platforms are (a) incompatible and (b) prone to unintended delegation of native-access rights: when applications request native access for their own code, they unintentionally enable it for untrusted third-party code, too. |
36 | Cardinal Contests | Arpita Ghosh, Patrick Hummel | We model and analyze cardinal contests, where a principal running a rank-order tournament has access to an absolute measure of the qualities of agents’ submissions in addition to their relative rankings, and ask how modifying the rank-order tournament to incorporate cardinal information can improve incentives for effort. |
37 | Accessible On-Line Floor Plans | Cagatay Goncu, Anuradha Madugalla, Simone Marinai, Kim Marriott | We present a new model for accessible presentation of on-line information graphics and demonstrate its use for presenting floor plans. |
38 | Network A/B Testing: From Sampling to Estimation | Huan Gui, Ya Xu, Anmol Bhasin, Jiawei Han | In this paper, we study the problem of network A/B testing in real networks, which have substantially different characteristics from the simulated random networks studied in previous works. |
39 | User Session Identification Based on Strong Regularities in Inter-activity Time | Aaron Halfaker, Os Keyes, Daniel Kluver, Jacob Thebault-Spieker, Tien Nguyen, Kenneth Shores, Anuradha Uduwage, Morten Warncke-Wang | In this work, we demonstrate a strong regularity in the temporal rhythms of user initiated events across several different domains of online activity (incl. |
40 | Incentivizing High Quality Crowdwork | Chien-Ju Ho, Aleksandrs Slivkins, Siddharth Suri, Jennifer Wortman Vaughan | We study the causal effects of financial incentives on the quality of crowdwork. |
41 | Skolemising Blank Nodes while Preserving Isomorphism | Aidan Hogan | In this paper, we propose and evaluate a scheme to produce canonical labels for blank nodes in RDF graphs. |
42 | Scalable Methods for Adaptively Seeding a Social Network | Thibaut Horel, Yaron Singer | In particular, we develop algorithms for linear influence models with provable approximation guarantees that can be gracefully parallelized. To show the effectiveness of our methods we collected data from various verticals social network users follow. |
43 | User Review Sites as a Resource for Large-Scale Sociolinguistic Studies | Dirk Hovy, Anders Johannsen, Anders Søgaard | Our research aims to remedy both problems by exploring a large new data source, international review websites with user profiles. |
44 | When Does Improved Targeting Increase Revenue? | Patrick Hummel, R. Preston McAfee | In second price auctions with symmetric bidders, we find that improved targeting via enhanced information disclosure decreases revenue when there are two bidders and increases revenue if there are at least four bidders. |
45 | Social Status and Badge Design | Nicole Immorlica, Greg Stoddard, Vasilis Syrgkanis | In this paper, we study how to design virtual incentive mechanisms that maximize total contributions to a website when users are motivated by social status. |
46 | Mapping Temporal Horizons: Analysis of Collective Future and Past related Attention in Twitter | Adam Jatowt, Émilien Antoine, Yukiko Kawai, Toyokazu Akiyama | In this work we investigate how microblogging users collectively refer to time. |
47 | Path Sampling: A Fast and Provable Method for Estimating 4-Vertex Subgraph Counts | Madhav Jha, C. Seshadhri, Ali Pinar | We provide a sampling algorithm that provably and accurately approximates the frequencies of all 4-vertex pattern subgraphs. |
48 | Automatic Online Evaluation of Intelligent Assistants | Jiepu Jiang, Ahmed Hassan Awadallah, Rosie Jones, Umut Ozertem, Imed Zitouni, Ranjitha Gurunath Kulkarni, Omar Zia Khan | We develop consistent and automatic approaches that can evaluate different tasks in voice-activated intelligent assistants. |
49 | Incorporating Social Context and Domain Knowledge for Entity Recognition | Jie Tang, Zhanpeng Fang, Jimeng Sun | In this paper, we propose the SOCINST model to formalize the problem into a probabilistic model. |
50 | Querying Web-Scale Information Networks Through Bounding Matching Scores | Jiahui Jin, Samamon Khemmarat, Lixin Gao, Junzhou Luo | In this paper, we propose an efficient algorithm for finding the best k answers for a given query without precomputing graph indices. |
51 | LN-Annote: An Alternative Approach to Information Extraction from Emails using Locally-Customized Named-Entity Recognition | YoungHoon Jung, Karl Stratos, Luca P. Carloni | To address these problems, we propose LN-Annote, a new method to extract personal information from the email that is locally available on mobile devices (without remote access to the cloud). We present an extensive set of experimental results: besides proving the feasibility of our approach, they demonstrate its efficiency in terms of the named-entity extraction performance as well as the execution speed and the energy consumption spent in mobile devices. |
52 | Describing and Understanding Neighborhood Characteristics through Online Social Media | Mohamed Kafsi, Henriette Cramer, Bart Thomee, David A. Shamma | To surface the content that is characteristic for a region, we present the geographical hierarchy model (GHM), a probabilistic model based on the assumption that data observed in a region is a random mixture of content that pertains to different levels of a hierarchy. |
53 | Active Learning for Multi-relational Data Construction | Hiroshi Kajino, Akihiro Kishimoto, Adi Botea, Elizabeth Daly, Spyros Kotoulas | In this paper, we formalize the problem of dataset construction as active learning problems and present the Active Multi-relational Data Construction (AMDC) method. |
54 | The Social World of Content Abusers in Community Question Answering | Imrul Kayes, Nicolas Kourtellis, Daniele Quercia, Adriana Iamnitchi, Francesco Bonchi | Based on our empirical observations, we build a classifier that is able to detect abusive users with an accuracy as high as 83%. |
55 | The Lifecycles of Apps in a Social Ecosystem | Isabel Kloumann, Lada Adamic, Jon Kleinberg, Shaomei Wu | In this work we address this challenge through an analysis of the collection of apps on Facebook Login, developing a novel framework for analyzing both temporal and social properties. |
56 | Getting More for Less: Optimized Crowdsourcing with Dynamic Tasks and Goals | Ari Kobren, Chun How Tan, Panagiotis Ipeirotis, Evgeniy Gabrilovich | We directly address this problem by presenting techniques that optimize the crowdsourcing process by jointly maximizing the user longevity in the system and the true value that the system derives from user participation. |
57 | Evolution of Conversations in the Age of Email Overload | Farshad Kooti, Luca Maria Aiello, Mihajlo Grbovic, Kristina Lerman, Amin Mantrach | In this paper, we report results of a large-scale study of more than 2 million users exchanging 16 billion emails over several months. |
58 | Events and Controversies: Influences of a Shocking News Event on Information Seeking | Danai Koutra, Paul N. Bennett, Eric Horvitz | We seek to identify and study information-seeking behavior and access to alternative versus reinforcing viewpoints following shocking, emotional, and large-scale news events. |
59 | Statistically Significant Detection of Linguistic Change | Vivek Kulkarni, Rami Al-Rfou, Bryan Perozzi, Steven Skiena | We propose a new computational approach for tracking and detecting statistically significant linguistic shifts in the meaning and usage of words. |
60 | Replacing the Irreplaceable: Fast Algorithms for Team Member Recommendation | Liangyue Li, Hanghang Tong, Nan Cao, Kate Ehrlich, Yu-Ru Lin, Norbou Buchler | In this paper, we study the problem of TEAM MEMBER REPLACEMENT — given a team of people embedded in a social network working on the same task, find a good candidate to best replace a team member who becomes unavailable to perform the task for certain reason (e.g., conflicts of interests or resource capacity). |
61 | Robust Group Linkage | Pei Li, Xin Luna Dong, Songtao Guo, Andrea Maurino, Divesh Srivastava | We present a robust two-stage algorithm: the first stage identifies pivots–maximal sets of records that are very likely to belong to the same group, while being robust to possible erroneous values; the second stage collects strong evidence from the pivots and leverages it for merging more records into the same group, while being tolerant to differences in local values of an attribute. |
62 | Uncovering the Small Community Structure in Large Networks: A Local Spectral Approach | Yixuan Li, Kun He, David Bindel, John E. Hopcroft | In this paper, we propose a novel approach for finding overlapping communities called LEMON (Local Expansion via Minimum One Norm). |
63 | Scalable Parallel EM Algorithms for Latent Dirichlet Allocation in Multi-Core Systems | Xiaosheng Liu, Jia Zeng, Xi Yang, Jianfeng Yan, Qiang Yang | To handle web-scale content analysis on just a single PC, we propose multi-core parallel expectation-maximization (PEM) algorithms to infer and estimate LDA parameters in shared memory systems. |
64 | Grading the Graders: Motivating Peer Graders in a MOOC | Yanxin Lu, Joe Warren, Christopher Jermaine, Swarat Chaudhuri, Scott Rixner | In this paper, we detail our efforts at creating and running a controlled study designed to examine how students in a MOOC might be motivated to do a better job during peer grading. |
65 | Measurement and Analysis of Mobile Web Cache Performance | Yun Ma, Xuanzhe Liu, Shuhui Zhang, Ruirui Xiang, Yunxin Liu, Tao Xie | To address these issues, in this paper, we present a proactive approach for a comprehensive measurement study on mobile Web cache performance. |
66 | SCULPT: A Schema Language for Tabular Data on the Web | Wim Martens, Frank Neven, Stijn Vansummeren | We present a formal model for SCULPT and obtain a linear time combined complexity evaluation algorithm. |
67 | The Web as a Jungle: Non-Linear Dynamical Systems for Co-evolving Online Activities | Yasuko Matsubara, Yasushi Sakurai, Christos Faloutsos | We present ECOWEB, (i.e., Ecosystem on the Web), which is an intuitive model designed as a non-linear dynamical system for mining large-scale co-evolving online activities. |
68 | Spanning Edge Centrality: Large-scale Computation and Applications | Charalampos Mavroforakis, Richard Garcia-Lebron, Ioannis Koutis, Evimaria Terzi | In this article we bring theory into practice, with careful and optimized implementations that allow the fast computation of spanning centrality in very large graphs with millions of nodes. |
69 | No Escape From Reality: Security and Privacy of Augmented Reality Browsers | Richard McPherson, Suman Jana, Vitaly Shmatikov | We start by analyzing the functional requirements that AR browsers must support in order to present AR content. |
70 | Discovering Meta-Paths in Large Heterogeneous Information Networks | Changping Meng, Reynold Cheng, Silviu Maniu, Pierre Senellart, Wangda Zhang | Since this problem is computationally intractable, we propose a greedy algorithm to select the most relevant meta-paths. |
71 | From "Selena Gomez" to "Marlon Brando": Understanding Explorative Entity Search | Iris Miliaraki, Roi Blanco, Mounia Lalmas | In this paper, we perform a large-scale analysis into how users interact with the entity results returned by Spark. Based on this analysis, we develop a set of query and user-based features that reflect the click behavior of users and explore their effectiveness in the context of a prediction task. |
72 | Children Seen But Not Heard: When Parents Compromise Children’s Online Privacy | Tehila Minkus, Kelvin Liu, Keith W. Ross | In this paper, we conduct a study to see how widespread these behaviors are among adults on Facebook and Instagram. |
73 | TrueView: Harnessing the Power of Multiple Review Sites | Amanda J. Minnich, Nikan Chavoshi, Abdullah Mueen, Shuang Luan, Michalis Faloutsos | Our work is an early effort that explores the advantages and the challenges in using multiple reviewing sites towards more informed decision making. |
74 | QUOTUS: The Structure of Political Media Coverage as Revealed by Quoting Patterns | Vlad Niculae, Caroline Suen, Justine Zhang, Cristian Danescu-Niculescu-Mizil, Jure Leskovec | In this paper we propose a framework based on quoting patterns for quantifying and characterizing the degree to which media outlets exhibit systematic bias. |
75 | Energy and Performance of Smartphone Radio Bundling in Outdoor Environments | Ana Nika, Yibo Zhu, Ning Ding, Abhilash Jindal, Y. Charlie Hu, Xia Zhou, Ben Y. Zhao, Haitao Zheng | In this study, we seek to answer these questions using extensive measurements to empirically characterize both energy and performance for radio bundling approaches. |
76 | PriVaricator: Deceiving Fingerprinters with Little White Lies | Nick Nikiforakis, Wouter Joosen, Benjamin Livshits | In this paper we propose PriVaricator, a solution to the problem of browser-based fingerprinting. |
77 | Diagnoses, Decisions, and Outcomes: Web Search as Decision Support for Cancer | Michael J. Paul, Ryen W. White, Eric Horvitz | Using this corpus, we present a variety of analyses toward the goal of understanding search and decision making about treatments. |
78 | PocketTrend: Timely Identification and Delivery of Trending Search Content to Mobile Users | Gennady Pekhimenko, Dimitrios Lymberopoulos, Oriana Riva, Karin Strauss, Doug Burger | To understand how trending search topics are formed and evolve over time, we analyze 21 million queries submitted during periods where popular events caused search query volume spikes. |
79 | Overcoming Relational Learning Biases to Accurately Predict Preferences in Large Scale Networks | Joseph J. Pfeiffer, Jennifer Neville, Paul N. Bennett | In this work, we address each of these limitations. |
80 | Deriving an Emergent Relational Schema from RDF Data | Minh-Duc Pham, Linnea Passing, Orri Erling, Peter Boncz | We motivate and describe techniques that make it possible to detect an "emergent" relational schema from RDF data. |
81 | The Digital Life of Walkable Streets | Daniele Quercia, Luca Maria Aiello, Rossano Schifanella, Adam Davies | To partly automate the computation of those scores, we explore the possibility of using the social media data of Flickr and Foursquare to automatically identify safe and walkable streets. |
82 | Beyond Models: Forecasting Complex Network Processes Directly from Data | Bruno Ribeiro, Minh X. Hoang, Ambuj K. Singh | In this work we show that model-free forecasting is possible. |
83 | Weakly Supervised Extraction of Computer Security Events from Twitter | Alan Ritter, Evan Wright, William Casey, Tom Mitchell | We therefore propose a weakly supervised approach, in which extractors for new categories of events are easy to define and train, by specifying a small number of seed examples. |
84 | Groupsourcing: Team Competition Designs for Crowdsourcing | Markus Rokicki, Sergej Zerr, Stefan Siersdorfer | In this paper, we investigate how team mechanisms can be leveraged to further improve the cost efficiency of crowdsourcing competitions. |
85 | Authentication Melee: A Usability Analysis of Seven Web Authentication Systems | Scott Ruoti, Brent Roberts, Kent Seamons | We report the results of four within-subjects usability studies for seven web authentication systems. |
86 | Finding the Hierarchy of Dense Subgraphs using Nucleus Decompositions | Ahmet Erdem Sariyuce, C. Seshadhri, Ali Pinar, Umit V. Catalyurek | We give provably efficient algorithms for nucleus decompositions, and empirically evaluate their behavior in a variety of real graphs. |
87 | Bringing CUPID Indoor Positioning System to Practice | Souvik Sen, Dongho Kim, Stephane Laroche, Kyu-Han Kim, Jeongkeun Lee | In this paper, we present CUPID2.0 which improved our previously proposed CUPID indoor positioning system to overcome these limitations. |
88 | Early Detection of Spam Mobile Apps | Suranga Seneviratne, Aruna Seneviratne, Mohamed Ali Kaafar, Anirban Mahanti, Prasant Mohapatra | Through a systematic crawl of a popular app market and by identifying a set of removed apps, we propose a method to detect spam apps solely using app metadata available at the time of publication. |
89 | N-gram IDF: A Global Term Weighting Scheme Based on Information Distance | Masumi Shirakawa, Takahiro Hara, Shojiro Nishio | Based on our findings, we propose N-gram IDF, a theoretical extension of IDF for handling words and phrases of any length. |
90 | Query Suggestion and Data Fusion in Contextual Disambiguation | Milad Shokouhi, Marc Sloan, Paul N. Bennett, Kevyn Collins-Thompson, Siranush Sarkizova | In this paper, we explore these complementary approaches and how they might be combined. |
91 | Asymmetric Minwise Hashing for Indexing Binary Inner Products and Set Containment | Anshumali Shrivastava, Ping Li | In this paper, we propose asymmetric minwise hashing (MH-ALSH), to provide a solution to this well-known problem. |
92 | Language Understanding in the Wild: Combining Crowdsourcing and Machine Learning | Edwin D. Simpson, Matteo Venanzi, Steven Reece, Pushmeet Kohli, John Guiver, Stephen J. Roberts, Nicholas R. Jennings | To overcome this problem, we present a novel Bayesian approach to language understanding that relies on aggregated crowdsourced judgements. |
93 | HypTrails: A Bayesian Approach for Comparing Hypotheses About Human Trails on the Web | Philipp Singer, Denis Helic, Andreas Hotho, Markus Strohmaier | In this work, we present a general approach called HypTrails for comparing a set of hypotheses about human trails on the Web, where hypotheses represent beliefs about transitions between states. |
94 | Exploiting Collective Hidden Structures in Webpage Titles for Open Domain Entity Extraction | Wei Song, Shiqi Zhao, Chao Zhang, Hua Wu, Haifeng Wang, Lizhen Liu, Hanshi Wang | We present a novel method for open domain named entity extraction by exploiting the collective hidden structures in webpage titles. |
95 | ROCKER: A Refinement Operator for Key Discovery | Tommaso Soru, Edgard Marx, Axel-Cyrille Ngonga Ngomo | In this paper, we address this research gap by specifying a refinement operator, dubbed ROCKER, which we prove to be finite, proper and non-redundant. |
96 | Random Walk TripleRush: Asynchronous Graph Querying and Sampling | Philip Stutz, Bibek Paudel, Mihaela Verman, Abraham Bernstein | In this paper we propose to rethink query execution in a triple store as a highly parallelized asynchronous graph exploration on an active index data structure. |
97 | Open Domain Question Answering via Semantic Enrichment | Huan Sun, Hao Ma, Wen-tau Yih, Chen-Tse Tsai, Jingjing Liu, Ming-Wei Chang | In this paper, we develop a new QA system that mines answers directly from the Web, and meanwhile employs KBs as a significant auxiliary to further boost the QA performance. |
98 | All Who Wander: On the Prevalence and Characteristics of Multi-community Engagement | Chenhao Tan, Lillian Lee | In this paper, we examine three aspects of multi-community engagement: the sequence of communities that users post to, the language that users employ in those communities, and the feedback that users receive, using longitudinal posting behavior on Reddit as our main data source, and DBLP for auxiliary experiments. |
99 | LINE: Large-scale Information Network Embedding | Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, Qiaozhu Mei | In this paper, we propose a novel network embedding method called the “LINE,” which is suitable for arbitrary types of information networks: undirected, directed, and/or weighted. |
100 | Leveraging Pattern Semantics for Extracting Entities in Enterprises | Fangbo Tao, Bo Zhao, Ariel Fuxman, Yang Li, Jiawei Han | To address these challenges, we propose an end-to-end framework for extracting entities in enterprises, taking the input of enterprise corpus and limited seeds to generate a high-quality entity collection as output. |
101 | Density-friendly Graph Decomposition | Nikolaj Tatti, Aristides Gionis | We start by defining what it means for a subgraph to be locally-dense, and we show that our definition entails a nested chain decomposition of the graph, similar to the one given by k-cores, but in this case the components are arranged in order of increasing density. |
102 | Crowd Fraud Detection in Internet Advertising | Tian Tian, Jun Zhu, Fen Xia, Xin Zhuang, Tong Zhang | In this paper, we carefully examine the characteristics of the group behaviors of crowd fraud and identify three persistent patterns, which are moderateness, synchronicity and dispersivity. |
103 | Provably Fast Inference of Latent Features from Networks: with Applications to Learning Social Circles and Multilabel Classification | Charalampos Tsourakakis | In this work we focus on a fundamental theoretical question related to the above phenomena with various applications: given an undirected graph G, can we infer efficiently the latent vertex features which explain the observed network structure under the assumption of a generative model that exhibits homophily? |
104 | The K-clique Densest Subgraph Problem | Charalampos Tsourakakis | In this work, we introduce the k-clique densest subgraph problem, k ≥ 2. |
105 | GERBIL: General Entity Annotator Benchmarking Framework | Ricardo Usbeck, Michael Röder, Axel-Cyrille Ngonga Ngomo, Ciro Baron, Andreas Both, Martin Brümmer, Diego Ceccarelli, Marco Cornolti, Didier Cherix, Bernd Eickmann, Paolo Ferragina, Christiane Lemke, Andrea Moro, Roberto Navigli, Francesco Piccinno, Giuseppe Rizzo, Harald Sack, René Speck, Raphaël Troncy, Jörg Waitelonis, Lars Wesemann | We present GERBIL, an evaluation framework for semantic entity annotation. |
106 | An Optimization Framework for Weighting Implicit Relevance Labels for Personalized Web Search | Yury Ustinovskiy, Gleb Gusev, Pavel Serdyukov | In this paper we develop a framework for automatic reweighting of these labels. |
107 | A First Look at Tribal Web Traffic | Morgan Vigil, Matthew Rantanen, Elizabeth Belding | In this paper, we present the characterization of the Tribal Digital Village (TDV) network, a multi-hop wireless network currently connecting 13 reservations in San Diego county. |
108 | A Weighted Correlation Index for Rankings with Ties | Sebastiano Vigna | We prove a number of interesting mathematical properties of our generalization and describe an O(n log n) algorithm for its computation. |
109 | Gathering Additional Feedback on Search Results by Multi-Armed Bandits with Respect to Production Ranking | Aleksandr Vorobev, Damien Lefortier, Gleb Gusev, Pavel Serdyukov | We improve the most flexible and pragmatic of them to handle some actual practical issues. |
110 | The E-Commerce Market for "Lemons": Identification and Analysis of Websites Selling Counterfeit Goods | John Wadleigh, Jake Drew, Tyler Moore | We investigate the practice of websites selling counterfeit goods. |
111 | Concept Expansion Using Web Tables | Chi Wang, Kaushik Chakrabarti, Yeye He, Kris Ganjam, Zhimin Chen, Philip A. Bernstein | In this paper, we propose to leverage the millions of tables on the web for this problem. |
112 | User Latent Preference Model for Better Downside Management in Recommender Systems | Jian Wang, David Hardtke | The approach we propose is general and can be applied to any scenario or domain where downside management is key to the system. |
113 | The Role of Data Cap in Optimal Two-part Network Pricing | Xin Wang, Richard T.B. Ma, Yinlong Xu | In this paper, we study the impact of data cap on the optimal two-part pricing schemes for congestion-prone service markets, e.g., broadband or cloud services. |
114 | Tweeting Cameras for Event Detection | Yuhui Wang, Mohan S. Kankanhalli | To tackle this problem, we propose an innovative multi-layer tweeting camera framework integrating both physical sensors and social sensors to detect various concepts of real-world events. |
115 | Mining Missing Hyperlinks from Human Navigation Traces: A Case Study of Wikipedia | Robert West, Ashwin Paranjape, Jure Leskovec | Here we propose a novel approach to identifying missing links in Wikipedia. |
116 | Semantic Annotation of Mobility Data using Social Media | Fei Wu, Zhenhui Li, Wang-Chien Lee, Hongjian Wang, Zhuojie Huang | We propose a frequency-based method, a Gaussian mixture model, and kernel density estimation (KDE) to tackle this problem. |
117 | Automatic Web Content Extraction by Combination of Learning and Grouping | Shanchan Wu, Jerry Liu, Jian Fan | We formulate the actual content identifying problem as a DOM tree node selection problem. |
118 | Executing Provenance-Enabled Queries over Web Data | Marcin Wylot, Philippe Cudré-Mauroux, Paul Groth | In this paper, we tackle the problem of efficiently executing provenance-enabled queries over RDF data. |
119 | Understanding Malvertising Through Ad-Injecting Browser Extensions | Xinyu Xing, Wei Meng, Byoungyoung Lee, Udi Weinsberg, Anmol Sheth, Roberto Perdisci, Wenke Lee | In this paper, we show that browser extensions that use ads as their monetization strategy often facilitate the deployment of malvertising. |
120 | E-commerce Reputation Manipulation: The Emergence of Reputation-Escalation-as-a-Service | Haitao Xu, Daiping Liu, Haining Wang, Angelos Stavrou | In this paper, we investigate the impact of the SRE service on reputation escalation by performing in-depth measurements of the prevalence of the SRE service, the business model and market size of SRE markets, and the characteristics of sellers and offered laborers. |
121 | Effective Techniques for Message Reduction and Load Balancing in Distributed Graph Computation | Da Yan, James Cheng, Yi Lu, Wilfred Ng | In this paper, we propose two effective message reduction techniques: (1) vertex mirroring with message combining, and (2) an additional request-respond API. |
122 | Tackling the Achilles Heel of Social Networks: Influence Propagation based Language Model Smoothing | Rui Yan, Ian E.H. Yen, Cheng-Te Li, Shiqi Zhao, Xiaohua Hu | In this paper we propose to tackle the Achilles Heel of social networks by smoothing the language model via influence propagation. |
123 | A Game Theoretic Model for the Formation of Navigable Small-World Networks | Zhi Yang, Wei Chen | In this paper, we present a game theoretic model for the formation of navigable small world networks. |
124 | A Scalable Asynchronous Distributed Algorithm for Topic Modeling | Hsiang-Fu Yu, Cho-Jui Hsieh, Hyokun Yun, S.V.N. Vishwanathan, Inderjit S. Dhillon | In this paper, we present a novel algorithm F+Nomad LDA which simultaneously tackles both these problems. |
125 | LightLDA: Big Topic Models on Modest Computer Clusters | Jinhui Yuan, Fei Gao, Qirong Ho, Wei Dai, Jinliang Wei, Xun Zheng, Eric Po Xing, Tie-Yan Liu, Wei-Ying Ma | Our major contributions include: 1) a new, highly-efficient O(1) Metropolis-Hastings sampling algorithm, whose running cost is (surprisingly) agnostic of model size, and empirically converges nearly an order of magnitude more quickly than current state-of-the-art Gibbs samplers; 2) a model-scheduling scheme to handle the big model challenge, where each worker machine schedules the fetch/use of sub-models as needed, resulting in a frugal use of limited memory capacity and network bandwidth; 3) a differential data-structure for model storage, which uses separate data structures for high- and low-frequency words to allow extremely large models to fit in memory, while maintaining high inference speed. |
126 | A Novelty-Seeking based Dining Recommender System | Fuzheng Zhang, Kai Zheng, Nicholas Jing Yuan, Xing Xie, Enhong Chen, Xiaofang Zhou | In this paper, by leveraging users’ historical dining pattern, socio-demographic characteristics and restaurants’ attributes, we aim at generating the top-K restaurants for a user’s next dining. |
127 | Daily-Aware Personalized Recommendation based on Feature-Level Time Series Analysis | Yongfeng Zhang, Min Zhang, Yi Zhang, Guokun Lai, Yiqun Liu, Honghui Zhang, Shaoping Ma | In this paper, we make use of the large volume of textual reviews for the automatic extraction of domain knowledge, namely, the explicit features/aspects in a specific product domain. |
128 | Automatic Detection of Information Leakage Vulnerabilities in Browser Extensions | Rui Zhao, Chuan Yue, Qing Yi | In this paper, we present a framework, LvDetector, that combines static and dynamic program analysis techniques for automatic detection of information leakage vulnerabilities in legitimate browser extensions. |
129 | Enquiring Minds: Early Detection of Rumors in Social Media from Enquiry Posts | Zhe Zhao, Paul Resnick, Qiaozhu Mei | We present a technique to identify trending rumors, which we define as topics that include disputed factual claims. |
130 | Improving User Topic Interest Profiles by Behavior Factorization | Zhe Zhao, Zhiyuan Cheng, Lichan Hong, Ed H. Chi | Here we propose to separately model users’ topical interests that come from these various behavioral signals in order to construct better user profiles. |
131 | Predicting Pinterest: Automating a Distributed Human Computation | Changtao Zhong, Dmytro Karamshuk, Nishanth Sastry | This paper seeks to understand Pinterest as a distributed human computation that categorises images from around the Web. |