Paper Digest: WWW 2014 Highlights
To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights / summaries to quickly get the main idea of each paper.
If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.
Paper Digest Team
TABLE 1: WWW 2014 Papers
|Large graph mining: patterns, cascades, fraud detection, and algorithms
|For the first, we present a list of static and temporal laws, including advanced patterns like ‘eigenspokes’; we show how to use them to spot suspicious activities in online buyer-and-seller settings, in Facebook, and in Twitter-like networks.
|Taming the web
|In this talk, we present Tizen’s approaches to taming the web to maximize its benefits while minimizing the risks of its perils.
|Organizing the digital world to empower people to do more, know more, and be more
|In this talk, Dr. Lu will share an outline of Microsoft’s quest and aspiration to organize the digital universe with a pervasive computational fabric of digital information, digital services, and digital experiences that empower every human being on the planet to accomplish more and enrich their life.
|Machine learning in an auction environment
|Patrick Hummel, Preston McAfee
|We consider a model of repeated online auctions in which an ad with an uncertain click-through rate faces a random distribution of competing bids in each auction and there is discounting of payoffs.
|Optimal revenue-sharing double auctions with applications to ad exchanges
|Renato Gomes, Vahab Mirrokni
|Our goal in this paper is to study optimal mechanism design in settings plagued by competition and two-sided asymmetric information, and identify conditions under which the current practice of employing constant cuts is indeed optimal.
|Advertising in a stream
|Samuel Ieong, Mohammad Mahdian, Sergei Vassilvitskii
|In this paper, we model this setting, and observe that allocation and pricing of ad insertions in a stream pose interesting algorithmic and mechanism design challenges.
|The company you keep: mobile malware infection rates and inexpensive risk indicators
|Hien Thi Thu Truong, Eemil Lagerspetz, Petteri Nurmi, Adam J. Oliner, Sasu Tarkoma, N. Asokan, Sourav Bhattacharya
|In this paper, we present the first independent study of malware infection rates and associated risk factors using data collected directly from over 55,000 Android devices.
|Stranger danger: exploring the ecosystem of ad-based URL shortening services
|Nick Nikiforakis, Federico Maggi, Gianluca Stringhini, M. Zubair Rafique, Wouter Joosen, Christopher Kruegel, Frank Piessens, Giovanni Vigna, Stefano Zanero
|In this paper, we investigate the ecosystem of these increasingly popular ad-based URL shortening services.
|Automatic detection and correction of web application vulnerabilities using data mining to predict false positives
|Ibéria Medeiros, Nuno F. Neves, Miguel Correia
|This paper explores the use of a hybrid of methods to detect vulnerabilities with fewer false positives.
|Personalized collaborative clustering
|Yisong Yue, Chong Wang, Khalid El-Arini, Carlos Guestrin
|We propose a simple yet effective latent factor model to learn the variability of similarity functions across a user population. We propose and study a new machine learning problem for personalization, which we call collaborative clustering.
|Local collaborative ranking
|Joonseok Lee, Samy Bengio, Seungyeon Kim, Guy Lebanon, Yoram Singer
|In this paper, we examine an alternative approach in which the rating matrix is locally low-rank.
|CoBaFi: collaborative Bayesian filtering
|Alex Beutel, Kenton Murray, Christos Faloutsos, Alexander J. Smola
|In this paper we describe a unified Bayesian approach to Collaborative Filtering that accomplishes all of these goals.
|Efficient estimation for high similarities using odd sketches
|Michael Mitzenmacher, Rasmus Pagh, Ninh Pham
|In this paper we introduce the Odd Sketch, a compact binary sketch for estimating the Jaccard similarity of two sets.
|Composite retrieval of heterogeneous web search
|Horatiu Bota, Ke Zhou, Joemon M. Jose, Mounia Lalmas
|In this paper, we go one step further and study a different search paradigm: composite retrieval.
|Contextual and dimensional relevance judgments for reusable SERP-level evaluation
|Peter B. Golbus, Imed Zitouni, Jin Young Kim, Ahmed Hassan, Fernando Diaz
|In this work, we aim to investigate the nature of relevance judgment collection.
|Quizz: targeted crowdsourcing with a billion (potential) users
|Panagiotis G. Ipeirotis, Evgeniy Gabrilovich
|We describe Quizz, a gamified crowdsourcing system that simultaneously assesses the knowledge of users and acquires new knowledge from them.
|Community-based Bayesian aggregation models for crowdsourcing
|Matteo Venanzi, John Guiver, Gabriella Kazai, Pushmeet Kohli, Milad Shokouhi
|To mitigate this issue, we propose a novel community-based Bayesian label aggregation model, CommunityBCC, which assumes that crowd workers conform to a few different types, where each type represents a group of workers with similar confusion matrices.
|The wisdom of minority: discovering and targeting the right group of workers for crowdsourcing
|Hongwei Li, Bo Zhao, Ariel Fuxman
|In this paper, we propose a general crowd targeting framework that can automatically discover, for a given task, if any group of workers based on their attributes have higher quality on average; and target such groups, if they exist, for future work on the same task.
|Monitoring web browsing behavior with differential privacy
|Liyue Fan, Luca Bonomi, Li Xiong, Vaidy Sunderam
|In this paper, we adopt differential privacy, a strong, provable privacy definition, and show that differentially private aggregates of web browsing activities can be released in real-time while preserving the utility of shared data.
|Quite a mess in my cookie jar!: leveraging machine learning to protect web authentication
|Stefano Calzavara, Gabriele Tolomei, Michele Bugliesi, Salvatore Orlando
|In this paper, we conduct the first such formal assessment, based on a gold set of cookies we collect from 70 popular websites of the Alexa ranking.
|Reconciling mobile app privacy and usability on smartphones: could user privacy profiles help?
|Bin Liu, Jialiu Lin, Norman Sadeh
|In this paper, we report on the results of a study analyzing people’s privacy preferences when it comes to granting permissions to different mobile apps.
|Random walks based modularity: application to semi-supervised learning
|Robin Devooght, Amin Mantrach, Ilkka Kivimäki, Hugues Bersini, Alejandro Jaimes, Marco Saerens
|We introduce here a novel, formal and well-defined modularity measure based on random walks.
|High quality, scalable and parallel community detection for large real graphs
|Arnau Prat-Pérez, David Dominguez-Sal, Josep-Lluis Larriba-Pey
|In this paper, we propose a novel disjoint community detection algorithm called Scalable Community Detection (SCD).
|Dynamic and historical shortest-path distance queries on large evolving networks by pruned landmark labeling
|Takuya Akiba, Yoichi Iwata, Yuichi Yoshida
|We propose two dynamic indexing schemes for shortest-path and distance queries on large time-evolving graphs, which are useful in a wide range of important applications such as real-time network-aware search and network evolution analysis.
|To gather together for a better world: understanding and leveraging communities in micro-lending recommendation
|Jaegul Choo, Daniel Lee, Bistra Dilkina, Hongyuan Zha, Haesun Park
|Based on this approach, we achieved a competitive performance in predicting the lending activities for the top 200 teams.
|Recommending investors for crowdfunding projects
|Jisun An, Daniele Quercia, Jon Crowcroft
|We thus set out to propose different ways of recommending investors found on Twitter for specific Kickstarter projects.
|Understanding spatial homophily: the case of peer influence and social selection
|Ke Zhang, Konstantinos Pelechrinis
|In this work, we are interested in examining the forces of the above mechanisms in the context of the locations visited by people.
|Designing and deploying online field experiments
|Eytan Bakshy, Dean Eckles, Michael S. Bernstein
|We thus introduce a language for online field experiments called PlanOut.
|Local business ambience characterization through mobile audio sensing
|He Wang, Dimitrios Lymberopoulos, Jie Liu
|In this paper, we propose to automatically crowdsource such rich, local business ambience metadata through real user check-in events.
|Social bootstrapping: how Pinterest and Last.fm social communities benefit by borrowing links from Facebook
|Changtao Zhong, Mostafa Salehi, Sunil Shah, Marius Cobzarenco, Nishanth Sastry, Meeyoung Cha
|We find that the copied subgraph has a giant component, higher reciprocity and clustering, and confirm that the copied connections see higher social interactions.
|Modeling contextual agreement in preferences
|Loc Do, Hady W. Lauw
|In this paper, we propose a generative model for contextual agreement in preferences.
|A Monte Carlo algorithm for cold start recommendation
|Yu Rong, Xiao Wen, Hong Cheng
|In contrast to such methods, we propose a more general solution to address the cold start problem based on the observed user rating records only.
|Was this review helpful to you?: it depends! context and voting patterns in online content
|Ruben Sipos, Arpita Ghosh, Thorsten Joachims
|In this paper, we explore how users respond to this question and find that their responses are not quite straightforward after all.
|Reduce and aggregate: similarity ranking in multi-categorical bipartite graphs
|Alessandro Epasto, Jon Feldman, Silvio Lattanzi, Stefano Leonardi, Vahab Mirrokni
|We present a novel algorithmic framework that addresses both issues for the computation of several graph-theoretical similarity measures, including the number of common neighbors and Personalized PageRank.
|Robust multivariate autoregression for anomaly detection in dynamic product ratings
|Nikou Günnemann, Stephan Günnemann, Christos Faloutsos
|We propose an efficient algorithm solving our objective and we present interesting findings on various real world datasets.
|Mining novelty-seeking trait across heterogeneous domains
|Fuzheng Zhang, Nicholas Jing Yuan, Defu Lian, Xing Xie
|In this paper, we focus on understanding individual novelty-seeking trait embodied at different levels and across heterogeneous domains.
|Discovering emerging entities with ambiguous names
|Johannes Hoffart, Yasemin Altun, Gerhard Weikum
|In this paper we focus on the most difficult case where the names of new entities are ambiguous.
|Effective named entity recognition for idiosyncratic web collections
|Roman Prokofyev, Gianluca Demartini, Philippe Cudré-Mauroux
|In this work, we propose novel approaches for NER on distinctive document collections (such as scientific articles) based on n-grams inspection and classification.
|Deduplicating a places database
|Nilesh Dalvi, Marian Olteanu, Manish Raghavan, Philip Bohannon
|In this paper, we present a language model that can encapsulate both domain knowledge as well as local geographical knowledge.
|The dynamics of repeat consumption
|Ashton Anderson, Ravi Kumar, Andrew Tomkins, Sergei Vassilvitskii
|Based on this, we develop a model by which the item from $t$ timesteps ago is reconsumed with a probability proportional to a function of $t$.
|From devices to people: attribution of search activity in multi-user settings
|Ryen W. White, Ahmed Hassan, Adish Singla, Eric Horvitz
|We present methods for attributing search activity to individual searchers.
|Demographics, weather and online reviews: a study of restaurant recommendations
|Saeideh Bakhshi, Partha Kanuparthy, Eric Gilbert
|In this work, we take a first look at online restaurant recommendation communities to study what endogenous (i.e., related to entities being reviewed) and exogenous factors influence people’s participation in the communities, and to what extent.
|TripleProv: efficient processing of lineage queries in a native RDF store
|Marcin Wylot, Philippe Cudré-Mauroux, Paul Groth
|In the following, we present the overall architecture of our system, its different lineage storage models, and the various query execution strategies we have implemented to efficiently answer provenance-enabled queries.
|RDF analytics: lenses over semantic graphs
|Dario Colazzo, François Goasdoué, Ioana Manolescu, Alexandra Roatiş
|In this work, we fully redesign, from the bottom up, core data analytics concepts and tools in the context of RDF data, leading to the first complete formal framework for warehouse-style RDF analytics.
|Formalisation and experiences of R2RML-based SPARQL to SQL query translation using morph
|Freddy Priyatna, Oscar Corcho, Juan Sequeda
|In this paper we describe an extension of a well-known algorithm for SPARQL to SQL translation, originally formalised for RDBMS-backed triple stores, that takes into account R2RML mappings.
|Codewebs: scalable homework search for massive open online programming courses
|Andy Nguyen, Christopher Piech, Jonathan Huang, Leonidas Guibas
|We outline a method for decomposing online homework submissions into a vocabulary of "code phrases", and based on this vocabulary, we architect a queryable index that allows for fast searches into the massive dataset of student homework submissions.
|Joint question clustering and relevance prediction for open domain non-factoid question answering
|Snigdha Chaturvedi, Vittorio Castelli, Radu Florian, Ramesh M. Nallapati, Hema Raghavan
|In this paper, we address this problem by modeling the Answer Type as a latent variable that is learned in a data-driven fashion, allowing the model to be more adaptive to new domains and data sets.
|Knowledge base completion via search-based question answering
|Robert West, Evgeniy Gabrilovich, Kevin Murphy, Shaohua Sun, Rahul Gupta, Dekang Lin
|In this paper, we propose a way to leverage existing Web-search-based question-answering technology to fill in the gaps in knowledge bases in a targeted way.
|A time-based collective factorization for topic discovery and monitoring in news
|Carmen K. Vaca, Amin Mantrach, Alejandro Jaimes, Marco Saerens
|In this paper, we introduce a novel framework, inspired by Collective Factorization, for online topic discovery that is able to connect topics between different time-slots.
|The dual-sparse topic model: mining focused topics and focused terms in short text
|Tianyi Lin, Wentao Tian, Qiaozhu Mei, Hong Cheng
|In this paper, we propose a dual-sparse topic model that addresses the sparsity in both the topic mixtures and the word usage.
|Acquisition of open-domain classes via intersective semantics
|Automated runtime recovery for QoS-based service composition
|Tian Huat Tan, Manman Chen, Étienne André, Jun Sun, Yang Liu, Jin Song Dong
|In this work, we propose an automated approach based on a genetic algorithm to calculate the recovery plan that could guarantee the satisfaction of functional properties of the composite service after recovery.
|Similarity-based web browser optimization
|Haoyu Wang, Mengxin Liu, Yao Guo, Xiangqun Chen
|In this paper, we propose a similarity-based optimization approach to improve webpage processing performance of web browsers.
|Temporal QoS-aware web service recommendation via non-negative tensor factorization
|Wancai Zhang, Hailong Sun, Xudong Liu, Xiaohui Guo
|By considering the third dynamic context information, a Temporal QoS-aware Web Service Recommendation Framework is presented to predict missing QoS values under various temporal contexts.
|Adscape: harvesting and analyzing online display ads
|Paul Barford, Igor Canadi, Darja Krushevskaja, Qiang Ma, S. Muthukrishnan
|In this paper we report a first-of-its-kind study that seeks to broadly understand the features, mechanisms and dynamics of display advertising on the web – i.e., the Adscape.
|Statistical inference in two-stage online controlled experiments with treatment selection and validation
|Alex Deng, Tianxi Li, Yu Guo
|In this paper, we propose a general methodology for combining the first screening stage data together with validation stage data for more sensitive hypothesis testing and more accurate point estimation of the treatment effect.
|An experimental evaluation of bidders’ behavior in ad auctions
|Gali Noti, Noam Nisan, Ilan Yaniv
|The goal of the research was to understand users’ strategies in making bids.
|Chaff from the wheat: characterization and modeling of deleted questions on Stack Overflow
|Denzil Correa, Ashish Sureka
|We present the first study of deleted questions on Stack Overflow.
|Timeline generation: tracking individuals on Twitter
|Jiwei Li, Claire Cardie
|In this paper, we take a preliminary look at the problem of reconstructing users’ life histories from their Twitter streams and propose an unsupervised framework that creates a chronological list of personal important events (PIE) for individuals.
|Modeling and predicting the growth and death of membership-based websites
|In this work we present six years of the daily number of active users (DAU) of twenty-two membership-based websites – encompassing online social networks, grassroots movements, online forums, and membership-only Internet stores – well balanced between successes and failures.
|Word storms: multiples of word clouds for visual comparison of documents
|Quim Castellà, Charles Sutton
|We present a novel algorithm that creates a coordinated word storm, in which words that appear in multiple documents are placed in the same location, using the same color and orientation, across clouds.
|Exploring the filter bubble: the effect of using recommender systems on content diversity
|Tien T. Nguyen, Pik-Mai Hui, F. Maxwell Harper, Loren Terveen, Joseph A. Konstan
|We contribute a novel metric to measure content diversity based on information encoded in user-generated tags, and we present a new set of methods to examine the temporal effect of recommender systems on the user experience.
|Engaging with massive online courses
|Ashton Anderson, Daniel Huttenlocher, Jon Kleinberg, Jure Leskovec
|In this work, we use such trace data to develop a conceptual framework for understanding how users currently engage with MOOCs.
|User satisfaction in competitive sponsored search
|David Kempe, Brendan Lucier
|We present a model of competition between web search algorithms, and study the impact of such competition on user welfare.
|Price competition in online combinatorial markets
|Moshe Babaioff, Noam Nisan, Renato Paes Leme
|We consider a single buyer with a combinatorial preference who would like to purchase related products and services from different vendors, where each vendor supplies exactly one product.
|Revenue monotone mechanisms for online advertising
|Gagan Goel, Mohammad Reza Khani
|In this work, we seek incentive-compatible mechanisms that are revenue-monotone.
|Semantic stability in social tagging streams
|Claudia Wagner, Philipp Singer, Markus Strohmaier, Bernardo A. Huberman
|In this work we tackle these questions by (i) presenting a novel and robust method which overcomes a number of limitations in existing methods, (ii) empirically investigating semantic stabilization processes in a wide range of social tagging systems with distinct domains and properties and (iii) detecting potential causes for semantic stabilization, specifically imitation behavior, shared background knowledge and intrinsic properties of natural language.
|Test-driven evaluation of linked data quality
|Dimitris Kontokostas, Patrick Westphal, Sören Auer, Sebastian Hellmann, Jens Lehmann, Roland Cornelissen, Amrapali Zaveri
|We present a methodology for test-driven quality assessment of Linked Data, which is inspired by test-driven software development.
|Don’t like RDF reification?: making statements about statements using singleton property
|Vinh Nguyen, Olivier Bodenreider, Amit Sheth
|In this paper, we propose a novel approach called Singleton Property for representing statements about statements and provide a formal semantics for it.
|Comment-based multi-view clustering of Web 2.0 items
|Xiangnan He, Min-Yen Kan, Peichu Xie, Xiao Chen
|In this paper, we systematically investigate how user-generated comments can be used to improve the clustering of Web 2.0 items.
|Finding progression stages in time-evolving event sequences
|Jaewon Yang, Julian McAuley, Jure Leskovec, Paea LePendu, Nigam Shah
|In this paper, we develop a model-based method for discovering common progression stages in general event sequences.
|On estimating the average degree
|Anirban Dasgupta, Ravi Kumar, Tamas Sarlos
|In this work we consider the problem of estimating the average degree of a large network using efficient random sampling, where the number of nodes is not known to the algorithm.
|Who proposed the relationship?: recovering the hidden directions of undirected social networks
|Jun Zhang, Chaokun Wang, Jianmin Wang
|In this study, we engage in the investigation of directionality patterns on real-world directed social networks and summarize our findings using four consistency hypotheses.
|User profiling in an ego network: co-profiling attributes and relationships
|Rui Li, Chi Wang, Kevin Chen-Chuan Chang
|In this paper, we study the problem of profiling user attributes in social networks.
|Attributed graph models: modeling network structure with correlated attributes
|Joseph J. Pfeiffer, Sebastian Moreno, Timothy La Fond, Jennifer Neville, Brian Gallagher
|In this work, we present the Attributed Graph Model (AGM) framework to jointly model network structure and vertex attributes.
|WikiWho: precise and efficient attribution of authorship of revisioned content
|Fabian Flöck, Maribel Acosta
|As a solution, we propose a graph-based model to represent revisioned content and an algorithm over this model that tackles both issues effectively.
|What makes a good biography?: multidimensional quality analysis based on Wikipedia article feedback data
|Lucie Flekova, Oliver Ferschke, Iryna Gurevych
|In this paper, we study the user-perceived quality of Wikipedia articles based on a novel Wikipedia user feedback dataset.
|What makes an image popular?
|Aditya Khosla, Atish Das Sarma, Raffay Hamid
|In this paper, we show the importance of image cues such as color, gradients, deep learning features and the set of objects present, as well as the importance of various social cues such as number of friends or number of photos uploaded that lead to high or low popularity of images.
|STFU NOOB!: predicting crowdsourced decisions on toxic behavior in online games
|Jeremy Blackburn, Haewoon Kwak
|In this paper, we propose a supervised learning approach for predicting crowdsourced decisions on toxic behavior with large-scale labeled data collections: over 10 million user reports involving 1.46 million toxic players, and the corresponding crowdsourced decisions.
|Unveiling group characteristics in online social games: a socio-economic analysis
|Taejoong Chung, Jinyoung Han, Daejin Choi, Taekyoung Ted Kwon, Huy Kang Kim, Yanghee Choi
|In this paper, we analyze the group activities of users in Aion, one of the largest MMORPGs, based on the records of the activities of 94,497 users.
|XXXtortion?: inferring registration intent in the .XXX TLD
|Tristan Halvorson, Kirill Levchenko, Stefan Savage, Geoffrey M. Voelker
|We use this information to characterize each .xxx domain and infer the registrant’s most likely intent.
|The bursty dynamics of the Twitter information network
|Seth A. Myers, Jure Leskovec
|Here, we study ways in which network structure reacts to users posting and sharing content.
|Can cascades be predicted?
|Justin Cheng, Lada Adamic, P. Alex Dow, Jon Michael Kleinberg, Jure Leskovec
|In this work, we develop a framework for addressing cascade prediction problems.
|How to influence people with partial incentives
|Erik D. Demaine, MohammadTaghi Hajiaghayi, Hamid Mahini, David L. Malec, S. Raghavan, Anshul Sawant, Morteza Zadimoghadam
|Our main theoretical contribution is to show how to adapt the major positive results from the integral case to the fractional case.
|Fast topic discovery from web search streams
|Di Jiang, Kenneth Wai-Ting Leung, Wilfred Ng
|In this paper, we propose a novel probabilistic topic model, the Web Search Stream Model (WSSM), which is delicately calibrated for handling two salient features of the web search data: it is in the format of streams and in massive volume.
|A hierarchical Dirichlet model for taxonomy expansion for search engines
|Jingjing Wang, Changsung Kang, Yi Chang, Jiawei Han
|In this paper, we study the problem of how to expand an existing category hierarchy for a search/navigation system to accommodate the information needs of users more comprehensively.
|Recent and robust query auto-completion
|Stewart Whiting, Joemon M. Jose
|To address this trade-off, we propose several practical completion suggestion ranking approaches, including: (i) a sliding window of query popularity evidence from the past 2-28 days, (ii) the query popularity distribution in the last N queries observed with a given prefix, and (iii) short-range query popularity prediction based on recently observed trends.