Paper Digest: WWW 2014 Highlights

May 12, 2014 | June 27, 2020 | admin

The Web Conference (WWW) is one of the top internet conferences in the world.

To help the community quickly catch up on the work presented at this conference, the Paper Digest Team processed all accepted papers and generated one highlight sentence (typically the main topic) for each paper. Readers are encouraged to read these machine-generated highlights and summaries to quickly get the main idea of each paper.

If you do not want to miss any interesting academic paper, you are welcome to sign up for our free daily paper digest service to get updates on new papers published in your area every day. You are also welcome to follow us on Twitter and LinkedIn to stay updated on new conference digests.

Paper Digest Team
team@paperdigest.org

TABLE 1: WWW 2014 Papers

#	Title	Authors	Highlight
1	Large graph mining: patterns, cascades, fraud detection, and algorithms	Christos Faloutsos	For the first, we present a list of static and temporal laws, including advanced patterns like ‘eigenspokes’; we show how to use them to spot suspicious activities in online buyer-and-seller settings, in Facebook, and in Twitter-like networks.
2	Taming the web	Jong-Deok Choi	In this talk, we present Tizen’s approaches to taming the web to maximize its benefits while minimizing the risks of its perils.
3	Organizing the digital world to empower people to do more, know more, and be more	Qi Lu	In this talk, Dr. Lu will share an outline of Microsoft’s quest and aspiration to organize the digital universe with a pervasive computational fabric of digital information, digital services, and digital experiences that empower every human being on the planet to accomplish more and enrich their life.
4	Machine learning in an auction environment	Patrick Hummel, Preston McAfee	We consider a model of repeated online auctions in which an ad with an uncertain click-through rate faces a random distribution of competing bids in each auction and there is discounting of payoffs.
5	Optimal revenue-sharing double auctions with applications to ad exchanges	Renato Gomes, Vahab Mirrokni	Our goal in this paper is to study optimal mechanism design in settings plagued by competition and two-sided asymmetric information, and identify conditions under which the current practice of employing constant cuts is indeed optimal.
6	Advertising in a stream	Samuel Ieong, Mohammad Mahdian, Sergei Vassilvitskii	In this paper, we model this setting, and observe that allocation and pricing of ad insertions in a stream poses interesting algorithmic and mechanism design challenges.
7	The company you keep: mobile malware infection rates and inexpensive risk indicators	Hien Thi Thu Truong, Eemil Lagerspetz, Petteri Nurmi, Adam J. Oliner, Sasu Tarkoma, N. Asokan, Sourav Bhattacharya	In this paper, we present the first independent study of malware infection rates and associated risk factors using data collected directly from over 55,000 Android devices.
8	Stranger danger: exploring the ecosystem of ad-based URL shortening services	Nick Nikiforakis, Federico Maggi, Gianluca Stringhini, M. Zubair Rafique, Wouter Joosen, Christopher Kruegel, Frank Piessens, Giovanni Vigna, Stefano Zanero	In this paper, we investigate the ecosystem of these increasingly popular ad-based URL shortening services.
9	Automatic detection and correction of web application vulnerabilities using data mining to predict false positives	Ibéria Medeiros, Nuno F. Neves, Miguel Correia	This paper explores the use of a hybrid of methods to detect vulnerabilities with fewer false positives.
10	Personalized collaborative clustering	Yisong Yue, Chong Wang, Khalid El-Arini, Carlos Guestrin	We propose a simple yet effective latent factor model to learn the variability of similarity functions across a user population. We propose and study a new machine learning problem for personalization, which we call collaborative clustering.
11	Local collaborative ranking	Joonseok Lee, Samy Bengio, Seungyeon Kim, Guy Lebanon, Yoram Singer	In this paper, we examine an alternative approach in which the rating matrix is locally low-rank.
12	CoBaFi: collaborative bayesian filtering	Alex Beutel, Kenton Murray, Christos Faloutsos, Alexander J. Smola	In this paper we describe a unified Bayesian approach to Collaborative Filtering that accomplishes all of these goals.
13	Efficient estimation for high similarities using odd sketches	Michael Mitzenmacher, Rasmus Pagh, Ninh Pham	In this paper we introduce the Odd Sketch, a compact binary sketch for estimating the Jaccard similarity of two sets.
14	Composite retrieval of heterogeneous web search	Horatiu Bota, Ke Zhou, Joemon M. Jose, Mounia Lalmas	In this paper, we go one step further and study a different search paradigm: composite retrieval.
15	Contextual and dimensional relevance judgments for reusable SERP-level evaluation	Peter B. Golbus, Imed Zitouni, Jin Young Kim, Ahmed Hassan, Fernando Diaz	In this work, we aim to investigate the nature of relevance judgment collection.
16	Quizz: targeted crowdsourcing with a billion (potential) users	Panagiotis G. Ipeirotis, Evgeniy Gabrilovich	We describe Quizz, a gamified crowdsourcing system that simultaneously assesses the knowledge of users and acquires new knowledge from them.
17	Community-based bayesian aggregation models for crowdsourcing	Matteo Venanzi, John Guiver, Gabriella Kazai, Pushmeet Kohli, Milad Shokouhi	To mitigate this issue, we propose a novel community-based Bayesian label aggregation model, CommunityBCC, which assumes that crowd workers conform to a few different types, where each type represents a group of workers with similar confusion matrices.
18	The wisdom of minority: discovering and targeting the right group of workers for crowdsourcing	Hongwei Li, Bo Zhao, Ariel Fuxman	In this paper, we propose a general crowd targeting framework that can automatically discover, for a given task, if any group of workers based on their attributes have higher quality on average; and target such groups, if they exist, for future work on the same task.
19	Monitoring web browsing behavior with differential privacy	Liyue Fan, Luca Bonomi, Li Xiong, Vaidy Sunderam	In this paper, we adopt differential privacy, a strong, provable privacy definition, and show that differentially private aggregates of web browsing activities can be released in real-time while preserving the utility of shared data.
20	Quite a mess in my cookie jar!: leveraging machine learning to protect web authentication	Stefano Calzavara, Gabriele Tolomei, Michele Bugliesi, Salvatore Orlando	In this paper, we conduct the first such formal assessment, based on a gold set of cookies we collect from 70 popular websites of the Alexa ranking.
21	Reconciling mobile app privacy and usability on smartphones: could user privacy profiles help?	Bin Liu, Jialiu Lin, Norman Sadeh	In this paper, we report on the results of a study analyzing people’s privacy preferences when it comes to granting permissions to different mobile apps.
22	Random walks based modularity: application to semi-supervised learning	Robin Devooght, Amin Mantrach, Ilkka Kivimäki, Hugues Bersini, Alejandro Jaimes, Marco Saerens	We introduce here a novel, formal and well-defined modularity measure based on random walks.
23	High quality, scalable and parallel community detection for large real graphs	Arnau Prat-Pérez, David Dominguez-Sal, Josep-Lluis Larriba-Pey	In this paper, we propose a novel disjoint community detection algorithm called Scalable Community Detection (SCD).
24	Dynamic and historical shortest-path distance queries on large evolving networks by pruned landmark labeling	Takuya Akiba, Yoichi Iwata, Yuichi Yoshida	We propose two dynamic indexing schemes for shortest-path and distance queries on large time-evolving graphs, which are useful in a wide range of important applications such as real-time network-aware search and network evolution analysis.
25	To gather together for a better world: understanding and leveraging communities in micro-lending recommendation	Jaegul Choo, Daniel Lee, Bistra Dilkina, Hongyuan Zha, Haesun Park	Based on this approach, we achieved a competitive performance in predicting the lending activities for the top 200 teams.
26	Recommending investors for crowdfunding projects	Jisun An, Daniele Quercia, Jon Crowcroft	We thus set out to propose different ways of recommending investors found on Twitter for specific Kickstarter projects.
27	Understanding spatial homophily: the case of peer influence and social selection	Ke Zhang, Konstantinos Pelechrinis	In this work, we are interested in examining the forces of the above mechanisms in the context of the locations visited by people.
28	Designing and deploying online field experiments	Eytan Bakshy, Dean Eckles, Michael S. Bernstein	We thus introduce a language for online field experiments called PlanOut.
29	Local business ambience characterization through mobile audio sensing	He Wang, Dimitrios Lymberopoulos, Jie Liu	In this paper, we propose to automatically crowdsource such rich, local business ambience metadata through real user check-in events.
30	Social bootstrapping: how pinterest and last.fm social communities benefit by borrowing links from facebook	Changtao Zhong, Mostafa Salehi, Sunil Shah, Marius Cobzarenco, Nishanth Sastry, Meeyoung Cha	We find that the copied subgraph has a giant component, higher reciprocity and clustering, and confirm that the copied connections see higher social interactions.
31	Modeling contextual agreement in preferences	Loc Do, Hady W. Lauw	In this paper, we propose a generative model for contextual agreement in preferences.
32	A Monte Carlo algorithm for cold start recommendation	Yu Rong, Xiao Wen, Hong Cheng	In contrast to such methods, we propose a more general solution to address the cold start problem based on the observed user rating records only.
33	Was this review helpful to you?: it depends! context and voting patterns in online content	Ruben Sipos, Arpita Ghosh, Thorsten Joachims	In this paper, we explore how users respond to this question and find that their responses are not quite straightforward after all.
34	Reduce and aggregate: similarity ranking in multi-categorical bipartite graphs	Alessandro Epasto, Jon Feldman, Silvio Lattanzi, Stefano Leonardi, Vahab Mirrokni	We present a novel algorithmic framework that addresses both issues for the computation of several graph-theoretical similarity measures, including the number of common neighbors and Personalized PageRank.
35	Robust multivariate autoregression for anomaly detection in dynamic product ratings	Nikou Günnemann, Stephan Günnemann, Christos Faloutsos	We propose an efficient algorithm solving our objective and we present interesting findings on various real world datasets.
36	Mining novelty-seeking trait across heterogeneous domains	Fuzheng Zhang, Nicholas Jing Yuan, Defu Lian, Xing Xie	In this paper, we focus on understanding individual novelty-seeking trait embodied at different levels and across heterogeneous domains.
37	Discovering emerging entities with ambiguous names	Johannes Hoffart, Yasemin Altun, Gerhard Weikum	In this paper we focus on the most difficult case where the names of new entities are ambiguous.
38	Effective named entity recognition for idiosyncratic web collections	Roman Prokofyev, Gianluca Demartini, Philippe Cudré-Mauroux	In this work, we propose novel approaches for NER on distinctive document collections (such as scientific articles) based on n-grams inspection and classification.
39	Deduplicating a places database	Nilesh Dalvi, Marian Olteanu, Manish Raghavan, Philip Bohannon	In this paper, we present a language model that can encapsulate both domain knowledge and local geographical knowledge.
40	The dynamics of repeat consumption	Ashton Anderson, Ravi Kumar, Andrew Tomkins, Sergei Vassilvitskii	Based on this, we develop a model by which the item from $t$ timesteps ago is reconsumed with a probability proportional to a function of $t$.
41	From devices to people: attribution of search activity in multi-user settings	Ryen W. White, Ahmed Hassan, Adish Singla, Eric Horvitz	We present methods for attributing search activity to individual searchers.
42	Demographics, weather and online reviews: a study of restaurant recommendations	Saeideh Bakhshi, Partha Kanuparthy, Eric Gilbert	In this work, we take a first look at online restaurant recommendation communities to study what endogenous (i.e., related to entities being reviewed) and exogenous factors influence people’s participation in the communities, and to what extent.
43	TripleProv: efficient processing of lineage queries in a native RDF store	Marcin Wylot, Philippe Cudre-Mauroux, Paul Groth	In the following, we present the overall architecture of our system, its different lineage storage models, and the various query execution strategies we have implemented to efficiently answer provenance-enabled queries.
44	RDF analytics: lenses over semantic graphs	Dario Colazzo, François Goasdoué, Ioana Manolescu, Alexandra Roatiş	In this work, we fully redesign, from the bottom up, core data analytics concepts and tools in the context of RDF data, leading to the first complete formal framework for warehouse-style RDF analytics.
45	Formalisation and experiences of R2RML-based SPARQL to SQL query translation using morph	Freddy Priyatna, Oscar Corcho, Juan Sequeda	In this paper we describe an extension of a well-known algorithm for SPARQL to SQL translation, originally formalised for RDBMS-backed triple stores, that takes into account R2RML mappings.
46	Codewebs: scalable homework search for massive open online programming courses	Andy Nguyen, Christopher Piech, Jonathan Huang, Leonidas Guibas	We outline a method for decomposing online homework submissions into a vocabulary of "code phrases", and based on this vocabulary, we architect a queryable index that allows for fast searches into the massive dataset of student homework submissions.
47	Joint question clustering and relevance prediction for open domain non-factoid question answering	Snigdha Chaturvedi, Vittorio Castelli, Radu Florian, Ramesh M. Nallapati, Hema Raghavan	In this paper, we address this problem by modeling the Answer Type as a latent variable that is learned in a data-driven fashion, allowing the model to be more adaptive to new domains and data sets.
48	Knowledge base completion via search-based question answering	Robert West, Evgeniy Gabrilovich, Kevin Murphy, Shaohua Sun, Rahul Gupta, Dekang Lin	In this paper, we propose a way to leverage existing Web-search-based question-answering technology to fill in the gaps in knowledge bases in a targeted way.
49	A time-based collective factorization for topic discovery and monitoring in news	Carmen K. Vaca, Amin Mantrach, Alejandro Jaimes, Marco Saerens	In this paper, we introduce a novel framework, inspired by Collective Factorization, for online topic discovery that can connect topics across different time slots.
50	The dual-sparse topic model: mining focused topics and focused terms in short text	Tianyi Lin, Wentao Tian, Qiaozhu Mei, Hong Cheng	In this paper, we propose a dual-sparse topic model that addresses the sparsity in both the topic mixtures and the word usage.
51	Acquisition of open-domain classes via intersective semantics	Marius Paşca	Acquisition of open-domain classes via intersective semantics
52	Automated runtime recovery for QoS-based service composition	Tian Huat Tan, Manman Chen, Étienne André, Jun Sun, Yang Liu, Jin Song Dong	In this work, we propose an automated approach based on a genetic algorithm to calculate the recovery plan that could guarantee the satisfaction of functional properties of the composite service after recovery.
53	Similarity-based web browser optimization	Haoyu Wang, Mengxin Liu, Yao Guo, Xiangqun Chen	In this paper, we propose a similarity-based optimization approach to improve webpage processing performance of web browsers.
54	Temporal QoS-aware web service recommendation via non-negative tensor factorization	Wancai Zhang, Hailong Sun, Xudong Liu, Xiaohui Guo	By considering the third dynamic context information, a Temporal QoS-aware Web Service Recommendation Framework is presented to predict missing QoS values under various temporal contexts.
55	Adscape: harvesting and analyzing online display ads	Paul Barford, Igor Canadi, Darja Krushevskaja, Qiang Ma, S. Muthukrishnan	In this paper we report a first-of-its-kind study that seeks to broadly understand the features, mechanisms and dynamics of display advertising on the web – i.e., the Adscape.
56	Statistical inference in two-stage online controlled experiments with treatment selection and validation	Alex Deng, Tianxi Li, Yu Guo	In this paper, we propose a general methodology for combining the first screening stage data together with validation stage data for more sensitive hypothesis testing and more accurate point estimation of the treatment effect.
57	An experimental evaluation of bidders’ behavior in ad auctions	Gali Noti, Noam Nisan, Ilan Yaniv	The goal of the research was to understand users’ strategies in making bids.
58	Chaff from the wheat: characterization and modeling of deleted questions on stack overflow	Denzil Correa, Ashish Sureka	We present the first study of deleted questions on Stack Overflow.
59	Timeline generation: tracking individuals on twitter	Jiwei Li, Claire Cardie	In this paper, we take a preliminary look at the problem of reconstructing users’ life histories based on their Twitter streams and propose an unsupervised framework that creates a chronological list of personal important events (PIE) of individuals.
60	Modeling and predicting the growth and death of membership-based websites	Bruno Ribeiro	In this work we present six years of daily active user (DAU) counts for twenty-two membership-based websites – encompassing online social networks, grassroots movements, online forums, and membership-only Internet stores – well balanced between successes and failures.
61	Word storms: multiples of word clouds for visual comparison of documents	Quim Castellà, Charles Sutton	We present a novel algorithm that creates a coordinated word storm, in which words that appear in multiple documents are placed in the same location, using the same color and orientation, across clouds.
62	Exploring the filter bubble: the effect of using recommender systems on content diversity	Tien T. Nguyen, Pik-Mai Hui, F. Maxwell Harper, Loren Terveen, Joseph A. Konstan	We contribute a novel metric to measure content diversity based on information encoded in user-generated tags, and we present a new set of methods to examine the temporal effect of recommender systems on the user experience.
63	Engaging with massive online courses	Ashton Anderson, Daniel Huttenlocher, Jon Kleinberg, Jure Leskovec	In this work, we use such trace data to develop a conceptual framework for understanding how users currently engage with MOOCs.
64	User satisfaction in competitive sponsored search	David Kempe, Brendan Lucier	We present a model of competition between web search algorithms, and study the impact of such competition on user welfare.
65	Price competition in online combinatorial markets	Moshe Babaioff, Noam Nisan, Renato Paes Leme	We consider a single buyer with a combinatorial preference who would like to purchase related products and services from different vendors, where each vendor supplies exactly one product.
66	Revenue monotone mechanisms for online advertising	Gagan Goel, Mohammad Reza Khani	In this work, we seek incentive-compatible mechanisms that are revenue-monotone.
67	Semantic stability in social tagging streams	Claudia Wagner, Philipp Singer, Markus Strohmaier, Bernardo A. Huberman	In this work we tackle these questions by (i) presenting a novel and robust method which overcomes a number of limitations in existing methods, (ii) empirically investigating semantic stabilization processes in a wide range of social tagging systems with distinct domains and properties and (iii) detecting potential causes for semantic stabilization, specifically imitation behavior, shared background knowledge and intrinsic properties of natural language.
68	Test-driven evaluation of linked data quality	Dimitris Kontokostas, Patrick Westphal, Sören Auer, Sebastian Hellmann, Jens Lehmann, Roland Cornelissen, Amrapali Zaveri	We present a methodology for test-driven quality assessment of Linked Data, which is inspired by test-driven software development.
69	Don’t like RDF reification?: making statements about statements using singleton property	Vinh Nguyen, Olivier Bodenreider, Amit Sheth	In this paper, we propose a novel approach called Singleton Property for representing statements about statements and provide a formal semantics for it.
70	Comment-based multi-view clustering of web 2.0 items	Xiangnan He, Min-Yen Kan, Peichu Xie, Xiao Chen	In this paper, we systematically investigate how user-generated comments can be used to improve the clustering of Web 2.0 items.
71	Finding progression stages in time-evolving event sequences	Jaewon Yang, Julian McAuley, Jure Leskovec, Paea LePendu, Nigam Shah	In this paper, we develop a model-based method for discovering common progression stages in general event sequences.
72	On estimating the average degree	Anirban Dasgupta, Ravi Kumar, Tamas Sarlos	In this work we consider the problem of estimating the average degree of a large network using efficient random sampling, where the number of nodes is not known to the algorithm.
73	Who proposed the relationship?: recovering the hidden directions of undirected social networks	Jun Zhang, Chaokun Wang, Jianmin Wang	In this study, we engage in the investigation of directionality patterns on real-world directed social networks and summarize our findings using four consistency hypotheses.
74	User profiling in an ego network: co-profiling attributes and relationships	Rui Li, Chi Wang, Kevin Chen-Chuan Chang	In this paper, we study the problem of profiling user attributes in social networks.
75	Attributed graph models: modeling network structure with correlated attributes	Joseph J. Pfeiffer, Sebastian Moreno, Timothy La Fond, Jennifer Neville, Brian Gallagher	In this work, we present the Attributed Graph Model (AGM) framework to jointly model network structure and vertex attributes.
76	WikiWho: precise and efficient attribution of authorship of revisioned content	Fabian Flöck, Maribel Acosta	As a solution, we propose a graph-based model to represent revisioned content and an algorithm over this model that tackles both issues effectively.
77	What makes a good biography?: multidimensional quality analysis based on wikipedia article feedback data	Lucie Flekova, Oliver Ferschke, Iryna Gurevych	In this paper, we study the user-perceived quality of Wikipedia articles based on a novel Wikipedia user feedback dataset.
78	What makes an image popular?	Aditya Khosla, Atish Das Sarma, Raffay Hamid	In this paper, we show the importance of image cues such as color, gradients, deep learning features and the set of objects present, as well as the importance of various social cues such as number of friends or number of photos uploaded that lead to high or low popularity of images.
79	STFU NOOB!: predicting crowdsourced decisions on toxic behavior in online games	Jeremy Blackburn, Haewoon Kwak	In this paper, we propose a supervised learning approach for predicting crowdsourced decisions on toxic behavior with large-scale labeled data collections: over 10 million user reports involving 1.46 million toxic players and the corresponding crowdsourced decisions.
80	Unveiling group characteristics in online social games: a socio-economic analysis	Taejoong Chung, Jinyoung Han, Daejin Choi, Taekyoung Ted Kwon, Huy Kang Kim, Yanghee Choi	In this paper, we analyze the group activities of users in Aion, one of the largest MMORPGs, based on the records of the activities of 94,497 users.
81	XXXtortion?: inferring registration intent in the .XXX TLD	Tristan Halvorson, Kirill Levchenko, Stefan Savage, Geoffrey M. Voelker	We use this information to characterize each .xxx domain and infer the registrant’s most likely intent.
82	The bursty dynamics of the Twitter information network	Seth A. Myers, Jure Leskovec	Here, we study ways in which network structure reacts to users posting and sharing content.
83	Can cascades be predicted?	Justin Cheng, Lada Adamic, P. Alex Dow, Jon Michael Kleinberg, Jure Leskovec	In this work, we develop a framework for addressing cascade prediction problems.
84	How to influence people with partial incentives	Erik D. Demaine, MohammadTaghi Hajiaghayi, Hamid Mahini, David L. Malec, S. Raghavan, Anshul Sawant, Morteza Zadimoghadam	Our main theoretical contribution is to show how to adapt the major positive results from the integral case to the fractional case.
85	Fast topic discovery from web search streams	Di Jiang, Kenneth Wai-Ting Leung, Wilfred Ng	In this paper, we propose a novel probabilistic topic model, the Web Search Stream Model (WSSM), which is delicately calibrated for handling two salient features of the web search data: it is in the format of streams and in massive volume.
86	A hierarchical Dirichlet model for taxonomy expansion for search engines	Jingjing Wang, Changsung Kang, Yi Chang, Jiawei Han	In this paper, we study the problem of how to expand an existing category hierarchy for a search/navigation system to accommodate the information needs of users more comprehensively.
87	Recent and robust query auto-completion	Stewart Whiting, Joemon M. Jose	To address this trade-off, we propose several practical completion suggestion ranking approaches, including: (i) a sliding window of query popularity evidence from the past 2-28 days, (ii) the query popularity distribution in the last N queries observed with a given prefix, and (iii) short-range query popularity prediction based on recently observed trends.