Talks

Towards a Distributed Search Engine

Prof. Ricardo Baeza-Yates
Yahoo! Research
Barcelona, Spain

Abstract: Distributed search engines are often more complex to implement than centralized ones. Distributing a search engine across multiple sites, however, has several advantages. In particular, it allows the use of fewer computing resources and the exploitation of data and user locality. In this presentation we show the feasibility of distributed Web search engines by proposing a model for assessing the total cost of a distributed Web search engine, which includes the computational costs as well as the communication costs among all distributed sites. Using examples, we show that a distributed Web search engine can be more cost-effective than a centralized one if a large percentage of queries are local, which is usually the case. We then present a query-processing algorithm that maximizes the number of queries answered locally, without sacrificing the quality of the results, by using caching and partial replication. We simulate our algorithm on real document collections and real query workloads to measure the actual parameters needed for our cost model, and we show that a distributed search engine can be competitive with a centralized architecture with respect to cost.

Biography:

Ricardo Baeza-Yates is VP of Yahoo! Research for Europe, Middle East and Latin America, leading the labs in Barcelona, Spain and Santiago, Chile, as well as supervising the Haifa lab. Until 2005 he was the director of the Center for Web Research at the Department of Computer Science of the Engineering School of the University of Chile, and ICREA Professor at the Dept. of Technology of Univ. Pompeu Fabra in Barcelona, Spain. He is co-author of the book Modern Information Retrieval, published in 1999 by Addison-Wesley, co-author of the 2nd edition of the Handbook of Algorithms and Data Structures (Addison-Wesley, 1991), and co-editor of Information Retrieval: Algorithms and Data Structures (Prentice-Hall, 1992), among more than 150 other publications. He received the Organization of American States award for young researchers in exact sciences (1993) and, with two Brazilian colleagues, obtained the COMPAQ prize for the best Brazilian CS research article (1997). In 2003 he was the first computer scientist to be elected to the Chilean Academy of Sciences. In 2007 he was awarded the Graham Medal for innovation in computing, given by the University of Waterloo to distinguished alumni.

News

XXIV SBBD XXIII SBES - October 5 to 9, 2009