Weimao Ke and Javed Mostafa publish in ACM TOIS

With the ubiquitous production, distribution and consumption of information, today’s digital environments such as the Web are becoming increasingly large and decentralized. It is hardly possible to obtain central control over information collections and systems in these environments. Searching for information in these information spaces has brought about problems beyond traditional boundaries of information retrieval (IR) research. This article addresses one important aspect of scalability challenges facing information retrieval models and investigates a decentralized, organic view of information systems pertaining to search in large-scale networks. Drawing on observations from earlier studies, we conducted a series of experiments on decentralized searches in large-scale networked information spaces. Results show that how distributed systems interconnect is crucial to retrieval performance and scalability of searching. Particularly, in various experimental settings and retrieval tasks, we find a consistent phenomenon, namely the clustering paradox, in which the level of network clustering (semantic overlay) imposes a scalability limit. Scalable searches are well supported by a specific, balanced level of network clustering emerging from local system interconnectivity. Departure from that level, either stronger or weaker clustering, leads to search performance degradation, which is dramatic in large-scale networks.

Ke W, Mostafa J. 2013. Studying the Clustering Paradox and Scalability of Search in Highly Distributed Environments. To appear in ACM Transactions on Information Systems. 31(2):1-40.