Large-scale and distributed information retrieval

The goal was to investigate the basic principles underlying efficient and effective search, given the sheer volume of information (big data) and the distributed computing resources (cloud) now available. I developed the theory of the clustering paradox in distributed IR and studied its impact on searching in large-scale information networks (see the representative publication to appear in ACM TOIS, 2013). I have also focused on the challenge of text clustering, particularly user-driven interactive clustering on large-scale data, with the aim of developing efficient and effective clustering methods that use distributed computing resources. I have investigated web-based distributed methods as well as methods based on the MapReduce/Hadoop framework.
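As a rough illustration of how a clustering step can be expressed in the MapReduce model, the sketch below simulates one k-means iteration in a single Python process: a map phase assigns each vector in a data split to its nearest centroid, and a reduce phase averages the vectors grouped under each centroid id. It assumes dense vectors and plain k-means; it is a generic textbook example, not the specific method developed in this work, and the function names (`map_phase`, `reduce_phase`, `kmeans_iteration`) are hypothetical. On Hadoop, each split would go to a separate mapper and the framework's shuffle would group the emitted pairs by key.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Vector = Tuple[float, ...]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_phase(points: Iterable[Vector], centroids: List[Vector]) -> Iterator[Tuple[int, Tuple[Vector, int]]]:
    """Hypothetical mapper: emit (nearest_centroid_id, (point, 1)) for each point in one split."""
    for p in points:
        cid = min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
        yield cid, (p, 1)


def reduce_phase(pairs: Iterable[Tuple[int, Tuple[Vector, int]]]) -> Dict[int, Vector]:
    """Hypothetical reducer: sum vectors and counts per centroid id, then average them."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for cid, (p, n) in pairs:
        if cid not in sums:
            sums[cid] = [0.0] * len(p)
        for j, x in enumerate(p):
            sums[cid][j] += x
        counts[cid] += n
    return {cid: tuple(s / counts[cid] for s in sums[cid]) for cid in sums}


def kmeans_iteration(splits: List[List[Vector]], centroids: List[Vector]) -> List[Vector]:
    """Run the map phase over every split, reduce, and return the updated centroids."""
    pairs = (kv for split in splits for kv in map_phase(split, centroids))
    updated = reduce_phase(pairs)
    # A centroid that received no points keeps its previous position.
    return [updated.get(i, c) for i, c in enumerate(centroids)]


if __name__ == "__main__":
    splits = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
    print(kmeans_iteration(splits, [(0.0, 0.0), (10.0, 10.0)]))
```

With the two splits above, one iteration moves the centroids to `(0.0, 0.5)` and `(10.0, 10.5)`. Because the reducer only needs per-key sums and counts, partial sums could also be computed by a combiner on each mapper before the shuffle, which is the usual way to cut network traffic in this pattern.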

I have also worked on the scalability challenges of text clustering. We have proposed several approaches for making clustering efficient and scalable in highly interactive settings, such as Scatter/Gather interactions, and we have implemented several systems and interfaces based on the Scatter/Gather method.

 

Related Information

Weimao Ke and Javed Mostafa publish in ACM TOIS

With the ubiquitous production, distribution and consumption of information, today’s digital environments such as the Web are becoming increasingly large and decentralized. It is hardly possible to obtain central control over information collections and systems in these environments. Searching for information in these information spaces has brought about problems beyond the traditional boundaries of information retrieval (IR) research.

Related Publications