The goal was to investigate the basic principles underlying efficient and effective search, given the scale of available information (big data) and of distributed computing resources (cloud). I developed a theory of the clustering paradox in distributed IR and studied its impact on search in large-scale information networks (see the representative publication to appear in ACM TOIS 2013). I have also focused on the challenge of text clustering, particularly user-driven interactive clustering of large-scale data, with the aim of developing efficient and effective clustering methods that use distributed computing resources. I have investigated web-based distributed methods as well as methods based on the MapReduce/Hadoop framework.
I have also worked on text clustering and related scalability challenges. We have proposed several approaches to make clustering efficient and scalable in highly interactive settings, e.g., for Scatter/Gather interactions. We have implemented several systems and interfaces based on the Scatter/Gather method: