Hadoop Data Explorer


The system prototype interface is available at:


Large-scale Data Exploration

Hadoop is a powerful framework for processing large-scale data, but it works primarily in batch mode, without user interaction. There are many scenarios in which users such as business analysts and data scientists need to:

Scatter/Gather Searching and Browsing with Bing API

Scatter/Gather Browser on TREC HARD track (news)

Scatter/Gather Browser on Computing & Information Sciences

Information theory and retrieval modeling

This research aimed to study existing information theories, as well as potential new information measures, that can be used for IR modeling, among other applications. I have developed a new theory, the Least Information Theory (LIT), and conducted several studies to evaluate its application in IR. These studies produced strong empirical results compared with classic methods derived from existing theories.

Complex systems and networks

My work on distributed IR has drawn on theories and inspirations not only from information retrieval but also from complex networks research (interconnectivity of distributed systems). Understanding structural properties of interconnected systems provides important insight into a broad range of applications such as communication, distributed computing, and bibliometrics (e.g., by taking citations/coauthorships as network edges).
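To make the idea of taking coauthorships as network edges concrete, here is a minimal sketch, not taken from the project's own code. It builds an undirected coauthorship graph from per-paper author lists and computes one common structural property, the local clustering coefficient. The function names and the toy author lists are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_graph(papers):
    """Build an undirected graph: authors are nodes, shared papers create edges."""
    graph = defaultdict(set)
    for authors in papers:
        # Every pair of distinct authors on the same paper becomes an edge.
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def clustering_coefficient(graph, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
    return 2 * links / (k * (k - 1))


# Hypothetical toy data: each inner list is the author list of one paper.
papers = [["A", "B", "C"], ["A", "D"], ["C", "D"]]
graph = coauthor_graph(papers)

for author in sorted(graph):
    print(author, len(graph[author]), round(clustering_coefficient(graph, author), 3))
```

The same structure applies to citations if the edges are made directed, with one edge from each citing paper to each cited paper.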

Large-scale and distributed information retrieval

The goal was to investigate basic principles underlying efficient and effective search operations given the magnitude of information (big data) and distributed computing resources (cloud). I developed the theory of the clustering paradox in distributed IR and studied its impact on searching in large-scale information networks (see the representative publication, to appear in ACM TOIS 2013).

Hybrid Scatter/Gather browsing based on Bing search API

Another Scatter/Gather implementation for searching and browsing, based on the Bing search API:

Scatter/Gather browser on a news collection

We implemented a Scatter/Gather browser using 33k news documents provided by the HARD track of the Text REtrieval Conference (TREC). Scatter/Gather offers an approach to finding information that differs from the classic paradigms of searching and browsing, and it is often useful when the user finds it difficult to formulate a search query. The included image is a snapshot of a working demo system, which can be accessed at:
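For readers unfamiliar with the Scatter/Gather loop, the following is a minimal sketch of the interaction, not the code behind the demo system. It scatters documents into clusters, summarizes each cluster by its most frequent terms, and gathers the clusters the user selects so they can be scattered again. The clustering step here is a simple k-means-style procedure over bag-of-words vectors, swapped in for illustration. All function names and the toy documents are assumptions.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term-frequency vector for one document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Sum of term-frequency vectors (scale does not matter for cosine)."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def scatter(doc_ids, vectors, k, iterations=10):
    """Partition doc_ids into at most k clusters (k-means-style, cosine similarity)."""
    # Deterministic seeding: first document, then the one least similar to chosen seeds.
    seeds = [doc_ids[0]]
    while len(seeds) < min(k, len(doc_ids)):
        rest = [d for d in doc_ids if d not in seeds]
        seeds.append(min(rest, key=lambda d: max(cosine(vectors[d], vectors[s]) for s in seeds)))
    centers = [vectors[s] for s in seeds]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for d in doc_ids:
            best = max(range(len(centers)), key=lambda i: cosine(vectors[d], centers[i]))
            clusters[best].append(d)
        centers = [centroid([vectors[d] for d in c]) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return [c for c in clusters if c]


def summarize(cluster, vectors, n=3):
    """Label a cluster with its n most frequent terms."""
    return [term for term, _ in centroid([vectors[d] for d in cluster]).most_common(n)]


def gather(clusters, chosen):
    """Merge the clusters the user selected into one working set."""
    return [d for i in chosen for d in clusters[i]]


# Hypothetical toy collection standing in for the news documents.
docs = {
    0: "stock market shares trading",
    1: "market shares fall stock",
    2: "football match goal team",
    3: "team wins football match",
    4: "stock trading market prices",
    5: "goal scored football team",
}
vectors = {d: vectorize(text) for d, text in docs.items()}

clusters = scatter(list(docs), vectors, k=2)           # scatter the whole collection
labels = [summarize(c, vectors) for c in clusters]     # show summaries to the user
selected = gather(clusters, [1])                       # user picks the second cluster
subclusters = scatter(selected, vectors, k=2)          # scatter again on the narrower set
```

Each round narrows the working set without the user typing a query, which is the property that makes the approach useful when a query is hard to formulate.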

