Papers of LiNCS researchers accepted to ACM/IEEE JCDL

One research paper authored by Weimao Ke and another first-authored by Xuemei Gong have been accepted to the ACM/IEEE Joint Conference on Digital Libraries (JCDL'13).

Ke, Weimao. "Information-theoretic Term Weighting Schemes for Document Clustering." In ACM/IEEE Joint Conference on Digital Libraries, 1-10, 2013.

We propose a new theory that quantifies information in probability distributions and derive a new document representation model for text clustering. By extending Shannon entropy to accommodate a non-linear relation between information and uncertainty, the proposed Least Information theory (LIT) provides insight into how terms can be weighted based on their probability distributions in documents vs. in the collection. We derive two basic quantities in the document clustering context: 1) LI Binary (LIB) which quantifies information due to the observation of a term's (binary) occurrence in a document; and 2) LI Frequency (LIF) which measures information for the observation of a randomly picked term from the document. Both quantities are computed given term distributions in the document collection as prior knowledge and can be used separately or combined to represent documents for text clustering. Experiments on four benchmark text collections demonstrate strong performances of the proposed methods compared to classic TF*IDF. Particularly, the LIB*LIF weighting scheme, which combines LIB and LIF, consistently outperforms TF*IDF in terms of multiple evaluation metrics. The least information measure has a potentially broad range of applications beyond text clustering.
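For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of classic TF*IDF term weighting. It is not the LIB or LIF computation, whose formulas are not given in this announcement. The function name `tfidf_weights`, the toy documents, and the particular variant (relative term frequency times the natural log of inverse document frequency) are assumptions chosen for illustration and may differ from the variant used in the paper's experiments.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: weight} dict per tokenized document.

    Hypothetical helper: weight = (term count / document length)
    * ln(number of documents / documents containing the term).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights


# Toy collection (made up for this example).
docs = [
    ["digital", "library", "search"],
    ["digital", "clustering", "clustering"],
    ["digital", "library", "retrieval"],
]
weights = tfidf_weights(docs)
```

In this toy collection, a term that appears in every document ("digital") receives zero weight, while a term concentrated in a single document ("clustering") receives the largest weight. Both TF*IDF and the schemes described in the abstract weight terms by contrasting their distribution in a document with their distribution in the collection.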

 
Gong, Xuemei, Weimao Ke, Yan Zhang, and Ramona Broussard. "Interactive Search Result Clustering: A Study of User Behavior and Retrieval Effectiveness." In ACM/IEEE Joint Conference on Digital Libraries, 1-4. Indianapolis, IN, 2013.

Scatter/Gather is a document browsing and information retrieval method based on document clustering. It is designed to facilitate user articulation of information needs through iterative clustering and interactive browsing. This paper reports on a study that investigated the effectiveness of Scatter/Gather browsing for information retrieval. We conducted a within-subject user study of 24 college students to investigate the utility of a Scatter/Gather system, to examine its strengths and weaknesses, and to receive feedback from users on the system. Results show that the clustering-based Scatter/Gather method was more difficult to use than the classic information retrieval systems in terms of user perception. However, clustering helped the subjects accomplish the tasks more efficiently. Scatter/Gather clustering was particularly useful in helping users finish tasks that they were less familiar with and allowed them to search with fewer words. Scatter/Gather tended to be more useful when it was more difficult for the user to do query specification for an information need. Topic familiarity and specificity had significant influences on user perceived retrieval effectiveness. The influences appeared to be greater with the Scatter/Gather system compared to a classic search system. Topic familiarity also had significant influences on query formulation.