Weimao Ke's paper receives best paper nomination at JCDL

A research paper by Weimao Ke has been nominated for the Vannevar Bush Best Paper Award at the ACM/IEEE Joint Conference on Digital Libraries (JCDL'13).
Ke, Weimao. "Information-theoretic Term Weighting Schemes for Document Clustering." In ACM/IEEE Joint Conference on Digital Libraries, 1-10, 2013.

We propose a new theory that quantifies information in probability distributions and derive a new document representation model for text clustering. By extending Shannon entropy to accommodate a non-linear relation between information and uncertainty, the proposed Least Information theory (LIT) provides insight into how terms can be weighted based on their probability distributions in documents vs. in the collection. We derive two basic quantities in the document clustering context: 1) LI Binary (LIB) which quantifies information due to the observation of a term's (binary) occurrence in a document; and 2) LI Frequency (LIF) which measures information for the observation of a randomly picked term from the document. Both quantities are computed given term distributions in the document collection as prior knowledge and can be used separately or combined to represent documents for text clustering. Experiments on four benchmark text collections demonstrate strong performances of the proposed methods compared to classic TF*IDF. Particularly, the LIB*LIF weighting scheme, which combines LIB and LIF, consistently outperforms TF*IDF in terms of multiple evaluation metrics. The least information measure has a potentially broad range of applications beyond text clustering.
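
For readers unfamiliar with the baseline named in the abstract, here is a minimal sketch of classic TF*IDF term weighting in Python. It is not the paper's LIB or LIF scheme, whose formulas are not given in this announcement. The function name `tf_idf`, the length-normalized term frequency, the natural-log IDF variant, and the toy documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant (one of several in common use):
    weight = (term count / document length) * ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


# Hypothetical toy collection, already split into tokens.
docs = [
    "digital library search".split(),
    "digital text clustering".split(),
    "term weighting for text clustering".split(),
]
weights = tf_idf(docs)
```

In this sketch a term that occurs in every document receives a weight of zero, and a term that occurs in few documents is weighted more heavily. The abstract's schemes likewise use term distributions in the collection as prior knowledge, but they quantify the information differently.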