Scatter/Gather Browser on TREC HARD track (news)

Scatter/Gather Browser

Live demo at: http://lincs.ischool.drexel.edu/sgbrowser/btrec/

We implemented a Scatter/Gather browser based on the proposed LAIR2 algorithm. Using information visualization techniques, the browser helps users refine their search and narrow down search results interactively and visually. The interface was designed around three questions: which representations are most appropriate for Scatter/Gather, what information can be conveyed efficiently through visualization, and how these can be integrated to support Scatter/Gather interactions [12, 7]. As shown in the figure, the primary elements of the user interface include:

  • Cluster (color-coded circle): a group of documents that are similar or related to each other.
  • Cluster size (radius, log-transformed): determined by the number of documents that belong to the cluster. The radius is a logarithmic function of that number, so that clusters of very different sizes fit on the same display.
  • Cluster color: determined by the homogeneity of the cluster; the warmer the color, the more homogeneous the cluster.
  • Cluster position: determined by the similarity between clusters; similar clusters tend to be placed close together.
  • Gather & Scatter button: re-clusters the documents in the selected clusters, after which a new level of clusters is generated and displayed.
  • Back button: used to return to the previous cluster selection at any time in the Scatter/Gather process.
  • Reset button: used to reset the Scatter/Gather browser to its default initial state, i.e., to the top-level clusters.
  • Slider: used to control the desired number of clusters.
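
The size and color encodings listed above can be illustrated with a short Python sketch. It is not the browser's actual code: the function names, the constants `base_radius` and `scale`, the choice of a base-10 logarithm, the assumption that homogeneity is a score in [0, 1], and the blue-to-red hue ramp are all hypothetical choices made for illustration.

```python
import colorsys
import math


def cluster_radius(n_docs, base_radius=10.0, scale=8.0):
    """Return a circle radius that grows with the log of the cluster size.

    base_radius and scale are illustrative constants, not values from the browser.
    """
    if n_docs < 1:
        raise ValueError("a cluster must contain at least one document")
    return base_radius + scale * math.log10(n_docs)


def cluster_color(homogeneity):
    """Map a homogeneity score to an RGB hex string; higher means warmer.

    Assumes homogeneity lies in [0, 1]; values outside that range are clamped.
    The hue runs from blue (240 degrees, least homogeneous) to red (0 degrees).
    """
    h = min(max(homogeneity, 0.0), 1.0)
    hue = (1.0 - h) * 240.0 / 360.0
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
```

With a logarithmic radius, a cluster holding 100 times as many documents as another is drawn only modestly larger, which keeps small clusters visible next to large ones. Cluster positioning by similarity is not covered by this sketch.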

Using the functions described above, the system operates in the following way. The initial index page of the Scatter/Gather browser shows, by default, seven clusters (nodes) representing seven main topics of the text collection. These clusters are positioned closer to or farther from each other according to the similarity of their associated documents. Moving the cursor over a specific cluster displays more information about it in the middle window.
The list of articles (initially all articles) related to the shown clusters is displayed in the bottom window. The first page shows the first ten related documents with brief descriptions and links to detailed information. Links to additional pages appear at the bottom of the current page. Clicking on a title will display the document in the bottom-right frame, where the user can read the article and determine if it is relevant.
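
The ten-per-page listing described above amounts to simple slicing. The following Python sketch assumes 1-based page numbers; the function names `paginate` and `page_count` are hypothetical and do not come from the browser.

```python
def paginate(documents, page, page_size=10):
    """Return the documents shown on a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    start = (page - 1) * page_size
    return documents[start:start + page_size]


def page_count(n_documents, page_size=10):
    """Return how many pages are needed; an empty list still gets one page."""
    return max(1, -(-n_documents // page_size))  # ceiling division
```

For example, `paginate(docs, 1)` yields the first ten related documents, and `page_count(len(docs))` gives the number of page links to render at the bottom of the list.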
To search on a desired topic, the user selects one or more clusters by clicking on them. A blue border appears around each selected cluster, identifying it as chosen for further examination. To deselect a cluster, the user clicks on it again and the blue border disappears. To start the next iteration, the user presses the Gather & Scatter button, located on the top left side of the window. This produces a new display of clusters built from the documents in the selected clusters.
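
The select, Gather & Scatter, Back, and Reset interactions can be modeled as a small state machine. The Python sketch below is only an illustration under stated assumptions: the class `ScatterGatherSession` and its methods are hypothetical, and `round_robin` is a placeholder that merely deals documents into groups. It is not LAIR2, whose details are not given here; any function taking a document list and a cluster count could be passed in its place.

```python
class ScatterGatherSession:
    """Tracks cluster levels and the current selection (illustrative only)."""

    def __init__(self, documents, cluster_fn, k=7):
        self.cluster_fn = cluster_fn      # stand-in for the real clustering step
        self.k = k                        # desired number of clusters (the slider)
        self.history = []                 # earlier cluster levels, oldest first
        self.clusters = cluster_fn(list(documents), k)
        self.selected = set()             # indices of clusters with a blue border

    def toggle(self, index):
        """Clicking a cluster selects it; clicking it again deselects it."""
        if not 0 <= index < len(self.clusters):
            raise IndexError("no such cluster")
        self.selected ^= {index}

    def gather_and_scatter(self):
        """Pool the selected clusters' documents and cluster them again."""
        if not self.selected:
            return
        gathered = [d for i in sorted(self.selected) for d in self.clusters[i]]
        self.history.append(self.clusters)
        self.clusters = self.cluster_fn(gathered, self.k)
        self.selected = set()

    def back(self):
        """Return to the previous level of clusters, if there is one."""
        if self.history:
            self.clusters = self.history.pop()
            self.selected = set()

    def reset(self):
        """Return to the top-level clusters."""
        if self.history:
            self.clusters = self.history[0]
            self.history = []
        self.selected = set()


def round_robin(docs, k):
    """Placeholder clustering: deal documents into at most k groups."""
    k = max(1, min(k, len(docs)))
    return [docs[i::k] for i in range(k)]
```

Keeping earlier levels on a stack is one simple way to support both Back (pop one level) and Reset (jump to the first level); the browser itself may store its state differently.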
In both the article list and article display frames, the user is presented with a set of rating buttons, i.e., icons showing one to three stars. The user can rate how relevant an article is by clicking one of the “star” buttons (one star denotes somewhat relevant and three stars highly relevant). Once rated, the article appears in the retrieved articles frame on the upper right side of the display. A “delete” button (with a trash can icon) is provided in case the user later decides that an article is irrelevant. The user can continue to search and select articles in this way. At any time, the user can click on the Reset button to return to the default initial page (without losing any of the retrieved/ranked documents).
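
The rating behavior can be sketched as a small store kept separate from the cluster state, which is one way to make rated articles survive a Reset. The class name `RetrievedArticles`, its methods, and the ordering by star count are assumptions made for this Python illustration, not a description of the browser's implementation.

```python
class RetrievedArticles:
    """Holds the articles the user has rated (illustrative only)."""

    VALID_STARS = (1, 2, 3)  # 1 = somewhat relevant, 3 = highly relevant

    def __init__(self):
        self._ratings = {}  # document id -> stars, in order of first rating

    def rate(self, doc_id, stars):
        """Record or update a rating; the article joins the retrieved list."""
        if stars not in self.VALID_STARS:
            raise ValueError("rating must be 1, 2, or 3 stars")
        self._ratings[doc_id] = stars

    def delete(self, doc_id):
        """Remove an article the user now considers irrelevant."""
        self._ratings.pop(doc_id, None)

    def ranked(self):
        """Return (doc_id, stars) pairs, highest rated first.

        Sorting by stars is an assumption; ties keep their rating order.
        """
        return sorted(self._ratings.items(), key=lambda item: -item[1])
```

Because the store holds only document identifiers and ratings, returning the cluster display to its initial state leaves it untouched.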
 

References: