Terms Reduction with Web Page Clustering

Abstract

As the World Wide Web (WWW) has grown, retrieving useful information quickly has become difficult. A user looking for information may have to browse many pages to find it. A technique is required that organizes document content efficiently so that information can be obtained easily from this very large data repository. Clustering is an unsupervised classification technique that groups related data into one set (a cluster). Clustering can help users find information of interest quickly within this abundance of information. However, clustering methods suffer from the large size of documents and the high dimensionality of text features. We propose a web page clustering scheme that works efficiently in high-dimensional feature spaces. We present a method that reduces the dimensionality of the feature vector by selecting the most informative words while maintaining the quality of the clusters.
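The idea of reducing terms before clustering can be illustrated with a minimal Python sketch. The abstract does not say how "most informative" is measured or which clustering algorithm is used, so the choices here are assumptions made only for illustration: terms are ranked by their summed TF-IDF weight, and pages are grouped with a small k-means loop using cosine similarity. The function names `select_terms`, `vectorize`, and `kmeans` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic tokens only (a simplifying assumption).
    return [w for w in text.lower().split() if w.isalpha()]


def select_terms(docs, k):
    """Return the k terms with the highest summed TF-IDF weight.

    TF-IDF is an assumed informativeness criterion, not the paper's.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    score = Counter()
    for toks in tokenized:
        for term, tf in Counter(toks).items():
            score[term] += tf * math.log(n / df[term])
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(score, key=lambda t: (-score[t], t))[:k]


def vectorize(docs, vocab):
    # Term-count vectors restricted to the reduced vocabulary.
    vectors = []
    for d in docs:
        counts = Counter(tokenize(d))
        vectors.append([counts[t] for t in vocab])
    return vectors


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def kmeans(vectors, seeds, iters=10):
    """Plain k-means with cosine similarity; seeds are indices of initial centroids."""
    centroids = [list(vectors[i]) for i in seeds]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its most similar centroid.
        labels = [
            max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        # Recompute each centroid as the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

With a corpus `docs`, a call such as `vocab = select_terms(docs, k)` followed by `kmeans(vectorize(docs, vocab), seeds)` clusters pages in a `k`-dimensional space instead of the full vocabulary. How far `k` can shrink without degrading cluster quality is the question the paper addresses.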

    6 Figures and Tables
