Title page for etd-0214107-150013



URN etd-0214107-150013
Author Hsiao-Wen Liu
Author's Email Address m934020032@student.nsysu.edu.tw
Statistics This thesis has been viewed 5745 times and downloaded 22 times.
Department Information Management
Year 2006
Semester 1
Degree Master
Type of Document
Language English
Title Summary-based document categorization with LSI
Date of Defense 2006-07-20
Page Count 54
Keyword
  • Document Categorization
  • Latent Semantic Indexing
  • Text Summarization
Abstract Text categorization, which automatically assigns documents to the appropriate pre-defined category or categories, is essential for retrieving desired documents efficiently and effectively from a huge text repository such as the World Wide Web. Most techniques, however, suffer from the feature selection problem and the vocabulary mismatch problem. A few studies have addressed text categorization via text summarization, which reduces the size of documents and consequently the number of features to consider, while others have proposed using latent semantic indexing (LSI) to reveal the true meaning of a term via its association with other terms. Few works, however, have studied the joint effect of text summarization and this semantic dimension reduction technique. The objective of this research is thus to propose a practical approach, SBDR, to deal with the above difficulties in text categorization tasks.
    Two experiments are conducted to validate the proposed approach. In the first experiment, the results show that text summarization does improve categorization performance. In addition, to construct important sentences, the association terms of both noun-noun and noun-verb pairs should be considered. Results of the second experiment indicate slightly better performance when LSI is adopted exclusively (i.e., no summarization) than with SBDR (i.e., with summarization). Nonetheless, the minor reduction in accuracy is largely compensated for by the computation time saved when LSI is applied to summarized text. The feasibility of the SBDR approach is thus justified.
Advisory Committee
  • Wen-Feng Hsiao - chair
  • Pei-Chen Sun - co-chair
  • Te-Min Chang - advisor
Files
  • etd-0214107-150013.pdf
  • Access: available on campus after one year; not accessible off campus
Date of Submission 2007-02-14
