Date of Award

Fall 2002

Rights

Access perpetually restricted to EWU users with an active EWU NetID

Document Type

Thesis: EWU Only

Degree Name

Master of Science (MS) in Computer Science

Department

Computer Science

Abstract

The expanding popularity of the Internet in recent years has led to a corresponding increase in the amount of textual data available. This increase is found in the number of web pages, the size and complexity of search engines, and massive volumes of email. For anyone attempting to sort through or make sense of this data, one of the fundamental tasks is text classification. Text classification is the task of identifying the category to which a given piece of text or document belongs. In the case of email directed at an online retailer, the categories might be the various product departments. In the case of a search engine, the category could be the set of documents relevant to a search topic. In recent years, a new inference method known as Support Vector Machines (SVMs) has been increasingly applied to the task of text classification. The results have been promising, and research shows that SVMs outperform several conventional methods. One of the key components of an SVM is its kernel function. The choice of kernel function can have substantial effects on the performance of SVMs. In this paper we explore kernels based on N-grams, that is, consecutive sequences of words.
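
As a rough illustration of the idea of a word N-gram kernel, the sketch below scores two texts by the dot product of their N-gram count vectors. It is only one simple possibility and is not taken from the thesis; the function names `word_ngrams`, `ngram_kernel`, and `gram_matrix`, the lowercasing, and the whitespace tokenization are all assumptions made for this example.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return a Counter of the consecutive n-word sequences in text.

    Assumption: words are lowercased and split on whitespace.
    """
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def ngram_kernel(a, b, n=2):
    """Dot product of the n-gram count vectors of two texts."""
    ca, cb = word_ngrams(a, n), word_ngrams(b, n)
    # Only n-grams present in both texts contribute to the sum.
    return sum(count * cb[gram] for gram, count in ca.items() if gram in cb)


def gram_matrix(docs, n=2):
    """Kernel value for every pair of documents, as nested lists."""
    return [[ngram_kernel(x, y, n) for y in docs] for x in docs]


if __name__ == "__main__":
    # Hypothetical customer emails, invented for this example.
    docs = [
        "the red bike is fast",
        "the red bike is slow",
        "refund my order please",
    ]
    for row in gram_matrix(docs, n=2):
        print(row)
```

A matrix of this kind could in principle be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.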
