Date of Award
Fall 2002
Rights
Access perpetually restricted to EWU users with an active EWU NetID
Document Type
Thesis: EWU Only
Degree Name
Master of Science (MS) in Computer Science
Department
Computer Science
Abstract
The expanding popularity of the Internet in recent years has led to a corresponding increase in the amount of textual data available. This increase is found in the number of web pages, the size and complexity of search engines, and massive volumes of email. For anyone attempting to sort through or make sense of this data, one of the fundamental tasks is text classification. Text classification is the task of identifying the category to which a given piece of text or document belongs. In the case of email directed at an online retailer, the categories might be the various product departments. In the case of a search engine, the category could be the set of documents relevant to a search topic. In recent years, a new inference method known as Support Vector Machines (SVMs) has been increasingly applied to the task of text classification. The results have been promising, and research shows that SVMs outperform several conventional methods. One of the key components of an SVM is its kernel function, and the choice of kernel function can have a substantial effect on the SVM's performance. In this paper we explore kernels based on N-grams, that is, consecutive sequences of words.
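As a rough illustration of the idea named in the abstract, the sketch below computes one simple kind of word N-gram kernel: the inner product of the N-gram count vectors of two documents. This is an assumption about what such a kernel might look like, not the formulation used in the thesis; the function names `word_ngrams` and `ngram_kernel`, the whitespace tokenization, and the lowercasing are all hypothetical choices made for the example.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the list of consecutive n-word sequences in text.

    Tokenization is a plain lowercase whitespace split (an assumption
    made for this sketch, not taken from the thesis).
    """
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_kernel(doc_a, doc_b, n=2):
    """Inner product of the word n-gram count vectors of two documents."""
    counts_a = Counter(word_ngrams(doc_a, n))
    counts_b = Counter(word_ngrams(doc_b, n))
    # Iterate over the smaller table; Counter returns 0 for missing keys.
    if len(counts_a) > len(counts_b):
        counts_a, counts_b = counts_b, counts_a
    return sum(count * counts_b[gram] for gram, count in counts_a.items())


if __name__ == "__main__":
    # The two sentences share exactly one bigram: ("the", "cat").
    print(ngram_kernel("the cat sat", "the cat ran", n=2))
```

Because this kernel is an ordinary dot product in N-gram count space, the matrix of pairwise values over a document set is symmetric and positive semidefinite, so it could in principle be handed to an SVM implementation that accepts a precomputed kernel matrix.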
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Recommended Citation
Mill, John, "Support vector machines, N-gram kernels, and text classification" (2002). EWU Masters Thesis Collection. 823.
https://dc.ewu.edu/theses/823