EWU Masters Thesis Collection

Support vector machines, N-gram kernels, and text classification

John Mill, Eastern Washington University

Date of Award

Fall 2002

Rights

Access perpetually restricted to EWU users with an active EWU NetID

Document Type

Thesis: EWU Only

Degree Name

Master of Science (MS) in Computer Science

Department

Computer Science

Abstract

The expanding popularity of the Internet in recent years has led to a corresponding increase in the amount of textual data available. This increase is found in the number of web pages, the size and complexity of search engines, and massive volumes of email. For anyone attempting to sort through or make sense of this data, one of the fundamental tasks is text classification. Text classification is the task of identifying the category to which a given piece of text or document belongs. In the case of email directed at an online retailer, the categories might be the various product departments. In the case of a search engine, the category could be the set of documents relevant to a search topic. In recent years, a new inference method known as Support Vector Machines (SVMs) has been increasingly applied to the task of text classification. The results have been promising, and research shows that SVMs outperform several conventional methods. One of the key components of an SVM is its kernel function. The choice of kernel function can have substantial effects on the performance of SVMs. In this paper we explore kernels based on N-grams, that is, consecutive sequences of words.
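
As a rough illustration of the idea named in the abstract, a kernel over word N-grams can be sketched as the inner product of two documents' N-gram count vectors. This is a generic textbook-style construction and is not taken from the thesis, whose full text is access-restricted; the function names `word_ngrams` and `ngram_kernel`, the lowercasing, and the whitespace tokenization are all assumptions made for the example.

```python
from collections import Counter


def word_ngrams(text, n):
    """Count the consecutive n-word sequences in text (hypothetical helper).

    Assumes lowercase, whitespace-separated tokens; a real system would
    likely use a more careful tokenizer.
    """
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def ngram_kernel(doc_a, doc_b, n=2):
    """Inner product of the two documents' n-gram count vectors."""
    counts_a = word_ngrams(doc_a, n)
    counts_b = word_ngrams(doc_b, n)
    # Only n-grams that occur in both documents contribute to the sum.
    return sum(
        count * counts_b[gram]
        for gram, count in counts_a.items()
        if gram in counts_b
    )


if __name__ == "__main__":
    a = "the cat sat on the mat"
    b = "the cat lay on the mat"
    print(ngram_kernel(a, b, n=2))
```

Because the value is an inner product of explicit count vectors, the function is symmetric in its two arguments, and a kernel of this shape could in principle be supplied to an SVM implementation that accepts precomputed kernel matrices.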

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Recommended Citation

Mill, John, "Support vector machines, N-gram kernels, and text classification" (2002). EWU Masters Thesis Collection. 823.
https://dc.ewu.edu/theses/823
