Date of Award

Winter 2005

Rights

Access perpetually restricted to EWU users with an active EWU NetID

Document Type

Thesis: EWU Only

Degree Name

Master of Science (MS) in Computer Science

Department

Computer Science

Abstract

Indexing is important for real-time responses to queries submitted to database management systems (DBMS). Real-time response depends on efficiently locating the data stored in a database, given the value of an attribute. This efficiency is measured by the asymptotic computational complexity of searching: constant time when the indexes are optimal for the queries (the best case), and linear in the number of records stored in the database (the worst case). Currently, indexes are created manually by database administrators, so their optimality depends entirely on the administrators. Determining optimality requires analyzing the data stored in the database. In data analysis and machine learning, a method known as clustering analysis identifies a partition (i.e., a collection of classes or clusters) over the data according to its density. This thesis hypothesizes that the result of clustering analysis corresponds strongly to the optimal indexes. Specifically, partitions generated by clustering analysis lead to the generation of optimal indexes, as well as to other potential contributions to query processing. The results of this study should have a significant impact on query optimization and, more specifically, on automated indexing for query processing in DBMS. The scope of this thesis is mainly a preliminary study that incorporates clustering analysis into a prototype of a realistic application, such as apartment search. Fuzzy c-means clustering, with consideration of categorical data, is investigated through comparative studies of the resulting fuzzy partitions against the actual data stored in the database and referenced by queries. The data were organized as tables of two or three attributes.
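The fuzzy c-means step mentioned above can be sketched as follows. This is the standard Bezdek formulation on numeric attributes only, written for illustration; it is not taken from the thesis and omits the handling of categorical data that the thesis considers. The function name `fuzzy_c_means`, the fuzzifier `m = 2`, the tolerance and the random initialization are assumptions.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Illustrative fuzzy c-means (Bezdek) on numeric points only.

    points: list of equal-length tuples of floats.
    Returns (centers, u) where u[i][j] is the degree to which point i
    belongs to cluster j; each row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([x / s for x in row])

    centers = []
    for _ in range(max_iter):
        # Centers are means of the points weighted by membership**m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][k] for i in range(n)) / total
                for k in range(dim)))
        # Memberships are recomputed from distances to the new centers.
        new_u = []
        p = 2.0 / (m - 1.0)
        for i in range(n):
            d = [math.dist(points[i], centers[j]) for j in range(c)]
            if any(dj == 0.0 for dj in d):
                # Point coincides with a center: full membership there.
                hits = [j for j in range(c) if d[j] == 0.0]
                row = [1.0 / len(hits) if j in hits else 0.0 for j in range(c)]
            else:
                row = [1.0 / sum((d[j] / d[k]) ** p for k in range(c))
                       for j in range(c)]
            new_u.append(row)
        shift = max(abs(new_u[i][j] - u[i][j])
                    for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u
```

Each record receives a graded membership in every cluster; a crisp partition would instead assign each record wholly to its nearest center.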
The results of this thesis suggest that fuzzy partitions used as optimal indexes give the user more satisfactory results than indexes based on (crisp) partitions. In addition, we find that such clustering-based indexes may be used to interpolate missing values when generating responses to queries on a DBMS.
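One possible reading of the interpolation finding, offered purely as an illustrative assumption, is to estimate a missing attribute as the membership-weighted average of the cluster centers, with memberships computed from the attributes the record does have. The thesis's actual procedure is not described here; the apartment attributes (monthly rent, number of bedrooms), the cluster centers, and the helper names `memberships` and `interpolate_missing` are all hypothetical.

```python
import math


def memberships(x, centers, m=2.0):
    """Fuzzy c-means style membership of point x in each cluster center."""
    d = [math.dist(x, v) for v in centers]
    if 0.0 in d:
        # x coincides with one or more centers: split membership among them.
        hits = [j for j, dj in enumerate(d) if dj == 0.0]
        return [1.0 / len(hits) if j in hits else 0.0
                for j in range(len(centers))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((d[j] / d[k]) ** p for k in range(len(centers)))
            for j in range(len(centers))]


def interpolate_missing(record, centers, m=2.0):
    """Hypothetical: fill None attributes with membership-weighted center values."""
    known = [k for k, val in enumerate(record) if val is not None]
    # Compare the record with each center on its known attributes only.
    proj_x = [record[k] for k in known]
    proj_centers = [[v[k] for k in known] for v in centers]
    u = memberships(proj_x, proj_centers, m)
    return [val if val is not None
            else sum(u[j] * centers[j][k] for j in range(len(centers)))
            for k, val in enumerate(record)]


# Hypothetical apartment clusters: (monthly rent, bedrooms).
centers = [(500.0, 1.0), (1500.0, 3.0)]
print(interpolate_missing([1000.0, None], centers))  # [1000.0, 2.0]
```

For a rent of 600, this sketch estimates about 1.02 bedrooms by blending both clusters, whereas a crisp nearest-center partition would return exactly 1.0. In practice the attributes should be normalized first, since here the rent dominates every distance.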