Off-campus Eastern Washington University users: To download EWU Only theses, please use the following link to log into our proxy server with your EWU NetID and password.
Non-EWU users: Please talk to your local librarian about requesting this thesis through Interlibrary loan.
Date of Award
Access perpetually restricted to EWU users with an active EWU NetID
Thesis: EWU Only
Master of Science (MS) in Computer Science
This thesis is an exploration of hybrid dictionary/statistical algorithms for compressing textual information. The specific intent was to attempt to find an algorithm which would yield greater compression on text files than the algorithm PPMC without requiring significantly greater computing resources; PPMC is widely regarded as the algorithm which yields the greatest compression on most typical text files. Existing hybrid algorithms are reviewed in the initial sections of the thesis, and then the research results from this work are reported. Three distinctly different and to the author's knowledge new approaches were investigated. The first approach was to cascade arithmetic coding with LZMW; some improvement in compression over LZMW was obtained, but the compression on the test files was inferior to PPMC. The second approach was a textual substitution algorithm referred to as VOWL. VOWL is a generalization of dictionary algorithms, in which there is no context, i.e. no overlap between successive dictionary entries encoded, and PPMC, in which there is exactly one new character encoded beyond the context. VOWL obtained less compression than did PPMC. The third approach was to modify PPMC by augmenting the character set at the maximum model order. This algorithm is referred to as PPMW. Instead of encoding only one character at a time, PPMW can encode a complete word ending. PPMW gave an average reduction of 5.6 percent in the size of the compressed file in comparison with PPMC for the five text files evaluated. Also, PPMW executed on average over 20 percent faster than PPMC. The author is unaware of any other claim in the literature for improvements of this magnitude to PPMC.
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Piotrowski, Michael E., "Hybrid dictionary/statistical text compression algorithms" (1992). EWU Masters Thesis Collection. 878.