dc.description.abstract |
Optical Character Recognition (OCR) is now a reality for documents printed in English. The present study lays the groundwork for the recognition of Sinhala characters. Matrix Matching and Feature Analysis are the two methods commonly used to recognize English letters. This study investigates the Feature Analysis method for recognizing Sinhala characters.
The Matrix Matching method is found to be suitable for recognizing documents whose text is set in a known font and typeface. It should also be used to identify and extract the modifiers placed above, below, or after a character. Removing the modifiers first helps in identifying the base character by Feature Analysis.
Several features of Sinhala characters can be extracted by running simple programs on the pixel array of a character. These features include the aspect ratio, the inscribing octagon, and the number of pixel curves crossed when the character is sliced at different angles. Running these programs on standard Sinhala characters yields a reference set of feature values. The features of an unknown character can then be compared with this reference data for recognition.
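The feature extraction and comparison described above can be illustrated with a minimal Python sketch. It is not the MathCAD code of the study: all function names, the toy glyphs, and the squared-distance comparison are assumptions made for illustration. Only horizontal and vertical slices are shown, and the inscribing octagon is omitted.

```python
def bounding_box(bitmap):
    """Return (top, bottom, left, right) of the foreground pixels (value 1)."""
    rows = [r for r, row in enumerate(bitmap) if any(row)]
    cols = [c for c in range(len(bitmap[0])) if any(row[c] for row in bitmap)]
    return rows[0], rows[-1], cols[0], cols[-1]


def aspect_ratio(bitmap):
    """Width of the bounding box divided by its height."""
    top, bottom, left, right = bounding_box(bitmap)
    return (right - left + 1) / (bottom - top + 1)


def count_runs(line):
    """Count separate runs of foreground pixels along one slice."""
    runs, previous = 0, 0
    for pixel in line:
        if pixel and not previous:
            runs += 1
        previous = pixel
    return runs


def horizontal_crossings(bitmap, row):
    """Number of pixel curves crossed by a horizontal slice at `row`."""
    return count_runs(bitmap[row])


def vertical_crossings(bitmap, col):
    """Number of pixel curves crossed by a vertical slice at `col`."""
    return count_runs([row[col] for row in bitmap])


def extract_features(bitmap):
    """Aspect ratio plus crossings through the middle of the bounding box."""
    top, bottom, left, right = bounding_box(bitmap)
    return (
        aspect_ratio(bitmap),
        horizontal_crossings(bitmap, (top + bottom) // 2),
        vertical_crossings(bitmap, (left + right) // 2),
    )


def classify(features, reference):
    """Return the label whose stored features are closest (squared distance)."""
    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(features, reference[label]))
    return min(reference, key=distance)


# Toy glyphs standing in for character bitmaps (hypothetical data).
RING = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
BAR = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]

reference = {"ring": extract_features(RING), "bar": extract_features(BAR)}
```

Calling `classify(extract_features(unknown), reference)` on an unknown bitmap returns the label of the nearest reference entry. A practical recognizer would need slices at more positions and angles, and some normalization of the feature scales before comparison.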
The programs for Sinhala character recognition are written in MathCAD, an application package for complex mathematical calculations. Since the algorithms are written in pseudocode, they are easy to convert to a C++ program. |
en_US |