Abstract:
With the advancement of technology, maintaining national data and information has become increasingly important. Most of this data has to be maintained in the local languages because the majority of Sri Lankans are still not very conversant in English. Therefore, when public organizations embrace IT, their data, including personal data, has to be maintained in local languages. Once data and information are available in a local language, searching and retrieving them in that language becomes essential.
Proper nouns pose an inherent problem because a given proper noun, for example a name, can be spelt in several different ways. The problem becomes more prominent when a name originating in one language is spelt using another language. For example, the Sinhala name S®dg)d can be spelt in several ways such as Se&azsto, B&odo or Sg»26>3 using Sinhala itself. Therefore, someone searching an information store for a proper name may not find a match if the spelling used in the search differs from the spelling that was stored.
This research aimed to provide a solution to this problem for the Sinhala language: a rule-based search application that takes a Sinhala input string, searches an information store and retrieves matching results even if they were stored with a different spelling.
This was achieved by building a rule base that replaces characters of a keyword with alternative characters, generating a set of words with different spellings. The information store is then searched for this set of words and the results are displayed. Rules are organized in levels so that the user can select the degree of character replacement, retrieving either matches with slight spelling differences or matches with drastic spelling differences.
A special rule set was built for matching Tamil names written in Sinhala; the user can enable or disable this rule set independently. An application that uses a general-purpose rule engine to process the rules was designed and implemented to demonstrate this technology. It consists of a web-based user interface and a sample database as the information store, and has a layered architecture to allow future expansion and component reuse. All character replacement rules are declared in text files, so the rule base can be changed and updated without modifying the system.
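The variant-generation idea described above can be illustrated with a minimal Python sketch. It is not the thesis implementation, which uses a general-purpose rule engine. The rule file layout (`set level from to`), the specific character pairs, the Tamil-related pair, and the function names `parse_rules`, `generate_variants` and `search` are all assumptions made for illustration only.

```python
from itertools import product

# Hypothetical rule file content; the real rule base is not given in the abstract.
# Columns (assumed): rule set, level, source character, replacement character.
RULES_TEXT = """\
# set level from to
general 1 න ණ
general 1 ල ළ
general 2 ශ ෂ
tamil 1 ත ද
"""

def parse_rules(text):
    """Parse rule lines into (rule_set, level, src, dst) tuples."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        rule_set, level, src, dst = line.split()
        rules.append((rule_set, int(level), src, dst))
    return rules

def generate_variants(keyword, rules, max_level, enabled_sets=("general",)):
    """Return every spelling obtained by applying the active rules.

    A rule is active if its set is enabled and its level is <= max_level.
    Rules are treated as bidirectional (an assumption of this sketch).
    """
    alternatives = {}
    for rule_set, level, src, dst in rules:
        if rule_set in enabled_sets and level <= max_level:
            alternatives.setdefault(src, {src}).add(dst)
            alternatives.setdefault(dst, {dst}).add(src)
    # One list of candidate characters per position in the keyword.
    choices = [sorted(alternatives.get(ch, {ch})) for ch in keyword]
    return {"".join(combo) for combo in product(*choices)}

def search(keyword, store, rules, max_level, enabled_sets=("general",)):
    """Return stored names that equal any generated variant of the keyword."""
    variants = generate_variants(keyword, rules, max_level, enabled_sets)
    return [name for name in store if name in variants]
```

With `max_level=0` no rule is active and only the exact keyword is searched. Raising the level activates more replacements, so a keyword containing two characters covered by level 1 rules yields four variants. Because the number of variants grows multiplicatively with each replaceable character, a production system would likely need to bound or prune this set.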
It is shown that the application, together with the rule base that was built, provides a solution to the proper name search problem stated above. The system can be integrated with future information systems in government and business organisations.
Citation:
Fernando, S.C. (2007). Inexact matching of proper names in Sinhala [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.mrt.ac.lk/handle/123/666