Abstract:
Automated analysis of software log files plays an important role in software maintenance. Currently available analysis
tools are not generic: they tend to focus on specific software or servers and offer minimal flexibility.
Furthermore, commercially available log analysis tools are priced beyond the reach of small and medium-scale firms.
This has left a void in the market for generic, customizable and open source log file analysis tools. The main impediment to
the emergence of such a tool is the lack of a generic log file data extraction mechanism. A generic log file format
definition language, backed by a persistent data storage system, offers a solution to this problem. Log file structures
can be defined in this language, and the extracted data stored in the persistent storage. This
methodology enables generic log file analysis on top of the extracted data. Through the research and implementations
carried out, it was identified that a modified version of simple declarative language is suitable for the log file format
definition language. It would be capable of defining and handling all patterns of text-based log files.
Additionally, the results revealed that the appropriate storage mechanism would be an Extensible Markup Language
(XML) database, mainly because of the similarity between the hierarchical nature of XML and common log file
structures.
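
The approach summarized above, a declarative definition of the log format that drives extraction into hierarchical XML, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the work itself: a plain Python dictionary (LOG_FORMAT) stands in for the format definition language, an in-memory XML tree stands in for the XML database, and the field names, patterns and sample log lines are made up.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical declarative definition of one log format: the record element
# name, an ordered list of (field name, regex pattern), and a field separator.
LOG_FORMAT = {
    "record": "entry",
    "fields": [
        ("timestamp", r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"),
        ("level", r"[A-Z]+"),
        ("message", r".*"),
    ],
    "separator": r"\s+",
}


def compile_format(fmt):
    """Turn the declarative definition into one regex with named groups."""
    parts = [f"(?P<{name}>{pattern})" for name, pattern in fmt["fields"]]
    return re.compile("^" + fmt["separator"].join(parts) + "$")


def extract_to_xml(lines, fmt):
    """Extract matching lines into an XML tree: one child element per record."""
    regex = compile_format(fmt)
    root = ET.Element("log")
    for line in lines:
        match = regex.match(line.rstrip("\n"))
        if match is None:
            continue  # lines that do not fit the definition are skipped
        entry = ET.SubElement(root, fmt["record"])
        for name, _ in fmt["fields"]:
            ET.SubElement(entry, name).text = match.group(name)
    return root


# Made-up sample input; the last line does not match the definition.
sample = [
    "2011-03-01 10:15:02 ERROR Disk quota exceeded",
    "2011-03-01 10:15:07 INFO Retry scheduled",
    "not a log line",
]
root = extract_to_xml(sample, LOG_FORMAT)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Because the extraction logic reads only the definition, supporting a different log format in this sketch means supplying a different dictionary rather than changing code. Each record becomes a subtree of the XML document, which mirrors the hierarchical correspondence between log structures and XML noted above.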