Abstract:
The ever-growing knowledge bases of enterprises pose the demanding challenge of organizing information so that related
and intended information can be retrieved quickly. Enterprise document repositories consist of large collections of
documents of varying size, format and writing style. This diverse and unstructured nature of the documents restricts the
possibility of developing uniform techniques for extracting important concepts and relationships for summarization,
structured representation and fast retrieval. The textual content of the documents is used as the input for constructing
a concept map. A rule-based approach is used to extract concepts and the relationships among them. Breaking the text down
to the sentence level enables these rules to identify those concepts and relationships. The rules are based on the
elements of the phrase structure tree of a sentence. To improve the accuracy and relevance of the extracted concepts and
relationships, special features such as titles, bold text and upper-case text are used.
This paper discusses how to overcome these challenges by using high-level natural language processing techniques and
document preprocessing techniques, and by developing a compact concept map representation that is easy to understand
and to extract. Each document in the repository is converted to a concept map that captures the concepts described in
that document and the relationships among them. This representation serves as a summary of the document. The individual
concept maps are then used to generate concept maps that represent sections of the repository or the entire repository.
The paper also discusses how statistical techniques are used to calculate metrics that support specific requirements of
the solution. Principal component analysis is used to rank the documents by importance. The concept map is visualized
using force-directed graphs, which represent concepts as nodes and relationships as edges.
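The abstract states that principal component analysis ranks documents by importance but does not name the metrics or the scoring rule. The sketch below assumes one common reading: each document is described by a row of hypothetical per-document metrics (for example, concept count, relationship count and formatting hits), the columns are standardized, the first principal component is found by power iteration on the covariance matrix, and documents are ranked by their projection onto that component. The metric choice and the sign convention (oriented so that larger metrics give larger scores) are assumptions, not the paper's method.

```python
import math

def standardize(rows: list[list[float]]) -> list[list[float]]:
    """Center each column and scale it to unit (population) variance."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    # A constant column would give std 0; fall back to 1.0 to avoid division by zero.
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(m)]
    return [[(r[j] - means[j]) / stds[j] for j in range(m)] for r in rows]

def first_component(z: list[list[float]], iters: int = 200) -> list[float]:
    """Dominant eigenvector of the covariance matrix via power iteration."""
    n, m = len(z), len(z[0])
    cov = [[sum(r[i] * r[j] for r in z) / n for j in range(m)] for i in range(m)]
    v = [1.0] * m
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    # Assumed sign convention: orient the component so higher metrics mean higher scores.
    if sum(v) < 0:
        v = [-x for x in v]
    return v

def rank_documents(names: list[str], metrics: list[list[float]]) -> list[tuple[str, float]]:
    """Rank documents by their score on the first principal component."""
    z = standardize(metrics)
    pc1 = first_component(z)
    scores = [sum(a * b for a, b in zip(row, pc1)) for row in z]
    return sorted(zip(names, scores), key=lambda p: p[1], reverse=True)

if __name__ == "__main__":
    # Hypothetical metrics per document: [concepts, relationships, title/bold hits]
    docs = ["a.docx", "b.pdf", "c.txt"]
    data = [[40, 55, 3], [10, 12, 0], [25, 30, 1]]
    for name, score in rank_documents(docs, data):
        print(f"{name}: {score:.3f}")
```

Using only the first component collapses correlated metrics into a single importance index; if the metrics were weakly correlated, that component would explain little of the variance and a weighted combination of several components might be more appropriate.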