Abstract:
Humans make many decisions in their day-to-day lives, and making the right decisions requires information. The World Wide Web (WWW) contains an enormous amount of information, but it is a huge, complex system, and finding correct, up-to-date information on it is difficult. Search engines, the main tool used to search the WWW, make this task easier. Because the WWW is such a vast collection of web pages, a search engine that only began searching once a user entered a query would take an impractically long time to respond. Search engines answer quickly because they have already crawled the WWW and stored the collected data in an index. The index is therefore one of the most important components of a search engine.
Web crawlers populate a search engine's index by crawling websites. A search engine without an up-to-date index is of little use, because web pages on the WWW are updated constantly and new web pages and websites keep emerging. The index therefore has to be updated regularly, and maintaining an up-to-date index for such a complex system is difficult. This project addresses the problem of inefficient information retrieval in the search engine domain.
Social networks and other social media reflect current world trends and can therefore be used to identify them. This project uses swarm intelligence to identify current world trends from social networks. It does so by collecting the status messages and micro-blog messages that users publish on popular social networks and analyzing them with swarm intelligence.
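The abstract does not specify which swarm-intelligence technique is applied to the messages. As a purely illustrative sketch, the code below uses a stigmergy-style scheme borrowed from ant colony optimization: each message deposits "pheromone" on the terms it contains, pheromone evaporates every round, and the terms with the strongest trails are reported as trends. The tokenizer, stop-word list, `EVAPORATION` and `DEPOSIT` values, and all function names are hypothetical assumptions, not the project's actual method.

```python
import re
from collections import defaultdict

# Hypothetical parameters, chosen for illustration only (not from the project).
EVAPORATION = 0.5  # fraction of each trail that evaporates per round
DEPOSIT = 1.0      # pheromone one message deposits on each distinct term
STOPWORDS = {"the", "a", "is", "of", "and", "to", "in", "on", "for"}


def tokenize(message):
    """Return the set of lower-case word tokens, with stop words removed."""
    words = re.findall(r"[a-z0-9#]+", message.lower())
    return {w for w in words if w not in STOPWORDS}


def update_pheromone(pheromone, messages):
    """One round: evaporate existing trails, then let each message deposit on its terms."""
    for term in list(pheromone):
        pheromone[term] *= 1.0 - EVAPORATION
    for message in messages:
        for term in tokenize(message):
            pheromone[term] += DEPOSIT
    return pheromone


def top_trends(pheromone, k=3):
    """Terms with the strongest trails, strongest first (ties broken alphabetically)."""
    return sorted(pheromone, key=lambda t: (-pheromone[t], t))[:k]


if __name__ == "__main__":
    pheromone = defaultdict(float)
    update_pheromone(pheromone, ["Earthquake hits the coast", "Football final tonight"])
    update_pheromone(pheromone, ["Earthquake relief efforts", "earthquake aftershock reported"])
    print(top_trends(pheromone, k=1))  # → ['earthquake']
```

Evaporation lets terms that stop being mentioned fade, while terms that many messages keep reinforcing rise to the top. This is the property that makes pheromone trails a plausible fit for tracking *current* trends rather than all-time frequency.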
A multi-agent system (MAS) based web crawler system was designed and developed to crawl the WWW according to the current world trends identified by this swarm-intelligence-based analysis of status messages and micro-blog messages on popular social networks. The proposed MAS-based crawler system was compared against a conventional crawler system on identifying newly updated web pages. The proposed system crawls the web efficiently, retrieving updated web pages in a timely manner according to their utility value, or importance, with respect to the identified current world trends.
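The abstract does not define how utility value is computed or how the agents coordinate. The sketch below illustrates only one assumed ingredient: a crawl frontier that a single crawler agent could use to fetch the highest-utility URLs first. Here utility is taken to be the sum of trend weights (for example, pheromone scores from a trend detector) of the terms associated with a URL. `page_terms` stands in for whatever evidence is available before fetching, such as anchor text or a previously indexed copy. The `utility` function, the `TrendFrontier` class, and the example URLs are hypothetical.

```python
import heapq
import itertools


def utility(page_terms, trend_weights):
    """Hypothetical utility: sum of the weights of trending terms a page is associated with."""
    return sum(trend_weights.get(term, 0.0) for term in page_terms)


class TrendFrontier:
    """Crawl frontier that yields the URL with the highest trend utility first."""

    def __init__(self, trend_weights):
        self.trend_weights = trend_weights
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: equal scores pop in insertion order
        self._seen = set()                 # avoid queuing the same URL twice

    def add(self, url, page_terms):
        if url in self._seen:
            return
        self._seen.add(url)
        score = utility(page_terms, self.trend_weights)
        # heapq is a min-heap, so store the negated score to pop the largest first
        heapq.heappush(self._heap, (-score, next(self._counter), url))

    def pop(self):
        _, _, url = heapq.heappop(self._heap)
        return url

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    frontier = TrendFrontier({"earthquake": 2.5, "football": 0.5})
    frontier.add("https://example.org/sports", {"football", "final"})
    frontier.add("https://example.org/news", {"earthquake", "relief"})
    frontier.add("https://example.org/recipes", {"soup"})
    print([frontier.pop() for _ in range(len(frontier))])
    # → ['https://example.org/news', 'https://example.org/sports', 'https://example.org/recipes']
```

A conventional breadth-first crawler would visit these URLs in discovery order. Ordering the frontier by trend utility is what lets pages related to current trends be revisited sooner. In a multi-agent setting, each agent might own such a frontier for a partition of the web, although that coordination is not modeled here.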