Content based visualization of linked online data
Piesala, Iines (2015)
Degree Programme in Information Technology
Faculty of Computing and Electrical Engineering
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Approval date
2015-11-04
Permanent address of the publication
https://urn.fi/URN:NBN:fi:tty-201510221685
Abstract
The amount of data has grown exponentially since the Web was first introduced. Finding specific data on the Internet is ever more difficult because of the sheer amount of data the Web holds. Search engines offer a way to search the Internet based on keywords and present the results as a list of links to the most relevant websites. They find and present good responses to specific keywords. However, they do not handle questions about the overall picture surrounding a keyword well.
This master’s thesis researched how to enable searching a whole ecosystem and showing the result to the user. The search would take only a simple user request as input. An example story was used throughout the thesis: a user wants to see all technology conferences visualized through the connections between their speakers and sponsors. Such a visualization would let the user see at a glance how many sponsors and speakers the technology conferences have in common. The goal of this master’s thesis was either to find an existing application or to build a new implementation that would solve this problem.
The theory behind the proposed system is based on genre recognition of websites, web search, named-entity recognition (NER) and graphs. These subjects were presented and discussed based on the scientific literature. No existing application worked as described above, so the other option, implementing the proposed system, was chosen. The prototype application combines an own implementation with external APIs, which were used to search the web and to recognize named entities. The prototype was implemented as a web application using web technologies such as JavaScript and Node.js.
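The graph step of such a system can be illustrated with a minimal sketch in JavaScript, the language of the prototype. It assumes that web search and NER have already produced, for each conference, lists of speaker and sponsor names; the input shape and the function names `sharedEntities` and `buildConferenceGraph` are hypothetical and not taken from the prototype. Each conference becomes a node, and two conferences are connected when they share at least one speaker or sponsor, with the shared names stored on the edge.

```javascript
// Hypothetical sketch, not the prototype's code.
// Assumed input shape: [{ name, speakers: [...], sponsors: [...] }]

// Return the names that appear in both lists, compared after trimming
// and lowercasing (a simplifying assumption about entity matching).
function sharedEntities(a, b) {
  const normalize = (s) => s.trim().toLowerCase();
  const setB = new Set(b.map(normalize));
  return [...new Set(a.map(normalize))].filter((x) => setB.has(x));
}

// Build an undirected graph: one node per conference, and one edge per
// pair of conferences that share at least one speaker or sponsor.
function buildConferenceGraph(conferences) {
  const nodes = conferences.map((c) => c.name);
  const edges = [];
  for (let i = 0; i < conferences.length; i++) {
    for (let j = i + 1; j < conferences.length; j++) {
      const speakers = sharedEntities(conferences[i].speakers, conferences[j].speakers);
      const sponsors = sharedEntities(conferences[i].sponsors, conferences[j].sponsors);
      if (speakers.length > 0 || sponsors.length > 0) {
        edges.push({ source: conferences[i].name, target: conferences[j].name, speakers, sponsors });
      }
    }
  }
  return { nodes, edges };
}

// Made-up example data, not results from the thesis.
const graph = buildConferenceGraph([
  { name: "Conf A", speakers: ["Alice", "Bob"], sponsors: ["Acme"] },
  { name: "Conf B", speakers: ["bob"], sponsors: ["Acme", "Globex"] },
  { name: "Conf C", speakers: ["Carol"], sponsors: ["Initech"] },
]);
// Only Conf A and Conf B are connected: they share speaker "bob" and sponsor "acme".
console.log(graph.edges);
```

The exact-match comparison keeps the sketch short; real NER output would likely need fuzzier matching of entity names. The resulting node and edge lists are a simple structure that a graph visualization could consume directly.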
The prototype application was tested with a case study based on the technology conference example described above. Its results were compared to manually acquired data from five technology conference websites. Of the technology conferences found by the prototype application, 82% were real technology conferences. More of the speakers than of the sponsors were recognized, but the sponsors were recognized more accurately: only a few of the sponsors in the resulting graph were not actual sponsors of the conferences, whereas the graph contained more false speakers than false sponsors.
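The distinction between how many of the real speakers and sponsors were recognized and how many of the recognized ones were real corresponds to the standard recall and precision measures. The following minimal sketch shows how recognized entities could be compared with manually collected ground truth; the function name `evaluate` and the exact-name matching are assumptions for illustration, not the evaluation procedure used in the thesis.

```javascript
// Hypothetical sketch: compare recognized entities with a manually
// collected ground-truth list (names matched after trimming and lowercasing).
function evaluate(recognized, groundTruth) {
  const normalize = (s) => s.trim().toLowerCase();
  const found = new Set(recognized.map(normalize));
  const truth = new Set(groundTruth.map(normalize));
  let truePositives = 0;
  for (const entity of found) {
    if (truth.has(entity)) truePositives++;
  }
  return {
    // Precision: share of recognized entities that are real (few false entities means high precision).
    precision: found.size === 0 ? 0 : truePositives / found.size,
    // Recall: share of real entities that were recognized.
    recall: truth.size === 0 ? 0 : truePositives / truth.size,
  };
}

// Made-up example data, not results from the thesis.
const speakers = evaluate(["Alice", "Bob", "Dave", "Eve"], ["Alice", "Bob", "Carol"]);
console.log(speakers); // precision = 0.5, recall ≈ 0.667
```

In these terms, the case study describes speakers with higher recall but lower precision than sponsors.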
The prototype application showed that the idea works. However, it did not meet the initial goal of general usage. The technology conference case study demonstrated the potential of the idea, but further research and work are needed to realize the full potential of the prototype application.