Development of Machine Learning Applications: Named Entity Recognizer
Abarbou, Ghassan (2018)
Degree Programme in Computer Sciences
Faculty of Natural Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Date of approval
2018-06-14
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:uta-201806282137
Abstract
Machine Learning is widely described as one of the most promising research fields in Information Technology, with the potential to bring a paradigm shift to modern systems. As data grows in volume and availability, structuring, analyzing and exploiting it has become a necessity for modern systems and for the major players in the field. Systems need to discover and structure data with minimal human involvement, while adapting to the nature of the data and handling unseen patterns correctly. One of the best-known applications of Machine Learning, and one whose output is a building block on which more advanced systems rely, is Named Entity Recognition. Named Entity Recognition (NER) is a classification task and a major application of Natural Language Processing: it assigns descriptive labels to sequences of text according to predefined classification categories.
This work covers the conceptualization, design, implementation and evaluation of a system that performs Named Entity Recognition on different datasets, aiming for the highest attainable performance by using the best-performing techniques and following the conventions of the field. The system implements a well-known statistical prediction framework regarded as among the best suited to classification tasks such as NER: Conditional Random Fields (CRF) models perform the initial recognition. The CRF models are combined with several postprocessing methods to form a hybrid NER system aimed at performance levels comparable to the state of the art in the literature.
The research achieved language-independent NER using the core of the developed system, and reached satisfactory performance levels, which were evaluated through experiments on different datasets and on different types of data.
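The hybrid design described in the abstract, statistical sequence labelling followed by rule-based postprocessing, can be illustrated with a small sketch. This is not the thesis implementation: the label set, the hand-set scores, the function names and the stop-list rule are all assumptions made for illustration, and the decoder only performs Viterbi search over fixed scores rather than training a CRF.

```python
# Toy linear-chain decoder in the spirit of a CRF tagger. All scores below
# are hand-set assumptions for illustration, not values from the thesis.
LABELS = ["O", "B-PER", "I-PER"]

def emission(word, label):
    """Hypothetical feature score: capitalised words look like names."""
    cap = word[:1].isupper()
    if label == "O":
        return 0.0 if cap else 1.0
    return 1.0 if cap else -1.0

# Hypothetical transition scores; I-PER is strongly discouraged after O.
TRANSITION = {
    ("O", "O"): 0.5, ("O", "B-PER"): 0.5, ("O", "I-PER"): -5.0,
    ("B-PER", "O"): 0.0, ("B-PER", "B-PER"): -1.0, ("B-PER", "I-PER"): 1.0,
    ("I-PER", "O"): 0.0, ("I-PER", "B-PER"): -1.0, ("I-PER", "I-PER"): 0.5,
}
START = {"O": 0.0, "B-PER": 0.0, "I-PER": -5.0}

def viterbi(words):
    """Return the highest-scoring label sequence for the given words."""
    if not words:
        return []
    scores = [{y: START[y] + emission(words[0], y) for y in LABELS}]
    back = []
    for word in words[1:]:
        prev = scores[-1]
        cur, ptr = {}, {}
        for y in LABELS:
            # best previous label for reaching label y at this position
            best = max(LABELS, key=lambda p: prev[p] + TRANSITION[(p, y)])
            cur[y] = prev[best] + TRANSITION[(best, y)] + emission(word, y)
            ptr[y] = best
        scores.append(cur)
        back.append(ptr)
    # follow the back-pointers from the best final label
    label = max(LABELS, key=lambda y: scores[-1][y])
    path = [label]
    for ptr in reversed(back):
        label = ptr[label]
        path.append(label)
    return path[::-1]

def postprocess(words, labels, stoplist=frozenset({"The", "A", "An"})):
    """Rule-based correction: stop-listed words are never part of a name."""
    fixed = list(labels)
    for i, w in enumerate(words):
        if w in stoplist and fixed[i] != "O":
            fixed[i] = "O"
            # promote a following I-PER to B-PER so the span stays well formed
            if i + 1 < len(fixed) and fixed[i + 1] == "I-PER":
                fixed[i + 1] = "B-PER"
    return fixed

words = ["The", "speaker", "was", "Ada", "Lovelace", "today"]
raw = viterbi(words)
print(raw)                      # ['B-PER', 'O', 'O', 'B-PER', 'I-PER', 'O']
print(postprocess(words, raw))  # ['O', 'O', 'O', 'B-PER', 'I-PER', 'O']
```

In this toy run the statistical step mislabels the sentence-initial "The" because it is capitalised, and the postprocessing rule corrects it. A real system would learn the scores from annotated data instead of setting them by hand.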