Processing Academic Article Information for Predicting the Quality of Academic Journals
Kolesnikov, Oleg (2018)
Teknis-luonnontieteellinen tiedekunta - Faculty of Natural Sciences
This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Acceptance date: 2018-12-05
Permanent address of the publication:
https://urn.fi/URN:NBN:fi:tty-201811082554
Abstract
The main method of communicating scientific ideas and results is to publish articles. The number of scientific journals has grown rapidly in recent years, and not all journals are of the same quality and standard. There are even predatory journals that aim to earn money through publication fees paid by the authors. All of this has made it necessary to rank scientific publication forums according to their quality and selectiveness. Finland also has its own ranking system, known as JUFO (Julkaisufoorumi, Publication Forum).
On the other hand, several open electronic libraries record information about published articles. Well-known examples include Google Scholar, Semantic Scholar, and the Digital Bibliography and Library Project (DBLP), the last of which focuses specifically on computer science publications. Scientific publishers also maintain their own repositories of the articles published in their books, journals, and collections. Well-known examples include Springer and Elsevier, as well as the computing-oriented Association for Computing Machinery (ACM) and Institute of Electrical and Electronics Engineers (IEEE).
The full texts of publications owned by commercial publishers are usually not openly available. Electronic libraries do, however, collect basic bibliographic information such as author names, article title, and publication year. The abstract of a paper is sometimes publicly available as well; publishing houses in particular are willing to disclose it.
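To make the kind of basic bibliographic record described above concrete, the following sketch parses a DBLP-style article element into a Python dictionary with the standard library's ElementTree. It is a minimal illustration, not the collector built in this work: the sample record, its key, and the function name parse_articles are hypothetical, and the full dblp.xml dump also declares character entities in a DTD, which this sketch does not load.

```python
import xml.etree.ElementTree as ET

# Hypothetical DBLP-style snippet. The element names (article, author, title,
# journal, year) follow DBLP's XML records; the content itself is made up.
SAMPLE = """
<dblp>
  <article key="journals/example/Doe18" mdate="2018-01-01">
    <author>Jane Doe</author>
    <author>John Roe</author>
    <title>An Example Article.</title>
    <journal>Example Journal</journal>
    <year>2018</year>
  </article>
</dblp>
"""


def parse_articles(xml_text):
    """Return one dict per <article> element with its basic bibliographic fields."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for art in root.iter("article"):
        records.append({
            "key": art.get("key"),
            # An article may list several authors, one <author> element each.
            "authors": [a.text for a in art.findall("author")],
            "title": art.findtext("title"),
            "journal": art.findtext("journal"),
            "year": int(art.findtext("year")),
        })
    return records


if __name__ == "__main__":
    for record in parse_articles(SAMPLE):
        print(record)
```

Note that a record of this kind carries no abstract; any abstract text would have to be fetched from another source.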
It is natural to connect electronic libraries with publication ranking sites and try to learn to rank journals automatically; this is the setting of this work. The first contribution is a program that collects a comprehensive data set consisting of article information together with the ranking given to the journal in which each article appeared. We concentrate on the DBLP library and the JUFO ranking. As a second contribution, we validate the feasibility of the proposed approach by applying a couple of machine learning algorithms from the WEKA collection to the collected data set. The experiments show that fairly high prediction accuracies can be achieved by using information gathered from the article abstracts.
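The data set construction step can be pictured as a join between article records and journal rankings, followed by export to ARFF, the input format WEKA reads. This is a minimal sketch under stated assumptions, not the program described in this work: the lookup table JUFO_LEVELS, the matching by normalized journal name, the field names, and all helper functions are hypothetical, and each article record is assumed to already carry its abstract. JUFO levels are treated as the nominal classes 0 to 3.

```python
# Hypothetical JUFO lookup: normalized journal name -> JUFO level (0-3).
JUFO_LEVELS = {"example journal": 2, "another journal": 1}


def normalize(name):
    """Case-fold and collapse whitespace so minor name variants match."""
    return " ".join(name.casefold().split())


def label_articles(articles, jufo_levels):
    """Attach the JUFO level of each article's journal; drop unmatched articles."""
    labeled = []
    for art in articles:
        level = jufo_levels.get(normalize(art["journal"]))
        if level is not None:
            labeled.append({**art, "jufo": level})
    return labeled


def quote(value):
    """Quote a string value for ARFF, escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def to_arff(labeled, relation="articles"):
    """Render labeled articles as ARFF: abstract text, year, and JUFO class."""
    lines = [
        f"@RELATION {relation}",
        "",
        "@ATTRIBUTE abstract STRING",
        "@ATTRIBUTE year NUMERIC",
        "@ATTRIBUTE jufo {0,1,2,3}",
        "",
        "@DATA",
    ]
    for art in labeled:
        lines.append(f"{quote(art['abstract'])},{art['year']},{art['jufo']}")
    return "\n".join(lines) + "\n"


# Usage with made-up records: the second journal is not in the lookup table.
articles = [
    {"journal": "Example  Journal", "year": 2018, "abstract": "We study journal rankings."},
    {"journal": "Unranked Venue", "year": 2017, "abstract": "Unrelated."},
]
labeled = label_articles(articles, JUFO_LEVELS)
arff = to_arff(labeled)
print(arff)
```

In WEKA, a string attribute such as abstract is typically turned into word features, for example with the StringToWordVector filter, before a classifier is trained. Matching journals by normalized name is the simplest possible choice; real data would likely also need to handle abbreviations and other name variants.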