Slovenian scientific texts: sources and description

Name: Slovenian scientific texts: sources and description

Duration: January 1, 2016 to December 31, 2018

Project partners: UM-FERI (project lead) and IJS (project coordinator)

Funders: ARRS

In the project, we will create an extensive corpus of scientific Slovenian containing texts taken from the Slovenian open science portal. The corpus is linguistically annotated with newly developed tools that further improve the quality of annotation of Slovenian language resources. We have developed methods for text classification and key phrase extraction that improve the usability of the portal by enabling more complex content search and key phrase recommendations. The corpus serves as a basis for new methods of automated extraction of Slovenian terminology. The extracted terminology candidates will be published through a freely accessible web dictionary interface that supports both browsing and editing of collections, so that Slovenian scientific communities from various fields can take part in managing the terminology of their field.