Browsing by Author "Yenice, A. B."
Conference paper | Publication | Open Access
Introducing StarDust: A UD-based Dependency Annotation Tool (European Language Resources Association (ELRA), 2022)
Yenice, A. B.; Cesur, N.; Kurgun, A.; Yıldız, Olcay Taner; Computer Science; YILDIZ, Olcay Taner
This paper introduces StarDust, a new open-source annotation tool designed for NLP studies. StarDust is designed to be simple and intuitive for annotators while supporting the annotation of multiple languages with different morphological typologies, such as Turkish and English. This demonstration focuses mainly on our UD-based annotation tool for dependency syntax. Because it is linked to a morphological analyzer, the tool can detect certain annotator mistakes and restrict undesired dependency relations, and its simple new interface makes the annotation process quick and effective. Our tool can be downloaded from GitHub. © 2022 European Language Resources Association (ELRA).

Conference paper | Publication | Open Access
A learning-based dependency to constituency conversion algorithm for the Turkish language (European Language Resources Association (ELRA), 2022)
Marşan, B.; Yıldız, O. K.; Kuzgun, A.; Yenice, A; Cesur, N.; Yenice, A. B.; Sanıyar, E.; Kuyrukçu, O.; Arıcan, B. N.; Yıldız, Olcay Taner; Computer Science; YILDIZ, Olcay Taner
This study aims to create the very first dependency-to-constituency conversion algorithm optimised for the Turkish language. For this purpose, a state-of-the-art morphological analyser (Yıldız et al., 2019) and a feature-based machine learning model were used. To improve the performance of the conversion algorithm, the bootstrap aggregating (bagging) meta-algorithm was integrated. The typological properties of Turkish were carefully considered while designing the conversion algorithm. The input to the algorithm was a comprehensive, manually annotated UD-style dependency treebank, and its output was a set of constituency trees.
A team of linguists manually annotated a set of constituency trees, which served as the gold standard for assessing the performance of the algorithm. The conversion process yielded more than 8,000 constituency trees, whose corresponding UD-style dependency trees are also available on GitHub. In addition to contributing to Turkish treebank resources, this study offers a viable, easy-to-implement conversion algorithm that can be used to generate new constituency treebanks and training data for NLP tools such as constituency parsers.

Conference paper | Publication | Open Access
MorphoLex Turkish: A morphological lexicon for Turkish (European Language Resources Association (ELRA), 2022)
Arıcan, B. N.; Kuzgun, A.; Marşan, B.; Aslan, D. B.; Sanıyar, E.; Cesur, N.; Kara, N.; Kuyrukçu, O.; Özçelik, M.; Yenice, A. B.; Doğan, M.; Oksal, C.; Ercan, G.; Yıldız, Olcay Taner; Computer Science; YILDIZ, Olcay Taner
MorphoLex is a study in which the roots, prefixes, and suffixes of words are analyzed. With MorphoLex, words can be analyzed according to specific rules, and a useful morphological database can be built from the results. Because Turkish is an agglutinative language with a rich morphological structure, MorphoLex Turkish yields analyses and results that differ from those of previous MorphoLex studies. In this study, we describe the process of creating a database of 48,472 words and report the results arising from these differences in language structure.

Conference paper | Publication | Open Access
Time travel in Turkish: WordNets for modern Turkish (European Language Resources Association (ELRA), 2022)
Oksal, C.; Oğuz, H. N.; Çatal, M.; Erbay, N.; Duvarcı, A.; Yüzer, Ö.; Ünsal, İ. B.; Kuyrukçu, O.; Yenice, A. B.; Kuzgun, A.; Marşan, B.; Sanıyar, E.; Arıcan, B. N.; Doğan, M.; Bakay, Ö.; Yıldız, Olcay Taner; Computer Science; YILDIZ, Olcay Taner
WordNets have been popular tools for representing the semantic and lexical relations of languages, and they serve various purposes in NLP studies.
Many researchers have created WordNets for different languages. For Turkish, there are two WordNets: the Turkish WordNet of BalkaNet and KeNet. In this paper, we present new WordNets for Turkish, each based on one of the first 9 editions of the Turkish dictionary, starting with the 1944 edition. These WordNets are historical in nature and have implications for Modern Turkish. They were developed by extending KeNet, which was created from the 2005 and 2011 editions of the Turkish dictionary. We explain the steps in creating these 9 new WordNets, discuss the challenges encountered in the process, and report comparative results for the WordNets.

Conference paper | Publication | Open Access
WordNet and Wikipedia connection in Turkish WordNet KeNet (European Language Resources Association (ELRA), 2022)
Doğan, M.; Oksal, C.; Yenice, A. B.; Beyhan, F.; Yeniterzi, R.; Yıldız, Olcay Taner; Computer Science; YILDIZ, Olcay Taner
This paper presents a connection between WordNet and Wikipedia, built by linking synsets from the Turkish WordNet KeNet to Wikipedia, in order to provide a richer machine-readable dictionary for creating NLP models. To this end, the two resources are mapped manually, and 11,478 synsets are linked to Wikipedia. In addition, automatic linking approaches are used to analyze suggested connections. A baseline approach and an ElasticSearch-based approach help identify potential human annotation errors, and their effectiveness in linking is analyzed. Combining manual and automatic mapping provides a comprehensive resource of WordNet-Wikipedia connections.
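The dependency-to-constituency conversion paper listed above states that the bootstrap aggregating (bagging) meta-algorithm was integrated into its feature-based model, but the abstract gives no details. As a generic illustration of bagging only, and not the authors' implementation, the sketch below trains several copies of a base classifier on bootstrap resamples of the training data and combines them by majority vote. The `train_lookup` base learner, the `bagging` helper, and the feature tuples and labels in the usage lines are all hypothetical.

```python
import random
from collections import Counter
from typing import Callable, Hashable, List, Sequence, Tuple

# Hypothetical training example: (feature tuple, label).
Example = Tuple[Hashable, str]
Classifier = Callable[[Hashable], str]


def train_lookup(samples: Sequence[Example]) -> Classifier:
    """Hypothetical base learner: return the most frequent label seen for a
    feature tuple, falling back to the sample's overall majority label."""
    by_key = {}
    for features, label in samples:
        by_key.setdefault(features, Counter())[label] += 1
    table = {f: counts.most_common(1)[0][0] for f, counts in by_key.items()}
    fallback = Counter(label for _, label in samples).most_common(1)[0][0]
    return lambda features: table.get(features, fallback)


def bagging(data: Sequence[Example],
            train: Callable[[Sequence[Example]], Classifier],
            n_models: int = 11,
            seed: int = 0) -> Classifier:
    """Bootstrap aggregating: fit n_models base learners, each on a bootstrap
    resample (same size as data, drawn with replacement), then predict by
    majority vote over the learners."""
    if not data:
        raise ValueError("bagging needs at least one training example")
    rng = random.Random(seed)
    models: List[Classifier] = []
    for _ in range(n_models):
        resample = [rng.choice(data) for _ in range(len(data))]
        models.append(train(resample))

    def predict(features: Hashable) -> str:
        votes = Counter(model(features) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Usage with made-up features (a UD-style POS tag and relation) and labels.
data = [(("NOUN", "nsubj"), "X")] * 50 + [(("ADJ", "amod"), "Y")] * 50
predict = bagging(data, train_lookup)
print(predict(("NOUN", "nsubj")), predict(("ADJ", "amod")))
```

Using an odd number of learners (11 here, an arbitrary choice) avoids ties in two-label votes; any real base learner would replace the toy lookup table.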