Person: YILDIZ, Olcay Taner
Publication Search Results
Now showing 1 - 10 of 15
Conference Object; Publication; Open Access
Morpholex Turkish: A morphological lexicon for Turkish (European Language Resources Association (ELRA), 2022)
Arıcan, B. N.; Kuzgun, A.; Marşan, B.; Aslan, D. B.; Sanıyar, E.; Cesur, N.; Kara, N.; Kuyrukçu, O.; Özçelik, M.; Yenice, A. B.; Doğan, M.; Oksal, C.; Ercan, G.; Yıldız, Olcay Taner; Computer Science; YILDIZ, Olcay Taner
MorphoLex is a study in which the roots, prefixes and suffixes of words are analyzed. With MorphoLex, many words can be analyzed according to certain rules and a useful database can be created. Because Turkish is an agglutinative language with a rich structure, it yields analyses and results that differ from those of previous MorphoLex studies. In this study, we describe the process of creating a database with 48,472 words and the results arising from the differences in language structure.

Conference Object; Publication; Metadata only
CheckMate: English grammatical error correction for native Turkish speakers (IEEE, 2023)
Ersoy, Asım; Savlak, O.; Kart, C.; Yıldız, Olcay Taner; Computer Science; YILDIZ, Olcay Taner; Ersoy, Asım
The COVID-19 outbreak left many countries with no choice but to turn to online education. Turkish students, who were among those affected, had difficulty improving their English because face-to-face feedback was not available. In this work, we build and open-source a dataset for grammatical error correction composed of essays written by Turkish students from different universities, capturing the errors that native Turkish speakers tend to make. We use the dataset to build a model, which we deploy along with a web interface.

Conference Object; Publication; Open Access
HisNet: A polarity lexicon based on WordNet for emotion analysis (Global Wordnet Association, 2021-01)
Özçelik, M.; Arıcan, B. N.; Bakay, Ö.; Sarmış, E.; Bayazıt, N. G.; Ergelen, Ö.; Yıldız, Olcay Taner; Computer Science; YILDIZ, Olcay Taner
Dictionary-based methods in sentiment analysis have received scholarly attention recently, the most comprehensive examples of which can be found in English. However, many other languages lack polarity dictionaries, or the existing ones are small, as in the case of SentiTurkNet, the first and only polarity dictionary in Turkish. Thus, this study aims to extend the content of SentiTurkNet by comparing the two available WordNets in Turkish, namely KeNet and TR-wordnet of BalkaNet. To this end, a current Turkish polarity dictionary has been created relying on 76,825 synsets matching KeNet, where each synset has been annotated with three polarity labels: positive, negative and neutral. Meanwhile, the comparison of KeNet and TR-wordnet of BalkaNet has revealed their weaknesses, such as the repetition of the same senses, the lack of necessary merges of items belonging to the same synset and the presence of redundant narrower versions of synsets, which are discussed in light of their potential for improving the current lexical databases of Turkish.

Conference Object; Publication; Open Access
Introducing StarDust: A UD-based Dependency Annotation Tool (European Language Resources Association (ELRA), 2022)
Yenice, A. B.; Cesur, N.; Kurgun, A.; Yıldız, Olcay Taner; Computer Science; YILDIZ, Olcay Taner
This paper aims to introduce StarDust, a new, open-source annotation tool designed for NLP studies. StarDust is designed specifically to be intuitive and simple for annotators while also supporting the annotation of multiple languages with different morphological typologies, e.g. Turkish and English. This demonstration will mainly focus on our UD-based annotation tool for dependency syntax. Linked to a morphological analyzer, the tool can detect certain annotator mistakes and limit undesired dependency relations, and its new, simple interface offers annotators a quick and effective annotation process. Our tool can be downloaded from GitHub. © 2022 European Language Resources Association (ELRA).

Conference Object; Publication; Open Access
Turkish WordNet KeNet (Global WordNet Association, 2021)
Bakay, Ö.; Ergelen, Ö.; Sarmış, E.; Yıldırım, Selin; Kocabalcıoglu, A.; Arıcan, B. N.; Özçelik, M.; Sanıyar, E.; Kuyrukçu, O.; Avar, B.; Yıldız, Olcay Taner; Computer Science; YILDIZ, Olcay Taner
Currently, there are two available wordnets for Turkish: TR-wordnet of BalkaNet and KeNet. As the more comprehensive wordnet for Turkish, KeNet includes 76,757 synsets. KeNet has intralingual semantic relations and is also linked to PWN through interlingual relations. In this paper, we present the procedure adopted in creating KeNet, give details about our approach to annotating semantic relations such as hypernymy and discuss the language-specific problems encountered in these processes.

Conference Object; Publication; Open Access
A learning-based dependency to constituency conversion algorithm for the turkish language (European Language Resources Association (ELRA), 2022)
Marşan, B.; Yıldız, O. K.; Kuzgun, A.; Yenice, A; Cesur, N.; Yenice, A. B.; Sanıyar, E.; Kuyrukçu, O.; Arıcan, B. N.; Yıldız, Olcay Taner; Computer Science; YILDIZ, Olcay Taner
This study aims to create the very first dependency-to-constituency conversion algorithm optimised for the Turkish language. For this purpose, a state-of-the-art morphological analyser (Yıldız et al., 2019) and a feature-based machine learning model were used. To enhance the performance of the conversion algorithm, the bootstrap aggregating meta-algorithm was integrated. While creating the conversion algorithm, the typological properties of Turkish were carefully considered. A comprehensive and manually annotated UD-style dependency treebank was the input, and constituency trees were the output of the conversion algorithm. A team of linguists manually annotated a set of constituency trees. These manually annotated trees were used as the gold standard to assess the performance of the algorithm. The conversion process yielded more than 8000 constituency trees whose UD-style dependency trees are also available on GitHub. In addition to its contribution to Turkish treebank resources, this study also offers a viable and easy-to-implement conversion algorithm that can be used to generate new constituency treebanks and training data for NLP resources like constituency parsers.

Conference Object; Publication; Open Access
FrameForm: An open-source annotation interface for framenet (Association for Computational Linguistics (ACL), 2021)
Marşan, B.; Yıldız, Olcay Taner; Computer Science; Gkatzia, D.; Seddah, D.; YILDIZ, Olcay Taner
In this paper, we introduce FrameForm, an open-source annotation tool designed to accommodate predicate annotations based on Frame Semantics (Fillmore et al., 1976). FrameForm is a user-friendly tool for creating, annotating and maintaining computational lexicography projects like FrameNet and has been used while building the Turkish FrameNet (Marşan et al., 2021). Responsive and open-source, FrameForm can be easily modified to answer the annotation needs of a wide range of different languages.

Conference Object; Publication; Open Access
WordNet and wikipedia connection in Turkish WordNet KeNet (European Language Resources Association (ELRA), 2022)
Doğan, M.; Oksal, C.; Yenice, A. B.; Beyhan, F.; Yeniterzi, R.; Yıldız, Olcay Taner; Computer Science; YILDIZ, Olcay Taner
This paper aims to present a WordNet and Wikipedia connection by linking synsets from Turkish WordNet KeNet with Wikipedia and thus provide a better machine-readable dictionary for creating an NLP model with rich data. For this purpose, a manual mapping between the two resources is carried out and 11,478 synsets are linked to Wikipedia. In addition, automatic linking approaches are used to analyze possible connection suggestions. The Baseline Approach and the ElasticSearch Based Approach help identify potential human annotation errors and analyze the effectiveness of these approaches in linking. Adopting both manual and automatic mapping provides us with an encompassing resource of WordNet and Wikipedia connections.

Conference Object; Publication; Metadata only
ORTPiece: An ORT-based Turkish image captioning network based on transformers and WordPiece (IEEE, 2023)
Ersoy, Asım; Yıldız, Olcay Taner; Özer, Sedat; Computer Science; YILDIZ, Olcay Taner; ÖZER, Sedat; Ersoy, Asım
Recent transformer-based systems are advancing image captioning applications. However, those works have mainly been applied to English image captioning problems. In this paper, we introduce a transformer-based image captioning algorithm for Turkish. Our proposed algorithm uses appearance and geometry features from the input image and combines them with WordPiece embeddings to generate the Turkish caption. Our experimental results show improvement over other existing techniques, including the original ORT and the show-and-tell algorithms.

Conference Object; Publication; Open Access
From constituency to UD-Style dependency: Building the first conversion tool of Turkish (Incoma Ltd, 2021-09)
Kuzgun, A.; Yıldız, O. K.; Cesur, N.; Marşan, B.; Yenice, A. B; Sanıyar, E.; Kuyrukçu, O.; Arıcan, B. N.; Yıldız, Olcay Taner; Computer Science; YILDIZ, Olcay Taner
This paper deliberates on the process of building the first constituency-to-dependency conversion tool for Turkish. The starting point of this work is a previous study in which 10,000 phrase structure trees were manually transformed into Turkish from the original PennTreebank corpus. Within the scope of this project, these Turkish phrase structure trees were automatically converted into UD-style dependency structures, using both a rule-based algorithm and a machine learning algorithm specific to the requirements of the Turkish language. The results of both algorithms were compared, and the machine learning approach proved to be more accurate than the rule-based algorithm. The output was revised by a team of linguists. The refined versions were taken as gold standard annotations for the evaluation of the algorithms. In addition to its contribution to the UD Project with a large dataset of 10,000 Turkish dependency trees, this project also fills the important gap of a Turkish conversion tool, enabling the quick compilation of dependency corpora which can be used for the training of better dependency parsers.