A learning-based dependency to constituency conversion algorithm for the Turkish language

Marşan, B.; Yıldız, O. K.; Kuzgun, A.; Yenice, A; Cesur, N.; Yenice, A. B.; Sanıyar, E.; Kuyrukçu, O.; Arıcan, B. N.; Yıldız, Olcay Taner

dc.contributor.author	Marşan, B.
dc.contributor.author	Yıldız, O. K.
dc.contributor.author	Kuzgun, A.
dc.contributor.author	Yenice, A
dc.contributor.author	Cesur, N.
dc.contributor.author	Yenice, A. B.
dc.contributor.author	Sanıyar, E.
dc.contributor.author	Kuyrukçu, O.
dc.contributor.author	Arıcan, B. N.
dc.contributor.author	Yıldız, Olcay Taner
dc.date.accessioned	2023-08-08T08:17:29Z
dc.date.available	2023-08-08T08:17:29Z
dc.date.issued	2022
dc.identifier.isbn	979-109554672-6
dc.identifier.uri	http://hdl.handle.net/10679/8587
dc.identifier.uri	http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.540.pdf
dc.description.abstract	This study aims to create the very first dependency-to-constituency conversion algorithm optimised for the Turkish language. For this purpose, a state-of-the-art morphological analyser (Yıldız et al., 2019) and a feature-based machine learning model were used. To enhance the performance of the conversion algorithm, the bootstrap aggregating (bagging) meta-algorithm was integrated. While creating the conversion algorithm, the typological properties of Turkish were carefully considered. A comprehensive, manually annotated UD-style dependency treebank was the input, and constituency trees were the output of the conversion algorithm. A team of linguists manually annotated a set of constituency trees, which were used as the gold standard to assess the performance of the algorithm. The conversion process yielded more than 8000 constituency trees, whose UD-style dependency trees are also available on GitHub. In addition to its contribution to Turkish treebank resources, this study offers a viable and easy-to-implement conversion algorithm that can be used to generate new constituency treebanks and training data for NLP resources such as constituency parsers.	en_US
dc.language.iso	eng	en_US
dc.publisher	European Language Resources Association (ELRA)	en_US
dc.relation.ispartof	2022 Language Resources and Evaluation Conference, LREC 2022
dc.rights	Attribution-NonCommercial 4.0 International	*
dc.rights	openAccess
dc.rights.uri	http://creativecommons.org/licenses/by-nc/4.0/	*
dc.title	A learning-based dependency to constituency conversion algorithm for the Turkish language	en_US
dc.type	Conference paper	en_US
dc.publicationstatus	Published	en_US
dc.contributor.department	Özyeğin University
dc.contributor.authorID	(ORCID 0000-0001-5838-4615 & YÖK ID 19848) Yıldız, Olcay Taner
dc.contributor.ozuauthor	Yıldız, Olcay Taner
dc.identifier.startpage	5054	en_US
dc.identifier.endpage	5062	en_US
dc.identifier.wos	WOS:000889371705018
dc.subject.keywords	Constituency parsing	en_US
dc.subject.keywords	Constituency dependency conversion	en_US
dc.subject.keywords	Dependency parsing	en_US
dc.identifier.scopus	SCOPUS:2-s2.0-85144336878
dc.relation.publicationcategory	Conference Paper - International - Institutional Academic Staff