
dc.contributor.author | Görgün, O.
dc.contributor.author | Yıldız, Olcay Taner
dc.date.accessioned | 2023-08-16T10:15:31Z
dc.date.available | 2023-08-16T10:15:31Z
dc.date.issued | 2022
dc.identifier.issn | 1300-0632 | en_US
dc.identifier.uri | http://hdl.handle.net/10679/8699
dc.identifier.uri | https://journals.tubitak.gov.tr/elektrik/vol30/iss1/13/
dc.description.abstract | This study extends our initial efforts in building an English-Turkish parallel treebank corpus for statistical machine translation tasks. We manually generated parallel trees for about 17K sentences selected from the Penn Treebank corpus. The English sentences range from 15 to 50 tokens in length, including punctuation. We constrained the translation of trees by (i) reordering leaf nodes based on Turkish suffixation rules and (ii) gloss replacement. We aim to mimic a human annotator's behavior in a real translation task. To bridge the morphological and syntactic gap between the two languages, we perform morphological annotation and disambiguation. We also apply our heuristics to create the Nokia English-Turkish Treebank (NTB), which addresses technical document translation tasks. NTB contains 8.3K sentences of varying length. We validate the corpus both extrinsically and intrinsically, and report our evaluation results for perplexity analysis and translation tasks. The results show that our heuristics yield promising results in terms of perplexity and are suitable for translation tasks in terms of BLEU scores. | en_US
dc.description.sponsorship | TÜBİTAK
dc.language.iso | eng | en_US
dc.publisher | TÜBİTAK | en_US
dc.relation | info:turkey/grantAgreement/TUBITAK/3140986
dc.relation.ispartof | Turkish Journal of Electrical Engineering and Computer Sciences
dc.rights | openAccess
dc.rights | Attribution 4.0 International
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/
dc.title | Evaluating the English-Turkish parallel treebank for machine translation | en_US
dc.type | Article | en_US
dc.description.version | Publisher version | en_US
dc.peerreviewed | yes | en_US
dc.publicationstatus | Published | en_US
dc.contributor.department | Özyeğin University
dc.contributor.authorID | (ORCID 0000-0001-5838-4615 & YÖK ID 19848) Yıldız, Olcay Taner
dc.contributor.ozuauthor | Yıldız, Olcay Taner
dc.identifier.volume | 30 | en_US
dc.identifier.issue | 1 | en_US
dc.identifier.startpage | 184 | en_US
dc.identifier.endpage | 199 | en_US
dc.identifier.wos | WOS:000745992300003
dc.identifier.doi | 10.3906/elk-2102-57 | en_US
dc.subject.keywords | Parallel treebank | en_US
dc.subject.keywords | Parallel corpora | en_US
dc.subject.keywords | Turkish | en_US
dc.subject.keywords | English | en_US
dc.subject.keywords | Syntax-based | en_US
dc.identifier.scopus | SCOPUS:2-s2.0-85125870774
dc.relation.publicationcategory | Article - International Refereed Journal - Institutional Academic Staff



Except where otherwise noted, this item's license is described as openAccess
