Publication:
Evaluating the English-Turkish parallel treebank for machine translation

dc.contributor.author: Görgün, O.
dc.contributor.author: Yıldız, Olcay Taner
dc.contributor.department: Computer Science
dc.contributor.ozuauthor: YILDIZ, Olcay Taner
dc.date.accessioned: 2023-08-16T10:15:31Z
dc.date.available: 2023-08-16T10:15:31Z
dc.date.issued: 2022
dc.description.abstract: This study extends our initial efforts to build an English-Turkish parallel treebank corpus for statistical machine translation. We manually generated parallel trees for about 17K sentences selected from the Penn Treebank corpus. The English sentences vary in length from 15 to 50 tokens, including punctuation. We constrained the translation of trees by (i) reordering leaf nodes based on suffixation rules in Turkish, and (ii) gloss replacement. We aim to mimic a human annotator's behavior in a real translation task. To bridge the morphological and syntactic gap between the languages, we perform morphological annotation and disambiguation. We also apply our heuristics to create the Nokia English-Turkish Treebank (NTB), which addresses technical document translation tasks. NTB includes 8.3K sentences of varying lengths. We validate the corpus both extrinsically and intrinsically, and report evaluation results from perplexity analysis and translation tasks. The results indicate that our heuristics yield promising perplexity and are suitable for translation tasks in terms of BLEU scores. [en_US]
dc.description.sponsorship: TÜBİTAK
dc.description.version: Publisher version [en_US]
dc.identifier.doi: 10.3906/elk-2102-57 [en_US]
dc.identifier.endpage: 199 [en_US]
dc.identifier.issn: 1300-0632 [en_US]
dc.identifier.issue: 1 [en_US]
dc.identifier.scopus: 2-s2.0-85125870774
dc.identifier.startpage: 184 [en_US]
dc.identifier.uri: http://hdl.handle.net/10679/8699
dc.identifier.uri: https://doi.org/10.3906/elk-2102-57
dc.identifier.volume: 30 [en_US]
dc.identifier.wos: 000745992300003
dc.language.iso: eng [en_US]
dc.peerreviewed: yes [en_US]
dc.publicationstatus: Published [en_US]
dc.publisher: TÜBİTAK [en_US]
dc.relation: info:eu-repo/grantAgreement/TUBITAK/1001 - Araştırma/3140986
dc.relation.ispartof: Turkish Journal of Electrical Engineering and Computer Sciences
dc.relation.publicationcategory: International Refereed Journal
dc.rights: openAccess
dc.rights: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject.keywords: Parallel treebank [en_US]
dc.subject.keywords: Parallel corpora [en_US]
dc.subject.keywords: Turkish [en_US]
dc.subject.keywords: English [en_US]
dc.subject.keywords: Syntax-based [en_US]
dc.title: Evaluating the English-Turkish parallel treebank for machine translation [en_US]
dc.type: article [en_US]
dspace.entity.type: Publication
relation.isOrgUnitOfPublication: 85662e71-2a61-492a-b407-df4d38ab90d7
relation.isOrgUnitOfPublication.latestForDiscovery: 85662e71-2a61-492a-b407-df4d38ab90d7

Files

Original bundle

Name: Evaluating the English-Turkish parallel treebank for machine translation.pdf
Size: 363.1 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.45 KB
Format: Item-specific license agreed upon to submission

Collections