Publication:
BERT2OME: Prediction of 2′-O-methylation modifications from RNA sequence by transformer architecture based on BERT

dc.contributor.author: Soylu, Necla Nisa
dc.contributor.author: Sefer, Emre
dc.contributor.department: Computer Science
dc.contributor.ozuauthor: SEFER, Emre
dc.contributor.ozugradstudent: Soylu, Necla Nisa
dc.date.accessioned: 2023-09-15T13:18:56Z
dc.date.available: 2023-09-15T13:18:56Z
dc.date.issued: 2023-06
dc.description.abstract: Recent work on language models has achieved state-of-the-art performance on various language tasks. Among these models, Bidirectional Encoder Representations from Transformers (BERT) contextualizes word embeddings to capture the context and semantics of words. On the other hand, the post-transcriptional 2'-O-methylation (Nm) RNA modification is important in various cellular tasks and is related to a number of diseases. Existing high-throughput experimental techniques take a long time to detect these modifications and are costly for exploring these functional processes. Here, to understand the associated biological processes faster, we propose an efficient method, BERT2OME, to infer 2'-O-methylation RNA modification sites from RNA sequences. BERT2OME combines a BERT-based model with convolutional neural networks (CNNs) to infer the relationship between modification sites and RNA sequence content. Unlike previously proposed methods, BERT2OME treats each given RNA sequence as a text and improves modification prediction performance by integrating the pretrained deep-learning-based language model BERT. Additionally, our transformer-based approach can infer modification sites across multiple species. Under 5-fold cross-validation, human and mouse accuracies were … and …, respectively. Similarly, ROC AUC scores were 0.99 and 0.94 for the same species. Detailed results show that BERT2OME reduces the time consumed in biological experiments and outperforms existing approaches across different datasets and species over multiple metrics. Additionally, deep learning approaches such as 2D CNNs are more promising at learning BERT attributes than more conventional machine learning methods. Our code and datasets can be found at .
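The pipeline the abstract describes (treat the RNA sequence as text, embed its tokens, then classify with a 2D CNN over the embedding matrix) can be illustrated with a minimal sketch. Everything below — the 3-mer tokenization, the random lookup table standing in for BERT's contextual embeddings, and the single hand-rolled 2D convolution — is a hypothetical illustration of the general idea, not the authors' implementation.

```python
import numpy as np

def kmers(seq, k=3):
    """Split an RNA sequence into overlapping k-mer 'words' (text-like tokens)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def embed(tokens, dim=8):
    """Toy lookup embedding: a random vector per k-mer, standing in for BERT.
    Returns a (num_tokens, dim) matrix that the CNN treats as a 2D 'image'."""
    rng = np.random.default_rng(0)
    vocab = {}
    rows = []
    for t in tokens:
        if t not in vocab:
            vocab[t] = rng.normal(size=dim)
        rows.append(vocab[t])
    return np.stack(rows)

def conv2d_feature(emb, kernel):
    """Valid-mode 2D cross-correlation over the token-by-dimension matrix."""
    kh, kw = kernel.shape
    h, w = emb.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(emb[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical example sequence, not from the paper's datasets.
seq = "AUGGCUACGUAC"
emb = embed(kmers(seq))                       # (10, 8) embedding matrix
feat = conv2d_feature(emb, np.ones((3, 3)) / 9.0)  # (8, 6) feature map
score = 1.0 / (1.0 + np.exp(-feat.max()))     # sigmoid over max-pooled feature
print(emb.shape, feat.shape, round(float(score), 3))
```

In the actual method a pretrained BERT supplies the embeddings and a trained 2D CNN replaces the fixed averaging kernel; this sketch only shows how a sequence becomes a 2D input for convolutional classification.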
dc.identifier.doi: 10.1109/TCBB.2023.3237769
dc.identifier.endpage: 2189
dc.identifier.issn: 1545-5963
dc.identifier.issue: 3
dc.identifier.scopus: 2-s2.0-85147302015
dc.identifier.startpage: 2177
dc.identifier.uri: http://hdl.handle.net/10679/8844
dc.identifier.uri: https://doi.org/10.1109/TCBB.2023.3237769
dc.identifier.volume: 20
dc.identifier.wos: 001006656100050
dc.language.iso: eng
dc.peerreviewed: yes
dc.publicationstatus: Published
dc.publisher: IEEE
dc.relation.ispartof: IEEE/ACM Transactions on Computational Biology and Bioinformatics
dc.relation.publicationcategory: International Refereed Journal
dc.rights: restrictedAccess
dc.subject.keywords: 2'-O-methylation
dc.subject.keywords: BERT
dc.subject.keywords: Biological system modeling
dc.subject.keywords: Bit error rate
dc.subject.keywords: Convolutional Neural Network
dc.subject.keywords: Convolutional neural networks
dc.subject.keywords: Predictive models
dc.subject.keywords: RNA
dc.subject.keywords: Task analysis
dc.subject.keywords: Transformers
dc.title: BERT2OME: Prediction of 2′-O-methylation modifications from RNA sequence by transformer architecture based on BERT
dc.type: article
dspace.entity.type: Publication
relation.isOrgUnitOfPublication: 85662e71-2a61-492a-b407-df4d38ab90d7
relation.isOrgUnitOfPublication.latestForDiscovery: 85662e71-2a61-492a-b407-df4d38ab90d7

Files

License bundle (showing 1 of 1):
Name: license.txt
Size: 1.45 KB
Format: Item-specific license agreed upon to submission