Person: SEFER, Emre
First Name: Emre
Last Name: SEFER
Publication Search Results
Now showing 1 - 10 of 12
Article | Publication | Metadata only
BioCode: A data-driven procedure to learn the growth of biological networks
(IEEE, 2022-11) Sefer, Emre; Computer Science; SEFER, Emre
Probabilistic biological network growth models have been used for many tasks, including capturing the mechanisms and dynamics of biological growth, representing null models, and detecting anomalies. Well-known examples are the Kronecker model, the preferential attachment model, and the duplication-based model. However, new models must be developed continually to fit and explain the features of newly observed networks, and it is difficult to develop a growth model each time a new network is studied. In this paper, we propose BioCode, a framework to automatically discover novel biological growth models matching user-specified graph attributes in directed and undirected biological graphs. BioCode defines a basic set of instructions that are general enough to express a number of well-known biological graph growth models. We combine this instruction-based representation with a genetic-algorithm-based optimization procedure to encode models for various biological networks. We mainly evaluate BioCode on discovering models for biological collaboration networks, gene regulatory networks, and protein interaction networks whose features, such as assortativity, clustering coefficient, and degree distribution, closely match those of the corresponding real biological networks.
As shown by tests on simulated graphs, the variance of the distributions of biological networks generated by BioCode is similar to the variance of the known models for these biological network types.

Article | Publication | Open Access
ProbC: joint modeling of epigenome and transcriptome effects in 3D genome
(BioMed Central Ltd, 2022-12) Sefer, Emre; Computer Science; SEFER, Emre
Background: Hi-C and its high nucleosome-resolution variant Micro-C provide a window into the spatial packing of a genome in 3D within the cell. Even though neither technique depends directly on the binding of specific antibodies, previous work has revealed enriched interactions and domain structures around multiple chromatin marks: epigenetic modifications and transcription factor binding sites. However, the joint impact of chromatin marks on Hi-C and Micro-C interactions has not been globally characterized, which limits our understanding of 3D genome characteristics. An emerging question is whether it is possible to deduce 3D genome characteristics and interactions by integrative analysis of multiple chromatin marks and to associate interactions with the functionality of the interacting loci. Result: We propose a probabilistic method, ProbC, to decompose Hi-C and Micro-C interactions by known chromatin marks. ProbC is based on convex likelihood optimization, which can directly take into account both the existence and nonexistence of interactions. Through ProbC, we discover histone modifications (H3K27ac, H3K9me3, H3K4me3, H3K4me1) and CTCF to be particularly predictive of Hi-C and Micro-C contacts across cell types and species. Moreover, histone modifications are more effective than transcription factor binding sites in explaining the genome’s 3D shape through these interactions. ProbC can successfully predict Hi-C and Micro-C interactions in a given species even when it is trained on different cell types or species.
For instance, when trained on mouse ES cells, it can predict missing nucleosome-resolution Micro-C interactions in human ES cells from only these 5 chromatin marks with an AUC above 0.75. Additionally, ProbC outperforms the existing methods in predicting interactions across almost all chromosomes. Conclusion: With our proposed method, we optimally decompose Hi-C interactions in terms of these chromatin marks at the genome and chromosome levels. We find a subset of histone modifications and transcription factor binding sites to be predictive of both Hi-C and Micro-C interactions and TADs across human, mouse, and different cell types. Through the learned models, we can predict interactions from chromatin marks alone in species for which Hi-C data may be limited.

Article | Publication | Open Access
Hi–C interaction graph analysis reveals the impact of histone modifications in chromatin shape
(Springer, 2021-07-17) Sefer, Emre; Computer Science; SEFER, Emre
Chromosome conformation capture experiments such as Hi–C map the three-dimensional spatial organization of genomes on a genome-wide scale. Even though Hi–C interactions are not biased towards any histone modification, previous analysis has revealed denser interactions around many histone modifications. Nevertheless, the simultaneous effects of these modifications on the Hi–C interaction graph have not yet been fully characterized, limiting our understanding of genome shape. Here, we propose CHROMATINCOVERAGE and its extension TEMPORALPRIZECOVERAGE to decompose the Hi–C interaction graph in terms of known histone modifications. Both methods are based on set multicover with pairs, where each Hi–C interaction is to be covered by histone modification pairs. We find 4 histone modifications (H3K4me1, H3K4me3, H3K9me3, H3K27ac) to be significantly predictive of most Hi–C interactions across species, cell types, and cell cycles.
The proposed methods are quite effective in predicting Hi–C interactions and topologically associated domains in one species when trained on another species or cell type. Overall, our findings reveal the impact of a subset of histone modifications on chromatin shape via the Hi–C interaction graph.

Article | Publication | Metadata only
BERT2OME: Prediction of 2′-O-methylation modifications from RNA sequence by transformer architecture based on BERT
(IEEE, 2023-06) Soylu, Necla Nisa; Sefer, Emre; Computer Science; SEFER, Emre; Soylu, Necla Nisa
Recent work on language models has resulted in state-of-the-art performance on various language tasks. Among these, Bidirectional Encoder Representations from Transformers (BERT) focuses on contextualizing word embeddings to extract the context and semantics of words. Meanwhile, post-transcriptional 2'-O-methylation (Nm) RNA modification is important in various cellular tasks and is related to a number of diseases. Existing high-throughput experimental techniques are time-consuming and costly for detecting these modifications and exploring the associated functional processes. Here, to understand the associated biological processes more quickly, we propose an efficient method, B2O, to infer 2'-O-methylation RNA modification sites from RNA sequences. B2O combines a BERT-based model with convolutional neural networks (CNNs) to infer the relationship between the modification sites and RNA sequence content. Unlike previously proposed methods, B2O treats each RNA sequence as text and focuses on improving modification prediction performance by integrating the pretrained deep-learning-based language model BERT. Additionally, our transformer-based approach can infer modification sites across multiple species. According to 5-fold cross-validation, human and mouse accuracies were and respectively. Similarly, ROC AUC scores were 0.99 and 0.94 for the same species.
Detailed results show that B2O reduces the time consumed in biological experiments and outperforms existing approaches across different datasets and species on multiple metrics. Additionally, deep learning approaches such as 2D CNNs are more promising for learning BERT attributes than more conventional machine learning methods. Our code and datasets can be found at .

Article | Publication | Metadata only
Metric labeling and semimetric embedding for protein annotation prediction
(Mary Ann Liebert, Inc., 2021-05-01) Sefer, Emre; Kingsford, C.; Computer Science; SEFER, Emre
Computational techniques have been successful at predicting protein function from relational data (functional or physical interactions). These techniques have been used to generate hypotheses and to direct experimental validation. With few exceptions, the task is modeled as a multilabel classification problem where the labels (functions) are treated independently or semi-independently. However, databases such as the Gene Ontology provide information about the similarities between functions. We explore the use of the Metric Labeling combinatorial optimization problem to exploit heuristically computed distances between functions and thereby make more accurate predictions of protein function in networks derived from both physical interactions and a combination of other data types. To do this, we give a new technique (based on convex optimization) for converting heuristic semimetric distances into a metric with minimum least-squared distortion (LSD). The Metric Labeling approach is shown to outperform five existing techniques for inferring function from networks.
These results suggest that Metric Labeling is useful for protein function prediction, and that LSD minimization can help solve the problem of converting heuristic distances to a metric.

Article | Publication | Open Access
A comparison of topologically associating domain callers over mammals at high resolution
(BioMed Central Ltd, 2022-04-12) Sefer, Emre; Computer Science; SEFER, Emre
Background: Topologically associating domains (TADs) are locally highly interacting genome regions that also play a critical role in regulating gene expression in the cell. TADs were first identified while investigating the 3D genome structure in High-throughput Chromosome Conformation Capture (Hi-C) interaction datasets. Substantial effort has been devoted to developing techniques for inferring TADs from Hi-C interaction datasets. Many TAD-calling methods have been developed, which differ in their criteria and assumptions for TAD inference. Correspondingly, TADs inferred by these callers vary in both their similarity and the biological features they are enriched in. Result: We have carried out a systematic comparison of 27 TAD-calling methods over mammals. We use Micro-C, a recent high-resolution variant of Hi-C, to compare TADs at a very high resolution, and classify the methods into 3 categories: feature-based methods, clustering methods, and graph-partitioning methods. We have evaluated TAD boundaries, gaps between adjacent TADs, and the quality of TADs across various criteria. We also found the CTCF and cohesin proteins in particular to be effective in the formation of TADs with corner dots. We have also assessed the callers' performance on simulated datasets, since a gold standard for TADs is missing. TAD sizes and numbers change remarkably between TAD callers and dataset resolutions, indicating that TADs are hierarchically organized domains rather than disjoint regions.
A core subset of feature-based TAD callers regularly performs best in inferring reproducible domains, which are also enriched for TAD-related biological properties. Conclusion: We have analyzed the fundamental principles of TAD-calling methods and assessed the current state of TAD inference across high-resolution Micro-C interaction datasets over mammals. We propose a systematic, comprehensive, and concise framework to evaluate the performance of TAD-calling methods across Micro-C datasets. Our research will be useful in selecting appropriate methods for TAD inference and evaluation based on available data, experimental design, and the biological question of interest. We also provide our analysis as a benchmarking tool with publicly available source code.

Conference Object | Publication | Metadata only
A novel GBT-based approach for cross-channel fraud detection on real-world banking transactions
(IEEE, 2022) Dolu, U.; Sefer, Emre; Computer Science; Maglogiannis, I.; Iliadis, L.; Macintyre, J.; SEFER, Emre
The most recent research on hundreds of financial institutions found that only 26% of them have a team assigned to detect cross-channel fraud. As technologies develop, various fraud techniques have emerged and proliferated in digital environments. Fraud directly affects customer satisfaction. For instance, in the UK alone, the total loss from fraudulent transactions was £1.26 billion in 2020. In this paper, we propose a Gradient Boosting Tree (GBT)-based approach to efficiently detect cross-channel fraud. As part of our approach, we also devise a way to generate training sets from imbalanced data, which also suffers from concept drift due to changing customer behaviors. We boost the performance of our GBT model by integrating additional demographic, economic, and behavioral features as part of feature engineering.
We evaluate our cross-channel fraud detection method on a real banking dataset that is highly imbalanced in terms of fraud, which is another challenge in fraud detection. We use our trained model to score real-time cross-channel transactions of a leading private bank in Turkey. As a result, our approach can catch almost 75% of the total fraud loss in a month with a low false-positive rate.

Conference Object | Publication | Metadata only
Joint modeling of histone modifications in 3D genome shape through Hi-C interaction graph
(Springer, 2021) Sefer, Emre; Computer Science; SEFER, Emre
Chromosome conformation capture experiments such as Hi-C are used to map the three-dimensional spatial organization of genomes. Even though Hi-C interactions are not biased towards any histone modification, previous analysis has revealed denser interactions around many histone modifications. Nevertheless, the simultaneous effects of these modifications on the Hi-C interaction graph have not yet been fully characterized, limiting our understanding of genome shape. Here, we propose Coverage Hi-C to decompose the Hi-C interaction graph in terms of known histone modifications. Coverage Hi-C is based on set multicover with pairs, where each Hi-C interaction is covered by histone modification pairs. We find 4 histone modifications (H3K4me1, H3K4me3, H3K9me3, H3K27ac) to be significantly predictive of most Hi-C interactions across species and cell types. Coverage Hi-C is quite effective in predicting Hi-C interactions and topologically associated domains (TADs) in one species when trained on another species or cell type.

Conference Object | Publication | Metadata only
NFT primary sale price and secondary sale prediction via deep learning
(Association for Computing Machinery, Inc, 2023-11-27) Seyhan, Betül; Sefer, Emre; Computer Science; SEFER, Emre; Seyhan, Betül
Non-Fungible Tokens (NFTs) are blockchain-based unique digital assets defining ownership deeds.
They can represent various objects such as collectibles, art, and in-game items. In general, NFTs are encoded by blockchain smart contracts and traded via cryptocurrencies. Their prices and investors' attention to them increased remarkably, especially in 2021, making them a promising alternative class of investment. Surprisingly, predicting their prices has only recently started to be analyzed systematically. In this paper, we focus on predicting NFT primary sale price and secondary sale via deep learning. We use multimodal data, namely NFT images and NFT text characteristics, when predicting their prices. Here, we show that contrasting the different and similar (DS) hierarchical features of images and text serves as an important identifying marker for their price, with the consequence that we only need to direct our attention to this aspect when designing a multimodal NFT price predictor. To design an NFT price predictor from multimodal data without using any financial attributes, we propose the Fine-Grained Differences-Similarities Enhancement Network (FG-DSEN), which improves detection with a simple and interpretable structure that enhances the DS aspect between images and text. According to a detailed assessment on a publicly available NFT dataset, our proposed approach outperforms baselines on both price direction prediction and secondary sale participation prediction across several machine learning classification metrics.

Article | Publication | Open Access
MOCMIN: convex inferring of modular low-rank contact networks over COVID diffusion data
(TÜBİTAK, 2022) Sefer, Emre; Computer Science; SEFER, Emre
SEIR (which consists of susceptible, exposed, infected, and recovered states) is a common diffusion model that can capture different disease propagation dynamics across various domains, such as influenza and COVID diffusion.
As motivation, across these domains, observing the node states is relatively easier than observing the network edges over which the diffusion takes place; it may not even be possible to observe the underlying network. This paper focuses on the problem of predicting the edges of a modular low-rank human contact network when only the SEIR diffusion dynamics spreading among humans on their contact network can be observed. Such contact networks exhibit high modularity: the graph has dense connections between vertices within modules but sparse connections between vertices in different modules. We first formulate this inference problem as an optimization problem, discuss its convexity, and propose MOCMIN to optimally infer the unknown contacts of a modular human contact network from COVID diffusion data. This modular contact network inference problem is important in the general case where human states, such as infected with or recovered from a virus, can be identified more easily than the contacts between humans. Our contributions can be summarized as follows: (1) MOCMIN can handle noisy, incomplete, or undersampled diffusion data while inferring the unknown contact network; (2) the inferred contact networks are highly modular, which cannot be ensured by existing methods; (3) this paper applies MOCMIN to better understand COVID diffusion on contact networks. We found MOCMIN to be accurate in inferring modular real human contact networks from COVID diffusion data under a number of challenging scenarios. As an example, a high school contact network can be inferred by tracking COVID diffusion among humans approximately 5% better than with the compared methods, owing to MOCMIN’s ability to model the modularity of the network. Via such inference, we can also understand the details of COVID diffusion dynamics in real human contact networks. Additionally, the inferred human contact graphs closely mimic the known graph properties of the true contact network.
Lastly, MOCMIN outperforms the competing approaches when estimating synthetic networks.
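
For readers unfamiliar with the setting of the MOCMIN abstract, the following is a minimal illustrative sketch of the kind of input it describes: SEIR node states observed over time on a modular contact network whose edges are hidden. It is not the MOCMIN method and performs no inference. The function names, the discrete-time update rule, and the parameters `beta`, `sigma`, `gamma`, `p_in`, and `p_out` are all assumptions made for illustration.

```python
import random

def modular_graph(n_modules=2, size=10, p_in=0.6, p_out=0.02, rng=None):
    """Random undirected graph: dense inside modules, sparse between them (assumed generator)."""
    rng = rng or random.Random(1)
    n = n_modules * size
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            # Nodes u and v share a module when their indices fall in the same block.
            p = p_in if u // size == v // size else p_out
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def simulate_seir(adj, seeds, beta=0.3, sigma=0.5, gamma=0.2, steps=30, rng=None):
    """Discrete-time SEIR on a contact graph; returns the list of node-state snapshots."""
    rng = rng or random.Random(0)
    state = {v: "S" for v in adj}
    for v in seeds:
        state[v] = "I"
    history = [dict(state)]
    for _ in range(steps):
        nxt = dict(state)
        for v, s in state.items():
            if s == "S":
                # Each infected neighbour transmits independently with probability beta.
                k = sum(1 for u in adj[v] if state[u] == "I")
                if rng.random() < 1 - (1 - beta) ** k:
                    nxt[v] = "E"
            elif s == "E" and rng.random() < sigma:
                nxt[v] = "I"  # exposed becomes infectious
            elif s == "I" and rng.random() < gamma:
                nxt[v] = "R"  # infectious recovers
        state = nxt
        history.append(dict(state))
    return history

adj = modular_graph()
history = simulate_seir(adj, seeds=[0], steps=20)
# An inference method in this setting would see only `history`, not `adj`.
```

In this sketch the snapshots in `history` play the role of the observable diffusion data, while `adj` stands in for the unknown contact network that a method such as MOCMIN aims to recover.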