Organizational Unit:
Department of Data Science

Publication Search Results

  • Master Thesis
    Modern data management strategies for machine learning tasks: A sports analytics use case on cloud platform
    Mete, Emrah; Albey, Erinç; Özener, Okan Örsan; Güler, M. G.; Department of Data Science
    There is no doubt that data is the most valuable asset today. Enterprises' efforts toward digital transformation and a data-driven culture are the most concrete indicators of this. As data transforms every industry, technology in this field is developing rapidly. Appropriate data management strategies are the basis of data-driven organizations. The evolution of data management architectures has been driven chiefly by changing and growing data sources and by the velocity of data production. In addition, as real-time business use cases have grown in importance, processing data quickly and turning it into action has become a crucial need. Now that data is of strategic importance, enterprises that manage it correctly can gain competitive advantages. Sound data management can be built with the support of up-to-date, modern approaches. Infrastructures that integrate new and modern methods into their platforms become more agile. This gives organizations speed and flexibility, which increases the number of value-added services that can be produced from data. Today, the outputs expected from data management platforms go beyond descriptive and diagnostic analytics. Artificial intelligence, machine learning, and data science are now important parts of these platforms and have opened new channels for the future of enterprises. In this thesis, the basic needs and capabilities of modern data management architectures are described, and reference architectures from industry are explained in detail. Data management strategies and expectations are also discussed.
    An example prototype of the data management platform described in this thesis has also been developed on the cloud. The prototype covers the entire data life cycle, and each step was developed in detail. In addition, a data science project was developed using the data collected on the platform. Thus, an end-to-end solution has been implemented.
  • Master Thesis
    Finding the determinants of player market value in association football using FIFA video game data
    Eren, Ozan Can; Özener, Okan Örsan; Özener, Başak Altan; Albey, Erinç; Yanıkoğlu, İhsan; Güler, M. G.; Department of Data Science
    Football has become one of the most popular sports in the world in recent decades. With its growing popularity, football has turned into a growing billion-dollar market. High sums are paid for the transfers of top-level players in the major European leagues. This has drawn attention to the market values of footballers. One of the open questions in this area is identifying the factors that determine the market values of football players. Although some of these factors have been studied before, the focus has generally been on on-field statistics. This thesis uses a unique data set and focuses on the core attributes and characteristics of football players in the FIFA 2015, 2016, 2017, and 2018 games. Player market values were obtained from "transfermarkt.com". For each of the 2014-2015, 2015-2016, 2016-2017, and 2017-2018 seasons, footballers who spent at least ninety minutes on the pitch in Europe's five biggest leagues (the English Premier League, Spanish La Liga, German Bundesliga, Italian Serie A, and French Ligue 1) were analyzed. Using various variable selection methods and Ordinary Least Squares models, the most important demographic, team-related, physical, technical, and mental attributes and characteristics that make players valuable in specific sub-positions in defence, midfield, and attack were identified for each season. It was shown that different attributes and characteristics make players valuable in different sub-positions (Centre Back, Full Back, Central Defensive Midfielder, Central Midfielder, Winger, and Forward). The outputs of the Ordinary Least Squares models built for each of the 4 seasons and 6 sub-positions are shared.
  • Master Thesis
    Multilabel classification with neural network
    Ekşioğlu, Sezin; Özener, Okan Örsan; Özener, Başak Altan; Çelikyurt, U.; Department of Data Science
    Multi-label classification is important for many applications and is also a challenging research topic. It is a kind of supervised learning with binary targets. The difference between multi-label and binary classification is that a multi-label problem has more than one class, and a sample can belong to one class or to many. Multi-label prediction has a wide range of applications, such as image labeling, text categorization, and gene functionality. Even when samples are assigned to many classes, they may not always be classified correctly. There are many ensemble methods for classification, and most researchers have sought better multi-label methods. However, few focus on both classifier efficiency and pairwise relationships at the same time. In this paper, we modify ensemble methods by applying k-Nearest Neighbors and a neural network structure sequentially to address these issues and improve multi-label classification. The developed algorithm is evaluated on publicly available datasets (yeast, emotion, scene, and birds). It outperforms the benchmarks on each dataset under different metrics, and its results are competitive with the state of the art. In particular, it surpasses the benchmarks on the weighted average of false-positive minimization and false-negative minimization.
  • Master Thesis
    A novel sampling technique and gradient boosting tree-based approach for cross-channel fraud detection
    Dolu, Uğur; Sefer, Emre; Özener, Okan Örsan; Kaynar, O.; Department of Data Science
    Recent research covering hundreds of financial institutions found that only 26% of them have a team assigned to detect cross-channel fraud. As technology has developed, various fraud techniques have emerged and spread in digital environments. Fraud directly affects customer satisfaction. For instance, in the UK alone, total losses from fraudulent transactions were £1.26 billion in 2020. In this study, we propose a Gradient Boosting Tree (GBT)-based approach to detect cross-channel fraud efficiently. As part of the approach, we developed an algorithm that generates an optimized training set to train the model and overcome the imbalanced data problem. This also made it easier for the model to handle concept drift, another major problem, which arises from changing customer behavior. We boost the performance of our GBT model by integrating additional demographic, economic, and behavioral features as part of feature engineering, and we use hyperparameter tuning methods to find the best parameters for the model. The cross-channel fraud detection performance of the model is evaluated on a real banking dataset that is highly imbalanced with respect to fraud, which is another challenge in fraud detection. We use the trained model to score real-time cross-channel transactions at a leading private bank in Turkey. As a result, our approach can catch almost 75% of total fraud loss in a month with a low false-positive rate and an acceptable call count.
  • Master Thesis
    Descriptive and predictive analysis of the NFT market
    Çabuk, Onur Can; Albey, Erinç; Önal, Mehmet; Güler, M. G.; Department of Data Science
    Non-fungible tokens (NFTs) are digital assets on a blockchain that have unique identification codes and metadata that make them distinguishable from one another. NFTs can represent a wide range of digital assets, including game cards, artwork, and even real estate. Due to these characteristics, NFTs have attracted tremendous interest from people around the world, leading to huge returns on investment in the NFT market. However, there are only a few studies on the market in the literature. This paper examines various aspects of the NFT market to shed light on its dynamics and wallet behaviors. First, a descriptive analysis of the market is performed to show its overall trend. The transactional behaviors of wallets are then analyzed, and a segmentation is made to gain a general understanding of the user portfolio. The buyers of a specific NFT collection (Bored Ape Yacht Club) are then studied by comparing them to the overall market, revealing differences in transactional tendencies and macro indicators. Finally, machine learning models are developed to predict the transactional behaviors of wallets. Our analysis reveals that the growth of the NFT market is largely driven by new entrants, but that the number of new wallets entering the market has recently decreased significantly. We also find that the majority of wallets in the market have only one transaction and hold only one token, suggesting that these are users who are experimenting with the market. In the Bored Ape Yacht Club sample, however, users are highly engaged with the market, with high trading frequencies and diverse portfolios. Finally, our predictive models show that the transactional behaviors of wallets can be predicted, which opens up opportunities for optimization in various areas.
  • Master Thesis
    Allocating costs in a lot sizing game using novel machine learning methods
    Kasapoğlu, Furkan; Özener, Okan Örsan; Özener, Başak Altan; Çelikyurt, U.; Department of Data Science
    In supply chain management (SCM), effective resource utilization is the key to achieving strategic benefits such as minimizing costs, increasing service levels, reducing inventories, increasing responsiveness, and ultimately improving customer satisfaction. Collaborative approaches among supply chain entities have become increasingly popular as a way to increase resource utilization. In this thesis, we analyze a collaborative production setting where several companies facing varying demands throughout a finite planning horizon attempt to reduce their procurement costs by ordering from a common supplier. As the capacity of the common supplier is better utilized in such a collaborative solution, it yields benefits that are shared by the collaborators. Our objective is to design a cost allocation framework that ensures the sustainability of the collaborative purchasing organization. We propose various methods, including novel and computationally efficient machine learning based methods built on two different architectures, gradient boosting and artificial neural networks, which ensure the scalability of the proposed framework. We perform an extensive computational study and observe that our proposed method significantly outperforms the generic methods in the literature in terms of solution quality and computation time.
  • Master Thesis
    Optimizing inventory routing: an integrated machine learning solution approach
    Aktaş, Taha Huzeyfe; Özener, Okan Örsan; Ekici, Ali; Yakıcı, E.; Department of Data Science
    The Inventory Routing Problem (IRP) arises in vendor-managed inventory business settings where the supplier is responsible for replenishing the inventories of its customers over a planning horizon. In the IRP, the supplier makes the routing and inventory decisions together to improve the overall performance of the system. In our setting, the supplier's goal is to minimize total transportation costs over a planning horizon while avoiding stock-outs at the customer locations. We assume that the supplier has a fleet of homogeneous capacitated delivery vehicles and abundant availability of the product to be delivered to the customers. Each customer has a constant demand/consumption rate and limited storage capacity to keep inventory. To address this problem, we propose a novel integrated clustering and routing algorithm. In the clustering phase, we partition the customer set into clusters, ensuring that each cluster is served by a single vehicle. To accomplish this, we employ a novel deep learning model within the clustering framework. In the routing phase, we develop the delivery schedule for each cluster. What sets our approach apart is that it considers the three key decisions (when to deliver, how much to deliver, and how to route) by integrating both a mathematical model and a machine learning model in the decision-making process. We evaluate the performance of the proposed clustering and routing algorithms against the existing literature, and our results demonstrate significant improvements. Furthermore, the proposed neural network-based clustering approach serves as an effective illustration of how machine learning algorithms can enhance decision-making structures.
  • Master Thesis
    A data-driven approach to NFT trading : Q-learning based simulator
    Kamalak, Süleyman; Albey, Erinç; Önal, Mehmet; Güler, M. G.; Department of Data Science
    Non-fungible tokens (NFTs) have garnered considerable attention in recent years due to their broad range of applications and potential as a lucrative investment opportunity. Given the nascent nature of the technology and the scarcity of comprehensive studies, there is a pressing need for a holistic trading framework that not only addresses the complexities inherent in the trading process but also proposes viable solutions to existing challenges. This study introduces a novel approach for navigating the NFT trading landscape, confronting the various challenges and suggesting practical solutions. The trading environment is modeled using a Markov Decision Process (MDP), with Q-learning employed to simulate the environment and resolve the MDP problem. The study proposes machine learning models to tackle key challenges, including defining the market state, appraising NFT tokens, and addressing the illiquidity issue prevalent in the NFT market. The proposed approach yields an NFT trading strategy that has been shown to outperform traditional strategies, generating substantial profits even amidst bearish market conditions. The Bored Ape Yacht Club (BAYC) collection serves as the primary data set, with the agent trained from June 1, 2021, through January 1, 2023. In the testing period from January 1 to June 10, 2023, the proposed model outperformed traditional benchmarks, achieving a profit of 21.14% as opposed to a 20.39% loss for the best-performing benchmark. We assert that this forms a robust foundation for future research into NFT trading simulation and backtesting. We also identify potential areas for future enhancements, particularly possible improvements in the trading strategy and Q-learning approach. The insights gleaned significantly enhance the understanding of the importance of AI applications in the rapidly evolving field of NFT trading.
  • Master Thesis
    A revised approach to cryptocurrency portfolio optimization using advanced Q-learning and policy iteration frameworks
    Altok, Ceren; Albey, Erinç; Önal, Mehmet; Güler, M. G.; Department of Data Science
    Despite all the factors that cause concern among investors, such as volatility and the decentralization of the crypto world, the popularity of cryptocurrencies continues to grow steadily. The cryptocurrency market still holds its allure for many investors due to the high profit levels it has experienced in the past. With the entrance of numerous altcoins into the market, portfolio management becomes much more challenging. The literature contains numerous studies proposing efficient portfolio management techniques for cryptocurrencies. This study presents models developed based on policy iteration and Q-learning algorithms. Under Q-learning, three distinct sub-models are introduced: Deep Q-Network (DQN), Double Deep Q-Network (DDQN), and Double Dueling Q Network (DDDQN). All of these models are trained using 6-month training periods and compared using 10 different training and testing periods. Additionally, to evaluate both the proposed policy iteration and Q-learning models, baseline models were created for each algorithm, and the performance of the proposed models was assessed against these baseline models. The results indicate that among Policy Iteration models, the proposed model has the highest average ROI value of 3%, making it the top-performing model. Similarly, among Q-learning models, the proposed DQN model surpasses both baseline models and other Q-learning models, with an average ROI value of 2%. Considering all the models, the proposed Policy Iteration model achieves the highest average ROI value, while the proposed DQN and DDDQN models demonstrate the lowest volatility in terms of ROI standard deviations.
  • Master Thesis
    Automated failure detection in refrigerators using machine learning algorithms
    Sarıal, Selin; Yanıkoğlu, İhsan; Albey, Erinç; Yavuz, T.; Department of Data Science
    The sustainable functioning of refrigerators is crucial in residential and commercial settings. These appliances are used continuously throughout the day, and any failure can lead to food spoilage, negatively impacting brand reputation. Therefore, having an efficient failure detection system that can identify and diagnose any problems instantly is essential. This paper proposes a novel machine learning pipeline that uses online sensor data from the refrigerators of anonymous customers and a feedback mechanism to inform customer service about the detected failure remotely. The performance of the system is evaluated through a real-life pilot project, and the results indicate that the proposed method achieves high accuracy in detecting various types of failure. Applying the proposed approach prevents food spoilage, reduces maintenance costs while increasing customer satisfaction, and enhances the reliability and safety of refrigerators.
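
Two of the abstracts listed above (the NFT trading simulator and the cryptocurrency portfolio thesis) describe modeling a trading environment as a Markov Decision Process and solving it with Q-learning. As a rough illustration of that idea only, the sketch below runs tabular Q-learning on a toy two-state environment. The states, rewards, and every name in it (`step`, `train`, `ALPHA`, `GAMMA`, `EPSILON`) are hypothetical and are not taken from either thesis, which define their own market states and, in the portfolio case, use deep networks rather than a table.

```python
import random

# Hypothetical toy MDP (an assumption for illustration, not from the theses):
# states:  0 = "flat" (no position), 1 = "holding"
# actions: 0 = "wait",               1 = "trade" (buy when flat, sell when holding)
N_STATES, N_ACTIONS = 2, 2
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2  # learning rate, discount, exploration rate


def step(state, action):
    """Deterministic toy dynamics: return (next_state, reward)."""
    if state == 0 and action == 1:
        return 1, -1.0  # buying costs 1 and moves to "holding"
    if state == 1 and action == 1:
        return 0, 2.0   # selling earns 2 and moves back to "flat"
    return state, 0.0   # waiting changes nothing


def train(episodes=500, horizon=20, seed=0):
    """Run epsilon-greedy tabular Q-learning and return the Q-table."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(horizon):
            # Explore with probability EPSILON, otherwise act greedily.
            if rng.random() < EPSILON:
                action = rng.randrange(N_ACTIONS)
            else:
                action = max(range(N_ACTIONS), key=lambda a: q[state][a])
            next_state, reward = step(state, action)
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            target = reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


q_table = train()
```

In this toy setting the learned table should prefer trading over waiting in both states, because the buy-then-sell cycle has a positive discounted return. A realistic trading simulator would replace `step` with market data and a much richer state definition.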