Browsing by Author "Astekin, Merve"
Now showing 1 - 4 of 4
Conference Object | Publication | Metadata only
A big data processing framework for self-healing internet of things applications (IEEE, 2016)
Dundar, B.; Astekin, Merve; Aktas, M. S.
In this study, we introduce a big data processing framework that provides self-healing capability in the Internet of Things domain. We discuss the high-level architecture of this framework and its prototype implementation. To identify faulty conditions, we utilize a complex event processing technique, applying a rule-based pattern-detection algorithm to events generated in real time. Each event is a descriptor of measurements (such as CPU usage, memory usage, and bandwidth usage) taken from Internet of Things devices. To assess the usability and effectiveness of the proposed architecture, we test the prototype implementation for performance and scalability under increasing incoming message rates. The results are promising, because its processing overhead is negligible.

Article | Publication | Metadata only
Centrality and scalability analysis on distributed graph of large-scale e-mail dataset for digital forensics (IEEE, 2020-12-10)
Ozcan, S.; Astekin, Merve; Shashidhar, N. K.; Zhou, B.
Today's digital forensics software tools mostly do not offer automatic analysis methods to reveal evidence among the huge number of digital files within hard disk images. It is important to find evidence in digital and cyber forensics investigations as soon as possible by examining hard disk images. E-mails constitute a rich source of information in hard disk images, and they are the most likely data source from which to obtain evidence. Analysts search e-mail files manually or with traditional methods in order to find evidence. However, this operation can take a long time, because the e-mail data may comprise a huge number of files and a huge volume of data.
This study introduces an end-to-end distributed graph analysis framework for large-scale digital forensic datasets, and evaluates the accuracy of the centrality algorithms and the scalability of the proposed framework in terms of running-time performance. The framework comprises dedicated processes for pre-processing, graph building, and algorithm execution. An architecture based on distributed big data techniques is introduced. Three centrality algorithms are implemented to analyze the accuracy of our framework, and three implementations are provided to demonstrate its running-time performance. Experiments are performed on the Enron e-mail dataset to analyze the centrality algorithms, to evaluate the performance of the framework, and to compare the running times between the traditional approach and ours. Moreover, the running-time performance of the framework is evaluated under various parallelization levels. The accuracy of the results is also evaluated and compared across the centrality algorithms. The comparison shows that certain algorithms provide more accurate results, and that it is possible to improve the running time by orders of magnitude utilizing our end-to-end distributed graph analysis approach.

Article | Publication | Metadata only
Provenance aware run-time verification of things for self-healing Internet of Things applications (Wiley, 2019-02-10)
Aktas, M. S.; Astekin, Merve
We propose a run-time verification mechanism of things for self-healing capability in the Internet of Things domain. We discuss the software architecture of the proposed verification mechanism and its prototype implementations. To identify faulty running behavior of things, we utilize a complex event processing technique, applying rule-based pattern detection to events generated in real time.
Each event is a descriptor of measurements (such as CPU usage, memory usage, and bandwidth usage) taken from Internet of Things devices. To assess the usability and effectiveness of the proposed mechanism, we developed prototype applications using different event processing platforms. We tested the prototype implementations for performance and scalability under increasing message rates. The results are promising, because the processing overhead of the proposed verification mechanism is negligible.

PhD Dissertation | Publication | Metadata only
Scalable analysis of large-scale system logs for anomaly detection (2019-05-30)
Astekin, Merve; Sözer, Hasan; Arı, İsmail; Öztop, Erhan; Aktaş, M. S.; Akyokuş, S.; Department of Computer Science
System logs provide information regarding the status of system components and various events that occur at runtime. This information can support fault detection, diagnosis, and prediction activities. However, it is a challenging task to analyze and interpret a huge volume of log data, which does not always conform to a standardized structure. As the scale increases, distributed systems can generate logs as a collection of huge volumes of messages from several components. Thus, it becomes infeasible to monitor and detect anomalies efficiently and effectively by applying manual or traditional analysis techniques. There have been several studies that aim at detecting system anomalies automatically by applying machine learning techniques to system logs. However, they offer limited efficiency and scalability. We identified three shortcomings that cause these limitations: i) Existing log parsing techniques do not parse unstructured log messages in a parallel and distributed manner. ii) Log data is processed mainly in offline mode rather than online. That is, the entire log data is collected beforehand, instead of being analyzed piece by piece as soon as more data becomes available.
iii) Existing studies employ centralized implementations of machine learning algorithms. In this dissertation, we address these shortcomings to facilitate end-to-end scalable analysis of large-scale system logs for anomaly detection. We introduce a framework for distributed analysis of unstructured log messages. We evaluated our framework with two sets of log messages obtained from real systems. Results showed that our framework achieves more than 30% performance improvement on average, compared to baseline approaches that do not employ fully distributed processing. In addition, it maintains the same accuracy level as the benchmark studies, although, unlike those studies, it does not require the availability of the source code. Our framework also enables online processing, where log data is processed progressively in successive time windows. The benefit of this approach is that some anomalies can be detected earlier; the risk is that accuracy might be hampered. Experimental results showed that this risk occurs rarely, only when a window boundary cross-cuts a session of events. On the other hand, the average anomaly detection time is reduced significantly. Finally, we introduce a case study that evaluates distributed implementations of the PCA and K-means algorithms. We compared the accuracy and performance of these algorithms both with respect to each other and with respect to their centralized implementations. Results showed that the distributed versions can achieve the same accuracy and provide a performance improvement by orders of magnitude when compared to their centralized versions. The performance of PCA turns out to be better than that of K-means, although we observed that the difference between the two tends to decrease as the degree of parallelism increases.
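The K-means-based, windowed anomaly detection idea described in the dissertation abstract can be illustrated with a minimal, centralized sketch (the dissertation itself evaluates distributed implementations; the event vocabulary, window contents, and function names below are illustrative assumptions, not taken from the dissertation): each time window of log events is encoded as an event-count vector, the vectors are clustered, and each window is scored by its distance to the nearest centroid.

```python
import math
from collections import Counter

# Hypothetical event vocabulary; a real system would derive event types
# from parsed log message templates.
VOCAB = ["open", "read", "close", "error"]

def event_count_vector(window, vocab=VOCAB):
    """Encode one time window of log events as a vector of event-type counts."""
    counts = Counter(window)
    return [counts.get(event, 0) for event in vocab]

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm with deterministic init (first k points)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids

def anomaly_scores(points, centroids):
    """Score each window by its distance to the nearest centroid."""
    return [min(dist(p, c) for c in centroids) for p in points]

# Five normal windows and one bursty error-only window.
windows = [["open", "read", "close"]] * 5 + [["error", "error", "error"]]
points = [event_count_vector(w) for w in windows]
centroids = kmeans(points, k=1)
scores = anomaly_scores(points, centroids)
# The last (error-only) window receives the highest anomaly score.
```

Because the windows are scored independently once the centroids are known, the encoding and scoring steps parallelize naturally over windows, which is the property the distributed versions exploit.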