Faculty of Engineering
Permanent URI for this community: https://hdl.handle.net/10679/10
Browsing by Institution Author "ARI, Ismail"
Now showing 1 - 20 of 37
Conference Object · Publication · Open Access
Alarm sequence rule mining extended with a time confidence parameter (2014)
Çelebi, Ö. F.; Zeydan, E.; Arı, İsmail; İleri, Ö.; Ergüt, S.; Computer Science; ARI, Ismail
Most mobile telecommunication operators receive an overwhelming number of alarms in their networks. Network support specialists face the challenge of identifying in advance the most important alarms, those that can cause severe damage to the system or disrupt service. A system that can discover alarm correlations and alarm rules and then notify network administrators can significantly increase the efficiency of the Network Operation Centers (NOCs) of these mobile operators. This paper provides a new alarm correlation, rule discovery, and significant rule selection technique based on analysis of real data collected from a mobile telecom operator. We present a method based on a sequential rule mining algorithm with an additional parameter called time-confidence. The time-confident rules found by this method are processed more efficiently in real-time Complex Event Processing (CEP) systems that require exact time-window values during monitoring. Furthermore, compared to traditional sequential rule mining, our proposed method adds another support dimension to eliminate meaningless rules that appear when the minimum support and confidence thresholds are set poorly with respect to the nature of the data.

Conference Object · Publication · Metadata only
Büyük veri problemlerine çözüm olarak veri akış madenciliği [Data stream mining as a solution to big data problems] (IEEE, 2013)
Ölmezoğulları, Erdi; Arı, İsmail; Çelebi, Ö. F.; Ergüt, S.; Computer Science; ARI, Ismail; Ölmezoğulları, Erdi
Today, the computing world is struggling with “big data” problems (the volume, velocity, variety, and inconsistency of data) on the way to obtaining useful information. In this paper, we share the implementation details and performance findings of Association Rule Mining (ARM) performed “online” over big data streams, in a way not previously done in the literature. For stream mining, the Apriori and FP-Growth algorithms were added to the event stream engine Esper. On the resulting system, the two algorithms were compared using sliding windows and data from the LastFM social music site. FP-Growth, the better-performing algorithm, was selected to build a real-time, rule-based recommendation engine. Our most important findings show that, thanks to online rule extraction, (1) many more rules than with offline rule extraction can be computed, (2) much faster and more efficiently, and (3) much earlier. In addition, many interesting and realistic rules consistent with musical tastes, such as “George Harrison ⇒ The Beatles”, were found. We hope that our results will shed light on the design and implementation of other big data analytics systems in the future.

Conference Object · Publication · Metadata only
Data stream analytics and mining in the cloud (IEEE, 2012)
Arı, İsmail; Ölmezoğulları, Erdi; Çelebi, Ö. F.; Computer Science; ARI, Ismail; Ölmezoğulları, Erdi
Due to the prevalent use of sensors and network monitoring tools, large volumes of data, or “big data”, today traverse enterprise data processing pipelines in a streaming fashion. While some companies prefer to deploy their data processing infrastructures and services as private clouds, others completely outsource these services to public clouds. In either case, storing the data first for subsequent analysis creates additional resource costs and unwanted delays in obtaining actionable information. As a result, enterprises increasingly employ data or event stream processing systems and want to extend them further with complex online analytics and mining capabilities.
In this paper, we present implementation details for performing both correlation analysis and association rule mining (ARM) over streams. Specifically, we implement Pearson product-moment correlation for analytics and the Apriori and FP-Growth algorithms for stream mining inside a popular event stream processing engine called Esper. As a unique contribution, we conduct experiments and present performance results for these new tools with different tumbling and sliding time windows over two stream types: one of moving bus trajectories and another of web logs from a music site. We find that while tumbling windows may be preferable for performance in certain applications, sliding windows can provide additional benefits for rule mining. We hope that our findings can shed light on the design of other cloud analytics systems.

Conference Object · Publication · Metadata only
Democratization of HPC cloud services with automated parallel solvers and application containers (Wiley, 2018-11-10)
Muhtaroğlu, Nitel; Arı, İsmail; Kolcu, Birkan; Computer Science; ARI, Ismail; Kolcu, Birkan; Muhtaroğlu, Nitel
In this paper, we investigate several design choices for HPC services at different layers of the cloud computing architecture to simplify and broaden their use cases. We start with the platform-as-a-service (PaaS) layer and compare direct and iterative parallel linear equation solvers. We observe that several matrix properties that can be identified before starting long-running solvers can help HPC services automatically select the amount of computing resources per job, so that job latency is minimized and overall job throughput is maximized. As a proof of concept, we use classical problems in structural mechanics and mesh them with increasing granularity, leading to various matrix sizes, the largest having 1 billion non-zero elements.
In addition to matrix size, we take into account matrix condition numbers, preconditioning effects, and solver types, and execute these finite element analyses (FEA) on an IBM HPC cluster. Next, we focus on the infrastructure-as-a-service (IaaS) layer and explore HPC application performance, load isolation, and deployment issues using application containers (Docker), comparing them to physical and virtual machines (VMs) on a public cloud.

Article · Publication · Metadata only
Democratization of runtime verification for internet of things (Elsevier, 2018-05)
İnçki, Koray; Arı, İsmail; Computer Science; ARI, Ismail; İnçki, Koray
Internet of Things (IoT) devices have become more prevalent in ambient assisted living (AAL) systems. The reliability of AAL systems is critical, especially for assuring the safety and well-being of elderly people. Runtime verification (RV) is described as checking whether the observed behavior of a system conforms to its expected behavior. RV techniques generally involve heavy formal methods; thus, they are poorly utilized in industry. Therefore, we propose a democratization of RV for IoT systems by presenting a model-based testing (MBT) approach. To enable modeling the expected behaviors of an IoT system, we first describe an extension to a UML profile. Then, we capture the expected behavior of an interaction modeled in a Sequence Diagram (SD). Later, the expected behaviors are translated into runtime monitor statements expressed in Event-Processing Language (EPL), which are executed at the edge of the IoT network.
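The idea of turning an expected interaction into a runtime monitor can be illustrated with a small sketch. This is plain Python, not EPL, and not the paper's translation scheme; it assumes a hypothetical expected behavior, "every `request` from a device is followed by a `response` from the same device within `timeout` seconds", and records the devices that violate it as events stream in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str    # hypothetical event type, e.g. "request" or "response"
    device: str  # identifier of the IoT device that emitted the event
    ts: float    # event timestamp in seconds


class ResponseMonitor:
    """Monitors the hypothetical rule: every request is answered within `timeout` seconds."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.pending = {}      # device -> timestamp of its open request
        self.violations = []   # devices whose request timed out

    def _expire(self, now: float) -> None:
        # Deadlines are only checked when a new event arrives (no background timer).
        for device, started in list(self.pending.items()):
            if now - started > self.timeout:
                self.violations.append(device)
                del self.pending[device]

    def on_event(self, ev: Event) -> None:
        self._expire(ev.ts)
        if ev.kind == "request":
            self.pending[ev.device] = ev.ts
        elif ev.kind == "response":
            self.pending.pop(ev.device, None)  # expected behavior observed in time


mon = ResponseMonitor(timeout=2.0)
stream = [Event("request", "s1", 0.0), Event("response", "s1", 1.0),
          Event("request", "s2", 1.5), Event("request", "s3", 4.0)]
for ev in stream:
    mon.on_event(ev)
print(mon.violations)  # ['s2']
```

Checking deadlines lazily on each incoming event keeps the sketch free of timers; a production monitor running at the network edge would more likely rely on the CEP engine's own time windows.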
We further demonstrate our contributions on a sample AAL system.

Article · Publication · Open Access
Design and implementation of a cloud computing service for finite element analysis (Elsevier, 2013-06)
Arı, İsmail; Muhtaroğlu, Nitel; Computer Science; ARI, Ismail; Muhtaroğlu, Nitel
This paper presents an end-to-end discussion of the technical issues related to the design and implementation of a new cloud computing service for finite element analysis (FEA). The focus is specifically on performance characterization of linear and nonlinear mechanical structural analysis workloads over multi-core and multi-node computing resources. We first observe that accurate job characterization, tuning of multi-threading parameters, and effective multi-core/multi-node scheduling are critical for service performance. We design a “smart” scheduler that can dynamically select some of the required parameters, partition the load, and schedule it in a resource-aware manner. We can achieve up to 7.53× performance improvement over an aggressive scheduler using mixed FEA loads. We also discuss critical issues related to data privacy, security, accounting, and portability of the cloud service.

Conference Object · Publication · Metadata only
E-Cube: multi-dimensional event sequence analysis using hierarchical pattern query sharing (ACM, 2011-06-12)
Liu, M.; Rundensteiner, E.; Greenfield, K.; Gupta, C.; Wang, S.; Arı, İsmail; Mehta, A.; Computer Science; ARI, Ismail
Many modern applications, including online financial feeds, tag-based mass transit systems, and RFID-based supply chain management systems, transmit real-time data streams. Event stream processing technology is needed to analyze this vast amount of sequential data and enable online operational decision making.
Existing techniques such as traditional online analytical processing (OLAP) systems are not designed for real-time pattern-based operations, while state-of-the-art Complex Event Processing (CEP) systems designed for sequence detection do not support OLAP operations. We propose a novel E-Cube model that combines CEP and OLAP techniques for efficient multi-dimensional event pattern analysis at different abstraction levels. Our analysis of the interrelationships among queries, in both concept abstraction and pattern refinement, allows these queries to be composed into an integrated E-Cube hierarchy. Based on this hierarchy, we develop drill-down (refinement from abstract to more specific patterns) and roll-up (generalization from specific to more abstract patterns) strategies for efficient workload evaluation. Our proposed execution strategies reuse intermediate results along both the concept and the pattern refinement relationships between queries. On this foundation, we design a cost-driven adaptive optimizer, called Chase, that exploits these reuse strategies for optimal E-Cube hierarchy execution. Our experimental studies comparing alternative strategies on a real-world financial data stream under different workload conditions demonstrate the superiority of the Chase method. In particular, Chase execution in many cases runs ten times faster than the state-of-the-art strategy for real stock market query workloads.

Conference Object · Publication · Metadata only
E-Cube: multi-dimensional event sequence processing using concept and pattern hierarchies (IEEE, 2010)
Liu, M.; Rundensteiner, E. A.; Greenfield, K.; Gupta, C.; Wang, S.; Arı, İsmail; Mehta, A.; Computer Science; ARI, Ismail
Many modern applications, including tag-based mass transit systems, RFID-based supply chain management systems, and online financial feeds, require special-purpose event stream processing technology to analyze vast amounts of sequential multi-dimensional data available in real-time data feeds. Traditional online analytical processing (OLAP) systems are not designed for real-time pattern-based operations, while Complex Event Processing (CEP) systems are designed for sequence detection and do not support OLAP operations. We will demonstrate a novel E-Cube model that combines CEP and OLAP techniques for multi-dimensional event pattern analysis at different abstraction levels. A London transit scenario will be given to demonstrate the utility and performance of this proposed technology.

Conference Object · Publication · Open Access
Finding event correlations in federated wireless sensor networks (IEEE, 2011)
Arı, İsmail; Çelebi, Ömer Faruk; Computer Science; ARI, Ismail; Çelebi, Ömer Faruk
Event correlation engines help find events of interest inside raw sensor data streams while simultaneously reducing data volume. This paper discusses some of the challenges faced in finding event correlations over federated wireless sensor networks (WSNs), including high data volumes, uncertain or missing data, application-specific dependencies, and widely varying data ranges and sampling frequencies. Analysis of real geo-tracking data of moving objects confirms some of these challenges.
Federation at the data layer above the WSNs is presented as a feasible alternative.

Conference Object · Publication · Metadata only
Finecloud: Fine-grained cloud service advisory using machine learning (IEEE, 2022)
Orhun, Yasemin; İstanbullu, Yiğit; Arı, İsmail; Computer Science; ARI, Ismail; Orhun, Yasemin; İstanbullu, Yiğit
Motivated by real customer problems, we investigated the utilization of cloud services at different layers, including infrastructure (IaaS), application services (PaaS), and databases (DaaS). We found several issues, such as forgotten unused resources, bursty workloads, and service dependencies, that cause under-utilization (a.k.a. over-provisioning). Cloud advisory tools offered by public providers either lack the fine-grained analysis needed for actionable recommendations or cannot see the correlations among services used by the same customer's resource groups. We proposed an automated, near real-time advisor that uses historical usage data and machine learning (ML) models to recommend cost-saving opportunities. We demonstrated significant cost savings averaging around 20%, which can accumulate to thousands of dollars for large and active systems. Since our advisory models depend on time-series data, we compared several forecasting algorithms, including ARIMA, LSTM, and Prophet. We found the LSTM model to deliver the most accurate results for our workloads.

Book Part · Publication · Metadata only
Forecasting multivariate time-series data using LSTM and mini-batches (Springer, 2020)
Khodabakhsh, Athar; Arı, İsmail; Bakır, M.; Alagoz, S. M.; Computer Science; Bohlouli, M.; Bigham, B. S.; Narimani, Z.; Vasighi, M.; Ansari, E.; ARI, Ismail; Khodabakhsh, Athar
Multivariate time-series forecasting is a challenging task due to nonlinear interdependencies in complex industrial systems. It is crucial to model these dependencies automatically, using the ability of neural networks to learn features by extracting spatial relationships.
In this paper, we converted non-spatial multivariate time-series data into a time-space format and used Long Short-Term Memory (LSTM) networks, a type of Recurrent Neural Network (RNN), for sequential analysis of multi-attribute industrial data to make future predictions. We compared the effect of mini-batch length and the number of attributes on prediction accuracy and found that spatio-temporal locality is important for detecting patterns with LSTM.

Editorial · Publication · Metadata only
Foreword (2012)
Rundensteiner, E.; Manolescu, I.; Amer-Yahia, S.; Naumann, F.; Markl, V.; Arı, İsmail; Computer Science; ARI, Ismail

Conference Object · Publication · Metadata only
High-performance complex event processing using continuous sliding views (ACM, 2013)
Ray, M.; Rundensteiner, E. A.; Liu, M.; Gupta, C.; Wang, S.; Arı, İsmail; Computer Science; ARI, Ismail
Complex Event Processing (CEP) has become increasingly important for tracking and monitoring anomalies and trends in event streams emitted by business processes ranging from supply chain management to online stores in e-commerce. These monitoring applications submit complex event queries to track sequences of events that match a given pattern. While state-of-the-art CEP systems mostly focus on executing flat sequence queries, we instead support the execution of nested CEP queries specified in NEEL (the NEsted Event Language). However, iterative execution often results in repeated recomputation of similar or even identical results for nested subexpressions as the window slides over the event stream. In this work, we thus propose to optimize NEEL execution performance by caching intermediate results. In particular, we design two methods for selectively caching intermediate results. The first is the Continuous Sliding Caching technique. The second, a further optimization of the first, is what we call Interval-Driven Semantic Caching.
Techniques for incrementally loading, purging, and exploiting the cache content are described. Our experimental study using real-world stock trades evaluates the performance of our proposed caching strategies for different query types.

Conference Object · Publication · Metadata only
High-performance nested CEP query processing over event streams (IEEE, 2011)
Liu, M.; Rundensteiner, E.; Dougherty, D.; Gupta, C.; Wang, S.; Arı, İsmail; Mehta, A.; Computer Science; ARI, Ismail
Complex event processing (CEP) over event streams has become increasingly important for real-time applications ranging from health care and supply chain management to business intelligence. These monitoring applications submit complex queries to track sequences of events that match a given pattern. As these systems mature, the need for increasingly complex nested sequence query support arises, while state-of-the-art CEP systems mostly support only the execution of flat sequence queries. To assure real-time responsiveness and scalability for pattern detection even on huge-volume, high-speed streams, efficient processing techniques must be designed. In this paper, we first analyze the prevailing nested pattern query processing strategy and identify several serious shortcomings. Not only are substantial subsequences first constructed just to be subsequently discarded, but opportunities for shared execution of nested subexpressions are also overlooked. As a foundation, we introduce NEEL, a CEP query language for expressing nested CEP pattern queries composed of sequence, negation, AND, and OR operators. To overcome these deficiencies, we design rewriting rules for pushing negation into inner subexpressions. Next, we devise a normalization procedure that employs these rules to flatten a nested complex event expression. To conserve CPU and memory, we propose several strategies for efficient shared processing of groups of normalized NEEL subexpressions.
These strategies include prefix caching, suffix clustering, and customized “bit-marking” execution strategies. We design an optimizer to partition the set of all CEP subexpressions in a NEEL normal form into groups, each of which can then be mapped to one of our shared execution operators. Lastly, we evaluate our technologies with a performance study assessing CPU processing time on real-world stock trade data. Our results confirm that our NEEL execution in many cases runs 100 times faster than the traditional iterative nested execution strategy for real stock market query workloads.

Book Part · Publication · Metadata only
Hybrid job scheduling for improved cluster utilization (Springer Science+Business Media, 2014)
Arı, İsmail; Kocak, Uğur; Computer Science; ARI, Ismail; Kocak, Uğur
In this paper, we investigate the models and issues, as well as the performance benefits, of hybrid job scheduling over shared physical clusters. Currently supported clustering technologies include MPI, Hadoop MapReduce, and NoSQL systems. Our proposed scheduling model sits above the cluster-specific middleware and OS-level schedulers and is complementary to them. First, we demonstrate that we can effectively schedule MPI, Hadoop, and NoSQL jobs together by profiling and then co-scheduling them. Second, we find that it is better to schedule cluster jobs with different characteristics together (CPU- vs. I/O-intensive) than two CPU-intensive jobs. Third, we use this principle to design a greedy sort-merge scheduler. Savings of up to 37% in total job completion times are demonstrated.
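The co-scheduling principle above (pair a CPU-intensive job with an I/O-intensive one rather than two CPU-intensive jobs) can be illustrated with a small greedy sketch. It assumes each job has been profiled into a hypothetical `cpu_ratio` between 0 and 1; jobs are sorted by this ratio and then merged from both ends, so the most CPU-bound remaining job shares a slot with the most I/O-bound one. This is one plausible reading of "sort-merge" pairing, not the scheduler described in the paper.

```python
from typing import NamedTuple


class Job(NamedTuple):
    name: str
    cpu_ratio: float  # hypothetical profile: 1.0 = fully CPU-bound, 0.0 = fully I/O-bound


def pair_jobs(jobs: list[Job]) -> list[tuple[Job, ...]]:
    """Greedily co-schedule the most CPU-bound remaining job with the most I/O-bound one."""
    ordered = sorted(jobs, key=lambda j: j.cpu_ratio)  # sort step: most I/O-bound first
    lo, hi = 0, len(ordered) - 1
    slots: list[tuple[Job, ...]] = []
    while lo < hi:  # merge step: walk inward from both ends
        slots.append((ordered[hi], ordered[lo]))
        lo += 1
        hi -= 1
    if lo == hi:  # odd number of jobs: the middle one runs alone
        slots.append((ordered[lo],))
    return slots


jobs = [Job("mpi-solver", 0.9), Job("hadoop-sort", 0.3), Job("nosql-scan", 0.1),
        Job("mpi-fft", 0.8), Job("etl", 0.5)]
print([[j.name for j in slot] for slot in pair_jobs(jobs)])
# [['mpi-solver', 'nosql-scan'], ['mpi-fft', 'hadoop-sort'], ['etl']]
```

The two-pointer merge runs in linear time after the sort, which keeps the scheduling decision cheap relative to the jobs being placed.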
These savings are directly proportional to the cluster utilization improvements.

Conference Object · Publication · Metadata only
Kombinatoryal test tekniklerinin karmaşık olay işleme motorlarında uygulanması [Application of combinatorial testing techniques in complex event processing engines] (IEEE, 2012)
Arı, İsmail; Ölmezoğulları, Erdi; Sözer, Hasan; Computer Science; ARI, Ismail; SÖZER, Hasan; Ölmezoğulları, Erdi
Testing has a critical place in the design, implementation, and integration of software, hardware, and complex systems composed of both. The cost of failures caused by bugs that are not detected and fixed early in the process increases multiplicatively and adversely affects overall project costs. However, attempting comprehensive testing with correct expected outputs is also costly in both time and money. Combinatorial Testing Techniques (CTT) have been a preferred method in software testing due to their quantifiable case-coverage guarantees and suitability for automation. We observed that Complex Event Processing (CEP) engines, commonly used today for real-time analysis in critical, high-volume signal processing applications (e.g., mobile communication, sensors, radar), are not being systematically tested with approaches such as CTT. In this paper, we show, as a unique contribution, the applicability of CTT to CEP for the fast creation of continuous query test suites and obtain promising results.

Article · Publication · Metadata only
MaLeFICE: Machine learning support for continuous performance improvement in computational engineering (Wiley, 2022-04-25)
Sönmezer, Hasan Berk; Muhtaroğlu, Nitel; Arı, İsmail; Gökçin, Deniz; Computer Science; ARI, Ismail; Sönmezer, Hasan Berk; Muhtaroğlu, Nitel; Gökçin, Deniz
Computer-aided engineering (CAE) practices have improved drastically within the last decade due to ease of access to computing resources and open-source software. However, the increasing complexity of hardware and software settings and the scarcity of multiskilled personnel have again rendered the practice inefficient and infeasible.
In this article, we present a method for continuous performance improvement in computational engineering that combines online performance profiling with machine learning (ML). To test the viability of this method, we provide a detailed analysis of solution-time estimation for finite element analysis (FEA) jobs based on multidimensional models. These models combine numerous matrix features (matrix size, density, bandwidth, etc.), solver features (direct vs. iterative, preconditioning, tolerance), and hardware features (core count, virtual vs. physical). We repeat our analysis on different machines as well as Docker containers to demonstrate applicability across platforms. Next, we train supervised and unsupervised ML algorithms on commonly used, realistic FEA benchmarks and compare the accuracy of different models. Finally, we design two new ML-based online batch schedulers, called shortest predicted time first (SPTF) and shortest cluster time first (SCTF), whose performance is comparable to that of the optimal but offline shortest job first (SJF) scheduler. We find that ML-based profiling and scheduling can reduce average turnaround times by 2x to 5x compared with other alternatives.

Conference Object · Publication · Open Access
Model-based runtime monitoring of smart city systems (Elsevier, 2018)
İnçki, Koray; Arı, İsmail; Computer Science; ARI, Ismail; İnçki, Koray
The pace of proliferation of smart systems in city-wide applications is unmatched. The introduction of the Internet of Things (IoT), an enabler of the smart city phenomenon, has incubated a productive environment for such innovations. Smart things equipped with IoT capabilities allow smart city applications to be developed at such a large scale that each application can be represented as a system of systems (SoS). Nevertheless, the complexity of engineering such SoS has been a major challenge in developing and maintaining smart city applications.
One of the engineering challenges that industry faces today is verifying an SoS smart city application at runtime. We introduce a model-based runtime monitoring approach for providing reliable service. We propose using message sequence charts to represent a smart city application and then allowing practitioners to express the expected behavior of the application in terms of complex event processing patterns. We demonstrate the fidelity of our approach on a sample smart parking system. Our approach is one of a kind in enabling non-intrusive monitoring of IoT behavior at runtime (online).

Article · Publication · Open Access
Multivariate sensor data analysis for oil refineries and multi-mode identification of system behavior in real-time (IEEE, 2018)
Khodabakhsh, Athar; Arı, İsmail; Bakır, M.; Ercan, Ali Özer; Electrical & Electronics Engineering; Computer Science; ARI, Ismail; ERCAN, Ali Özer; Khodabakhsh, Athar
Large-scale oil refineries are equipped with mission-critical heavy machinery (boilers, engines, turbines, and so on) and are continuously monitored by thousands of sensors for process efficiency, environmental safety, and predictive maintenance. However, sensors themselves are also prone to errors and failure. The quality of data received from these sensors should be verified before being used in system modeling. There is a need for reliable methods and systems that can provide data validation and reconciliation in real time with high accuracy. In this paper, we develop a novel method for real-time data validation, gross error detection, and classification over multivariate sensor data streams. The validated, high-quality data obtained from these processes is used for pattern analysis and modeling of industrial plants.
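As a rough illustration of streaming gross error detection, the sketch below uses a simple, widely used stand-in (not necessarily the paper's method): a reading is flagged when it deviates from the rolling mean of recent accepted readings by more than `k` rolling standard deviations. The window length, `k`, and the `min_std` floor are hypothetical defaults.

```python
import math
from collections import deque


class RollingZScoreDetector:
    """Flags readings that lie more than k standard deviations from a rolling mean."""

    def __init__(self, window: int = 20, k: float = 3.0, min_std: float = 1e-6):
        self.buf = deque(maxlen=window)  # recent accepted readings (hypothetical window size)
        self.k = k
        self.min_std = min_std           # floor so that a flat signal still exposes spikes

    def check(self, x: float) -> bool:
        """Return True if x looks like a gross error; otherwise add it to the window."""
        n = len(self.buf)
        if n >= 2:
            mean = sum(self.buf) / n
            var = sum((v - mean) ** 2 for v in self.buf) / (n - 1)  # sample variance
            std = max(math.sqrt(var), self.min_std)
            if abs(x - mean) > self.k * std:
                return True  # keep the suspected error out of the reference window
        self.buf.append(x)
        return False


detector = RollingZScoreDetector(window=20, k=3.0)
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 55.0, 10.1]
print([detector.check(x) for x in readings])
# [False, False, False, False, False, True, False]
```

Excluding flagged readings from the window prevents a single spike from inflating the reference statistics; a multivariate method would additionally exploit correlations across sensors, which this univariate sketch ignores.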
We obtain sensor data from the power and petrochemical plants of an oil refinery and analyze them using various time-series modeling and data mining techniques that we integrate into a complex event processing engine. Next, we study the computational performance implications of the proposed methods and uncover regimes where they are sustainable over fast streams of sensor data. Finally, using the DBSCAN clustering algorithm, we detect shifts among steady states of the data, which represent the system's multiple operating modes, and identify when a model reconstruction is required.

Conference Object · Publication · Metadata only
NEEL: The nested complex event language for real-time event analytics (Springer International Publishing, 2011)
Liu, M.; Rundensteiner, E. A.; Dougherty, D.; Gupta, C.; Wang, S.; Arı, İsmail; Mehta, A.; Computer Science; ARI, Ismail
Complex event processing (CEP) over event streams has become increasingly important for real-time applications ranging from health care and supply chain management to business intelligence. These monitoring applications submit complex event queries to track sequences of events that match a given pattern. As these systems mature, the need for increasingly complex nested sequence query support arises, while state-of-the-art CEP systems mostly support only the execution of flat sequence queries. In this paper, we introduce our nested CEP query language NEEL for expressing nested queries composed of sequence, negation, AND, and OR operators. We then define its formal semantics. Subtle issues with negation and predicates within the nested sequence context are discussed. An E-Analytics system for processing nested CEP queries expressed in NEEL has been developed. Lastly, we demonstrate the utility of this technology through a case study applying it to a real-world health care application.
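To make the semantics of a sequence pattern with negation concrete, the sketch below evaluates a simplified, hypothetical flat query SEQ(A, !B, C) WITHIN w: an A followed by a C within w time units, with no B in between. It is plain Python rather than NEEL syntax, and it ignores predicates, nesting, and the event-selection policies a real CEP engine supports; the pairing policy used here is an assumption stated in the docstring.

```python
from typing import NamedTuple


class Ev(NamedTuple):
    kind: str  # event type, e.g. "A", "B", "C"
    ts: int    # timestamp


def match_seq_neg(events: list[Ev], first: str, neg: str, last: str,
                  within: int) -> list[tuple[int, int]]:
    """Return (ts_first, ts_last) pairs matching SEQ(first, !neg, last) WITHIN `within`.

    Assumes `events` is sorted by timestamp. Each `first` event may pair with
    every later `last` event (skip-till-any-match style), as long as no `neg`
    event occurs between them and the pair fits in the time window.
    """
    matches: list[tuple[int, int]] = []
    for i, a in enumerate(events):
        if a.kind != first:
            continue
        for c in events[i + 1:]:
            if c.ts - a.ts > within:
                break  # window exceeded; later events are even further away
            if c.kind == neg:
                break  # negated event occurred: no later match for this `first`
            if c.kind == last:
                matches.append((a.ts, c.ts))
    return matches


stream = [Ev("A", 1), Ev("C", 2), Ev("C", 4), Ev("B", 5), Ev("C", 6),
          Ev("A", 7), Ev("C", 9), Ev("A", 10), Ev("C", 20)]
print(match_seq_neg(stream, "A", "B", "C", within=5))  # [(1, 2), (1, 4), (7, 9)]
```

Because a negated event permanently blocks all later matches for the same starting event, the inner scan can stop as soon as it sees one; this early termination is the intuition behind evaluating negation close to the events it constrains.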