Title: Incremental analysis of large-scale system logs for anomaly detection
Authors: Astekin, M.; Özcan, S.; Sözer, Hasan
Type: Conference paper
Date issued: 2019
Date accessioned: 2020-09-09
Date available: 2020-09-09
ISBN: 978-1-7281-0857-5
Pages: 2119-2127
URI: http://hdl.handle.net/10679/6930
DOI: https://doi.org/10.1109/BigData47090.2019.9006593
Language: en
Access rights: info:eu-repo/semantics/restrictedAccess
Keywords: Log analysis; Distributed systems; Parallel processing; Anomaly detection; Big data; Machine learning
WoS ID: 000554828702028
Scopus ID: 2-s2.0-85081345108

Abstract:
Anomalies during system execution can be detected by automated analysis of logs generated by the system. However, large-scale systems can generate tens of millions of lines of logs within days. Centralized implementations of traditional machine learning algorithms are not scalable for such data. Therefore, we recently introduced a distributed log analysis framework for anomaly detection. In this paper, we introduce an extension of this framework, which can detect anomalies earlier via incremental analysis instead of the existing offline analysis approach. In the extended version, we periodically process the log data that has been accumulated so far. We conducted controlled experiments based on a benchmark dataset to evaluate the effectiveness of this approach. We repeated our experiments with various periods that determine the frequency of analysis as well as the size of the data processed each time. Results showed that our online analysis can improve anomaly detection time significantly while keeping accuracy at the same level as that obtained with the offline approach. The only exception, in which accuracy is compromised, occurs rarely: when the analysis is triggered before all the log data associated with a particular session of events have been collected.
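The periodic analysis described in the abstract, and the one case where it loses accuracy, can be illustrated with a minimal single-process sketch. This is not the authors' distributed framework. The record format `(timestamp, session_id, event)`, the function names `session_vectors` and `incremental_analysis`, the session ids `blk_*`, and the toy rule in `is_anomalous` (flag a session that logs an `error` or whose `open` and `close` counts differ) are all hypothetical stand-ins for a trained detector. Every `period` seconds, the sketch re-analyzes all logs accumulated so far and records the time at which each session is first flagged.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical record format: (timestamp in seconds, session id, event id).
LogRecord = Tuple[float, str, str]


def session_vectors(records: Iterable[LogRecord]) -> Dict[str, Counter]:
    """Group records by session and count occurrences of each event."""
    vectors: Dict[str, Counter] = defaultdict(Counter)
    for _, session, event in records:
        vectors[session][event] += 1
    return dict(vectors)


def incremental_analysis(
    records: List[LogRecord],
    period: float,
    is_anomalous: Callable[[Counter], bool],
) -> Dict[str, float]:
    """Every `period` seconds, analyze all logs accumulated so far.

    Returns {session id: analysis time at which it was first flagged}.
    A period at least as long as the whole log emulates offline analysis
    (a single run over the complete data).
    """
    if period <= 0:
        raise ValueError("period must be positive")
    records = sorted(records)
    detected: Dict[str, float] = {}
    if not records:
        return detected
    end = records[-1][0]
    t = period
    while True:
        accumulated = [r for r in records if r[0] <= t]
        for session, vector in session_vectors(accumulated).items():
            # Sessions still in progress are judged on partial data here.
            if session not in detected and is_anomalous(vector):
                detected[session] = t
        if t >= end:  # the last run has seen every record
            break
        t += period
    return detected


def is_anomalous(vector: Counter) -> bool:
    """Toy rule (hypothetical): an error, or an unmatched open/close."""
    return vector["error"] > 0 or vector["open"] != vector["close"]


RECORDS: List[LogRecord] = [
    (1.0, "blk_1", "open"), (2.0, "blk_1", "close"),
    (3.0, "blk_2", "open"), (4.0, "blk_2", "error"), (4.5, "blk_2", "close"),
    (8.0, "blk_3", "open"), (11.0, "blk_3", "close"),
]

if __name__ == "__main__":
    print("online: ", incremental_analysis(RECORDS, 5.0, is_anomalous))
    print("offline:", incremental_analysis(RECORDS, 11.0, is_anomalous))
```

With a 5-second period, `blk_2` is flagged at t=5 instead of t=11, which mirrors the earlier detection reported in the abstract. The run at t=10, however, sees only the `open` event of `blk_3` and flags that session even though it later completes normally; the offline run (period 11) flags only `blk_2`. Recomputing over all accumulated data on every run keeps the sketch simple; a practical system would process only new records and keep per-session state.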