Browsing by Author "Muhtaroğlu, Nitel"
Conference paper | Publication | Metadata only
Democratization of HPC cloud services with automated parallel solvers and application containers (Wiley, 2018-11-10)
Muhtaroğlu, Nitel; Arı, İsmail; Kolcu, Birkan; Computer Science; ARI, Ismail; Kolcu, Birkan; Muhtaroğlu, Nitel
In this paper, we investigate several design choices for HPC services at different layers of the cloud computing architecture to simplify and broaden its use cases. We start with the platform-as-a-service (PaaS) layer and compare direct and iterative parallel linear equation solvers. We observe that several matrix properties that can be identified before starting long-running solvers can help HPC services automatically select the amount of computing resources per job, such that the job latency is minimized and the overall job throughput is maximized. As a proof of concept, we use classical problems in structural mechanics and mesh these problems with increasing granularities, leading to various matrix sizes, i.e., the largest having 1 billion non-zero elements. In addition to matrix size, we take into account matrix condition numbers, preconditioning effects, and solver types, and execute these finite element analyses (FEA) over an IBM HPC cluster. Next, we focus on the infrastructure-as-a-service (IaaS) layer and explore HPC application performance, load isolation, and deployment issues using application containers (Docker) while also comparing them to physical and virtual machines (VM) over a public cloud.

Article | Publication | Open Access
Design and implementation of a cloud computing service for finite element analysis (Elsevier, 2013-06)
Arı, İsmail; Muhtaroğlu, Nitel; Computer Science; ARI, Ismail; Muhtaroğlu, Nitel
This paper presents an end-to-end discussion on the technical issues related to the design and implementation of a new cloud computing service for finite element analysis (FEA).
The focus is specifically on performance characterization of linear and nonlinear mechanical structural analysis workloads over multi-core and multi-node computing resources. We first analyze and observe that accurate job characterization, tuning of multi-threading parameters, and effective multi-core/node scheduling are critical for service performance. We design a “smart” scheduler that can dynamically select some of the required parameters, partition the load, and schedule it in a resource-aware manner. We can achieve up to 7.53× performance improvement over an aggressive scheduler using mixed FEA loads. We also discuss critical issues related to the data privacy, security, accounting, and portability of the cloud service.

PhD Dissertation | Publication | Metadata only
Finite element analysis in a cloud computing environment (2019-01-04)
Muhtaroğlu, Nitel; Arı, İsmail; Arı, İsmail; Aktemur, Tankut Barış; Yapıcı, Güney Güven; Unat, D.; Altılar, D. T.; Department of Computer Science; Muhtaroğlu, Nitel
In this thesis, the challenges faced and lessons learned while establishing a large-scale high performance cloud computing service that enables online mechanical structural analysis and many other scientific applications using the finite element analysis (FEA) technique will be described. Within a High Performance Computing (HPC) environment, several jobs with different demands can co-exist; thus it becomes a challenge for the service provider to efficiently utilize its own resources while also satisfying the quality expectations of job submitters. Such a service is intended to process many independent and loosely-dependent tasks concurrently. In order to reach optimal job scheduling metrics, each job type that can be submitted to the cluster must be carefully examined, and its space and time characteristics must be well understood and quantified.
Challenges faced include accurate characterization of complex FEA jobs, handling of many-task mixed jobs, sensitivity of task execution to multi-threading parameters, effective multi-core scheduling within a single computing node, and achieving seamless scaling across multiple nodes. It is found that significant performance gains in terms of both job completion latency and throughput are possible via dynamic or "smart" batch partitioning and resource-aware scheduling compared to the naive Shortest Job First (SJF) and aggressively-parallel scheduling techniques. Chapter 3 of this thesis presents an end-to-end discussion on the technical issues related to the design and implementation of a new cloud computing service for finite element analysis (FEA). Several design choices for HPC services at different layers of the cloud computing architecture are investigated to simplify and broaden its use cases. Investigations start with the software-as-a-service (SaaS) layer and compare parallel linear equation solvers. In order to minimize job latency and maximize the overall job throughput, several matrix characteristics are identified. Developing such an understanding is also crucial for HPCaaS systems to automatically select the amount of computing resources per job. In the following sections, the design of a "smart" scheduler that can dynamically select some of the required parameters, partition the workload, and schedule it in a resource-aware manner will be demonstrated. Results showing up to a 7.53x performance improvement over an aggressive scheduler using mixed FEA loads will be presented. In addition to the performance studies, a complementary discussion on critical issues related to the data privacy, security, accounting, and portability of the cloud service will also be given. The new trend in engineering is to solve complex computational problems in the cloud over HPC services provided by different vendors.
To further deepen the analyses of workloads representing HPC-related tasks in science and engineering, in Chapter 4, the performances of direct vs. iterative linear equation solvers are compared to help with the development of job schedulers that can automatically choose the best solver type and tune it (e.g. precondition the matrices) according to job characteristics and workload conditions that are frequently encountered on HPC cloud services. As a proof of concept, three classical elasticity problems will be used, namely a cantilever beam, the Lamé problem, and Stress Concentration Factor (SCF). These models theoretically represent many real-life mechanical situations in structural engineering, namely in the aerospace, automotive, construction, and machinery industries. The representative linear problems are meshed with increasing granularities, which leads to various matrix sizes, the largest having 1 billion non-zero elements. Detailed finite element analyses over an IBM HPC cluster are executed. First, a multi-frontal, parallel, sparse direct solver is used, and its performance is evaluated with Cholesky and LU decompositions of the generated matrices with respect to memory usage and multi-core, multi-node execution performance. As for the iterative solver, the PETSc library is used to carry out computations with several Krylov subspace methods (CG, BiCG, GMRES) and preconditioner combinations (BJacobi, SOR, ASM, None). Later in Chapter 4, the direct and iterative solver results are compared and contrasted in order to find the most suitable algorithm for the varying cases obtained from numerical modeling of these three-dimensional linear elasticity problems. In addition to the aforementioned studies, as supplementary research, the infrastructure-as-a-service (IaaS) layer for HPC is examined, and characteristics like application performance, load isolation, and deployment speed using application containers (Docker) are observed.
These characteristics are also compared to physical and virtual machines (VM) over a public cloud. For this purpose, HPC-specific deployment using application container technology is evaluated and performance metrics are examined in order to contribute to the evaluation of these technologies for job schedulers to be used on cloud computing infrastructures. This phase of the research focuses on understanding the behavior of cloud computing infrastructures under circumstances where deployment and utilization of containers (Docker) with a chosen software is necessary. To summarize, this multi-disciplinary doctoral thesis covers most of the critical aspects and computational challenges of providing FEA in the cloud for structural mechanics, including ease of deployment, batch-level performance, job-level isolation, financial accounting, and content security. It utilizes several modern software tools and techniques, while also contributing new ones to the literature.

Article | Publication | Metadata only
MaLeFICE: Machine learning support for continuous performance improvement in computational engineering (Wiley, 2022-04-25)
Sönmezer, Hasan Berk; Muhtaroğlu, Nitel; Arı, İsmail; Gökçin, Deniz; Computer Science; ARI, Ismail; Sönmezer, Hasan Berk; Muhtaroğlu, Nitel; Gökçin, Deniz
Computer aided engineering (CAE) practices improved drastically within the last decade due to ease of access to computing resources and open-source software. However, increasing complexity of hardware and software settings and the scarcity of multiskilled personnel rendered the practice inefficient and infeasible again. In this article, we present a method for continuous performance improvement in computational engineering that combines online performance profiling with machine learning (ML). To test the viability of this method, we provide a detailed analysis for solution time estimation of finite element analysis (FEA) jobs based on multidimensional models.
These models combine numerous matrix features (matrix size, density, bandwidth, etc.), solver features (direct-iterative, preconditioning, tolerance), and hardware features (core count, virtual–physical). We repeat our analysis over different machines as well as Docker containers to demonstrate applicability over different platforms. Next, we train supervised and unsupervised ML algorithms over commonly used, realistic FEA benchmarks and compare the accuracy of different models. Finally, we design two new ML-based online batch schedulers called shortest predicted time first (SPTF) and shortest cluster time first (SCTF), which are comparable in performance to the optimal, but offline, shortest job first (SJF) scheduler. We find that ML-based profiling and scheduling can reduce the average turnaround times by 2x to 5x over other alternatives.

Conference paper | Publication | Metadata only
Smart job scheduling for high-performance cloud computing services (Civil-comp, 2011-01)
Muhtaroğlu, Nitel; Arı, İsmail; Computer Science; ARI, Ismail; Muhtaroğlu, Nitel
In this paper, we describe the challenges faced and lessons learned while establishing a large-scale high performance cloud computing service that enables online mechanical structural analysis and many other scientific applications using the finite element analysis (FEA) technique. The service is intended to process many independent and loosely-dependent (e.g. assembled system) tasks concurrently. Challenges faced include accurate job characterization, handling of many-task mixed jobs, sensitivity of task execution to multi-threading parameters, effective multi-core scheduling in a single node, and achieving seamless scaling across multiple nodes. We find that significant performance gains in terms of both job completion latency and throughput are possible via dynamic or "smart" partitioning and resource-aware scheduling compared to shortest first and aggressive job scheduling techniques.
We also discuss issues related to secure and private processing of sensitive models in the cloud.
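
Several of the abstracts above use shortest-first ordering as a scheduling baseline. As a rough illustration only, and not the authors' scheduler, the sketch below compares mean turnaround time for arrival-order and shortest-first execution. It assumes a single worker, all jobs submitted at time zero, and run times known in advance. The function name `mean_turnaround` and the job run times are hypothetical.

```python
from itertools import accumulate


def mean_turnaround(run_times):
    """Mean completion time for jobs run back-to-back on one worker.

    Assumes every job is submitted at t = 0, so a job's turnaround
    time equals its completion time.
    """
    completion_times = list(accumulate(run_times))
    return sum(completion_times) / len(completion_times)


# Hypothetical job run times in minutes, listed in arrival order.
jobs = [30, 5, 60, 10]

arrival_order = mean_turnaround(jobs)           # run in arrival order
shortest_first = mean_turnaround(sorted(jobs))  # run shortest job first

print(arrival_order)   # 66.25
print(shortest_first)  # 42.5
```

Under these simplifying assumptions, shortest-first never yields a higher mean turnaround than arrival order. A real service would also have to estimate run times and account for multi-core and multi-node resources, which this sketch ignores.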