
Towards failure correlation for improved cloud application service resilience

Mathews, DR and Verma, M and Aggarwal, P and Lakshmi, J (2021) Towards failure correlation for improved cloud application service resilience. In: 14th IEEE/ACM International Conference on Utility and Cloud Computing, 6-9 Dec 2021, Leicester.

PDF: IEEE_UCC_2021.pdf (Published Version, 650 kB), restricted to registered users only
Official URL: https://doi.org/10.1145/3492323.3495586

Abstract

Autonomously dealing with disruptions is necessary for maintaining the quality of a cloud application service. A fault, error, or failure in any component across the application service stack can potentially disrupt service delivery. Fault localization and failure prediction are essential techniques for managing service failures. Emerging cloud computing paradigms are pushing application services to be built as loosely coupled distributed components for independent scaling. However, such architectures limit the effectiveness of existing approaches to fault localization and failure prediction. Prevalent work on fault localization and failure prediction focuses on a specific cloud service architecture layer, a subset of service components, or specific fault types. These approaches restrict the view of the fault's impact on the application service and preclude more intelligent methods for localizing faults or predicting failures, and thus for efficiently dealing with service disruptions autonomously. This paper contemplates the propagation of faults in multi-tiered architectures like clouds and uses a real-world disruption scenario to emphasize the need for correlating faults across the service layers to acquire insights for end-to-end fault analysis of cloud application services. © 2021 ACM.
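The abstract argues for correlating faults across service layers rather than analysing each layer in isolation. As a purely illustrative sketch (not the authors' method, which the abstract does not specify), the Python snippet below groups fault events by time proximity and keeps only groups that span more than one layer, a crude stand-in for a cross-layer propagation chain. The `FaultEvent` type, the `correlate_across_layers` function, the layer names, the 5-second window, and the sample events are all hypothetical and invented for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FaultEvent:
    """Hypothetical fault record; fields are assumptions for illustration."""
    timestamp: float  # seconds since some reference point
    layer: str        # e.g. "infrastructure", "platform", "application"
    component: str
    message: str


def correlate_across_layers(events: List[FaultEvent], window: float) -> List[List[FaultEvent]]:
    """Group events so that consecutive events in a group are at most `window`
    seconds apart, then keep only groups touching more than one layer.
    This is a simple time-window heuristic, not a causal analysis."""
    groups: List[List[FaultEvent]] = []
    current: List[FaultEvent] = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        # Start a new group when the gap to the previous event exceeds the window.
        if current and ev.timestamp - current[-1].timestamp > window:
            groups.append(current)
            current = []
        current.append(ev)
    if current:
        groups.append(current)
    # A cross-layer group hints at a fault propagating up (or down) the stack.
    return [g for g in groups if len({e.layer for e in g}) > 1]


# Hypothetical sample events; not data from the paper.
EXAMPLE_EVENTS = [
    FaultEvent(100.0, "infrastructure", "disk-7", "I/O latency spike"),
    FaultEvent(102.5, "platform", "db-replica-2", "replication lag"),
    FaultEvent(104.0, "application", "checkout-svc", "HTTP 503"),
    FaultEvent(400.0, "application", "search-svc", "timeout"),
]

clusters = correlate_across_layers(EXAMPLE_EVENTS, window=5.0)
for group in clusters:
    print(" -> ".join(f"{e.layer}:{e.component}" for e in group))
```

With these sample events, the first three fall within the window and span three layers, so they form one candidate chain; the isolated `search-svc` timeout is dropped because it touches only the application layer. Time proximity alone cannot establish causation; a practical system would also need dependency information between components, which this sketch deliberately omits.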

Item Type: Conference Paper
Publication: ACM International Conference Proceeding Series
Publisher: Association for Computing Machinery
Additional Information: The copyright for this article belongs to Association for Computing Machinery
Keywords: Architecture; Computer architecture; Distributed database systems; Failure (mechanical); Application services; Cloud applications; Failure correlation; Failures prediction; Fault analysis; Fault localization; Fault propagation; Resilience; Service resiliences; Service stack; Forecasting
Department/Centre: Division of Electrical Sciences > Electrical Engineering
Date Deposited: 09 Mar 2022 10:26
Last Modified: 09 Mar 2022 10:26
URI: http://eprints.iisc.ac.in/id/eprint/71520
