ePrints@IISc

Scalable and dynamic regeneration of big data volumes

Sanghi, A and Sood, R and Haritsa, J and Tirthapura, S (2018) Scalable and dynamic regeneration of big data volumes. In: 21st International Conference on Extending Database Technology, EDBT 2018, 26 - 29 March 2018, Vienna, pp. 301-312.

Full text: PDF, EDBT 2018_2018_301-312_2018.pdf (Published Version, 1MB), restricted to registered users only.
Official URL: https://doi.org/10.5441/002/edbt.2018.27

Abstract

A core requirement of database engine testing is the ability to create synthetic versions of the customer's data warehouse at the vendor site. A rich body of work exists on synthetic database regeneration, but it suffers from critical limitations with regard to: (a) maintaining statistical fidelity to the client's query processing, and/or (b) scaling to large data volumes. In this paper, we present Hydra, a workload-dependent database regenerator that leverages a declarative approach to data regeneration to assure volumetric similarity, a crucial aspect of statistical fidelity, and materially improves on the prior art by adding scale, dynamism and functionality. First, Hydra uses an optimized linear programming (LP) formulation based on a novel region-partitioning approach. This spatial strategy drastically reduces the LP complexity, enabling Hydra to handle query workloads on which contemporary techniques fail. Second, Hydra incorporates deterministic post-LP processing algorithms that provide high efficiency and improved accuracy. Third, Hydra introduces the concept of dynamic regeneration by constructing a minuscule database summary that can regenerate databases of arbitrary size on the fly during query execution, while obeying volumetric specifications derived from the query workload. A detailed experimental evaluation on standard OLAP benchmarks demonstrates that Hydra can efficiently and dynamically regenerate large warehouses that accurately mimic the desired statistical characteristics.
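Two ideas from the abstract, formulating the LP over regions rather than individual data values, and regenerating rows lazily from a compact summary, can be illustrated with a deliberately simplified sketch. It assumes a single integer attribute, workload constraints given as hypothetical (lo, hi, rows) triples meaning "the range query lo <= a < hi must return rows tuples", and a hand-chosen feasible assignment standing in for an LP solver. All names (partition_regions, build_summary, regenerate) are hypothetical; this is not Hydra's actual algorithm, region definition, or summary format.

```python
from collections.abc import Iterator

# Hypothetical workload constraint (lo, hi, rows): the range query
# "SELECT COUNT(*) FROM t WHERE lo <= a AND a < hi" must return `rows`.
Constraint = tuple[int, int, int]
Interval = tuple[int, int]


def partition_regions(constraints: list[Constraint],
                      domain: Interval) -> dict[frozenset[int], list[Interval]]:
    """Cut the domain at every constraint boundary, then group the resulting
    intervals by the set of constraints covering them (their signature).
    Each distinct signature would become one LP variable."""
    d_lo, d_hi = domain
    cuts = {d_lo, d_hi}
    for lo, hi, _ in constraints:
        # Clamp boundaries into the domain so every cut is a valid point.
        cuts.update(min(max(x, d_lo), d_hi) for x in (lo, hi))
    points = sorted(cuts)
    regions: dict[frozenset[int], list[Interval]] = {}
    for a, b in zip(points, points[1:]):
        # Because all boundaries are cut points, [a, b) lies either wholly
        # inside or wholly outside each constraint's range.
        sig = frozenset(i for i, (lo, hi, _) in enumerate(constraints)
                        if lo <= a and b <= hi)
        regions.setdefault(sig, []).append((a, b))
    return regions


def build_summary(regions: dict[frozenset[int], list[Interval]],
                  counts: dict[frozenset[int], int]) -> list[tuple[int, int]]:
    """Pair each region with a representative value inside it (the left end
    of its first interval) and the row count assigned to that region."""
    return [(intervals[0][0], counts[sig]) for sig, intervals in regions.items()]


def regenerate(summary: list[tuple[int, int]], scale: int = 1) -> Iterator[tuple[int]]:
    """Lazily emit rows from the summary; nothing is materialized, so the
    output size is bounded only by `scale` (a simple integer multiplier here)."""
    for value, count in summary:
        for _ in range(count * scale):
            yield (value,)


workload = [(0, 50, 300), (20, 80, 500), (0, 100, 1000)]
regions = partition_regions(workload, (0, 100))  # 4 variables, not 100
# Feasible counts chosen by hand; an LP solver would normally supply these.
counts = {frozenset({0, 2}): 100, frozenset({0, 1, 2}): 200,
          frozenset({1, 2}): 300, frozenset({2}): 400}
summary = build_summary(regions, counts)
for lo, hi, rows in workload:
    got = sum(1 for (a,) in regenerate(summary, scale=3) if lo <= a < hi)
    print(f"[{lo}, {hi}): generated {got}, expected {rows * 3}")
```

In this toy setting the number of LP variables depends on how many distinct constraint signatures exist, not on the size of the attribute domain, and the summary is just four (value, count) pairs however many rows are streamed from it. Uniformly multiplying counts by `scale` is an assumption made for brevity.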

Item Type: Conference Paper
Publication: Advances in Database Technology - EDBT
Publisher: OpenProceedings.org
Additional Information: The copyright for this article belongs to OpenProceedings.org.
Keywords: Ability testing; Big data; Data warehouses; Linear programming; Contemporary techniques; Dynamic regeneration; Experimental evaluation; Large data volumes; Processing algorithms; Spatial strategies; Statistical characteristics; Synthetic database; Query processing
Department/Centre: Division of Electrical Sciences > Computer Science & Automation
Date Deposited: 19 Aug 2022 05:07
Last Modified: 19 Aug 2022 05:07
URI: https://eprints.iisc.ac.in/id/eprint/75975
