Exploiting Programmable Network Interfaces for Parallel Query Execution in Workstation Clusters

Kumar, Santhosh V and Thazhuthaveetil, MJ and Govindarajan, R (2006) Exploiting Programmable Network Interfaces for Parallel Query Execution in Workstation Clusters. In: 20th International Parallel and Distributed Processing Symposium, 2006. IPDPS 2006., 25-29 April 2006, Rhodes Island.

Official URL: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumb...


Workstation clusters equipped with high-performance interconnects that have programmable network processors offer interesting opportunities to enhance the performance of parallel applications run on them. In this paper, we propose schemes in which certain application-level processing in parallel database query execution is performed on the network processor. Using a timed Petri net model, we evaluate the performance of TPC-H queries executing on a high-end cluster where all tuple processing is done on the host processor, and find that tuple processing costs on the host processor dominate the execution time. These results are validated using a small cluster. We therefore propose four schemes in which certain tuple processing activity is offloaded to the network processor. The first two schemes offload the tuple splitting activity (the computation that identifies the node on which each tuple is to be processed), resulting in an execution time speedup of 1.09 relative to the base scheme, but with the I/O bus becoming the bottleneck resource. In the third scheme, in addition to offloading tuple processing activity, the disk and network interface are combined to avoid the I/O bus bottleneck, which results in speedups of up to 1.16, but with high host processor utilization. Our fourth scheme, in which the network processor also performs a part of the join operation along with the host processor, gives a speedup of 1.47 along with balanced system resource utilization. Further, we observe that the proposed schemes perform equally well even in a scaled architecture, i.e., when the number of processors is increased from 2 to 64.
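
The tuple splitting step mentioned in the abstract, deciding which node should process each tuple, can be illustrated with a minimal sketch. The abstract does not say which splitting function the paper uses; the modulo partitioning on an integer join key shown here is an assumption, and the names `split_tuples`, `key_index`, and `num_nodes` are hypothetical. The sketch runs on an ordinary host and does not model the network processor itself.

```python
from collections import defaultdict


def split_tuples(tuples, key_index, num_nodes):
    """Group tuples by destination node.

    Assumption: the destination is the integer join key modulo the number
    of nodes. The paper's actual splitting function is not given in the
    abstract.
    """
    buckets = defaultdict(list)
    for row in tuples:
        # Pick the node that will process this tuple.
        node = row[key_index] % num_nodes
        buckets[node].append(row)
    return dict(buckets)


# Hypothetical rows of the form (join_key, payload).
rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
partitions = split_tuples(rows, key_index=0, num_nodes=2)
```

In the base scheme described above, this kind of per-tuple computation would run on the host processor; the first two proposed schemes move it to the network processor.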

Item Type: Conference Paper
Publisher: IEEE
Additional Information: Copyright 2006 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Department/Centre: Division of Electrical Sciences > Computer Science & Automation
Division of Interdisciplinary Sciences > Supercomputer Education & Research Centre
Date Deposited: 10 Nov 2011 05:45
Last Modified: 10 Nov 2011 05:45
URI: http://eprints.iisc.ac.in/id/eprint/41965