
Compiler-assisted energy optimization for clustered VLIW processors

Nagpal, Rahul and Srikant, YN (2012) Compiler-assisted energy optimization for clustered VLIW processors. In: Journal of Parallel and Distributed Computing, 72 (8). pp. 944-959.

jou_par_dis_com_72-8_944-959_2012.pdf - Published Version
Restricted to Registered users only

Official URL: http://dx.doi.org/10.1016/j.jpdc.2012.04.005


Clustered architecture processors are preferred for embedded systems because centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption. Although clustering helps by improving the clock speed, reducing the energy consumption of the logic, and simplifying the design, it introduces extra overhead in the form of inter-cluster communication. This communication happens over long global wires with high load capacitance, which leads to execution delays and significantly higher energy consumption. Inter-cluster communication also introduces many short idle cycles, thereby significantly increasing the overall leakage energy consumption in the functional units. The trend towards miniaturization of devices (and the associated reduction in threshold voltage) makes energy consumption in interconnects and functional units even worse, and limits the usability of clustered architectures in smaller technologies. However, technological advancements now permit the design of interconnects and functional units with varying performance and power modes. In this paper, we propose scheduling algorithms that aggregate the scheduling slack of instructions and the communication slack of data values to exploit the low-power modes of functional units and interconnects. Finally, we present a synergistic combination of these algorithms that simultaneously saves energy in functional units and interconnects to improve the usability of clustered architectures by achieving better overall energy-performance trade-offs. Even with conservative estimates of the contribution of the functional units and interconnects to the overall processor energy consumption, the proposed combined scheme obtains, on average, 8% and 10% improvement in overall energy-delay product with 3.5% and 2% performance degradation for a 2-clustered and a 4-clustered machine, respectively. We present a detailed experimental evaluation of the proposed schemes. Our test bed uses the Trimaran compiler infrastructure. (C) 2012 Elsevier Inc. All rights reserved.
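
The energy-delay figures quoted in the abstract can be unpacked with a little arithmetic. The sketch below is not taken from the paper; it assumes that energy-delay product is simply energy times delay, that an "improvement" of x% means the new product is (1 - x) times the old one, and that "performance degradation" of y% means the delay grows by a factor of (1 + y). The function name `implied_energy_ratio` is hypothetical and used only for illustration.

```python
def implied_energy_ratio(edp_improvement: float, slowdown: float) -> float:
    """Return E_new / E_old implied by an EDP improvement and a slowdown.

    Assumptions (not stated in the abstract):
      EDP = energy * delay
      EDP_new = (1 - edp_improvement) * EDP_old
      delay_new = (1 + slowdown) * delay_old
    """
    return (1.0 - edp_improvement) / (1.0 + slowdown)


# Figures quoted in the abstract: (clusters, EDP improvement, slowdown)
reported = [(2, 0.08, 0.035), (4, 0.10, 0.02)]

for clusters, edp_gain, slowdown in reported:
    ratio = implied_energy_ratio(edp_gain, slowdown)
    saving_pct = 100.0 * (1.0 - ratio)
    print(f"{clusters}-cluster: implied energy saving ~ {saving_pct:.1f}%")
```

Under these assumptions, the quoted numbers would correspond to overall energy savings of roughly 11% for the 2-clustered machine and roughly 12% for the 4-clustered machine. These are derived estimates, not results reported in the abstract.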

Item Type: Journal Article
Additional Information: Copyright for this article belongs to Elsevier Ltd
Keywords: Scheduling; Clustered VLIW processors; Energy-aware scheduling
Department/Centre: Division of Electrical Sciences > Computer Science & Automation
Depositing User: Francis Jayakanth
Date Deposited: 03 Aug 2012 04:52
Last Modified: 03 Aug 2012 04:54
URI: http://eprints.iisc.ac.in/id/eprint/44877
