Afraz, Mohammed and Saha, Diptikalyan and Kanade, Aditya (2015) P3: Partitioned Path Profiling. In: 10th Joint Meeting of the European Software Engineering Conference (ESEC) / ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE), AUG 30-SEP 04, 2015 , Bergamo, ITALY , pp. 485-495.
PDF
ESEC-FSC_485_2015.pdf - Published Version Restricted to Registered users only Download (541kB) | Request a copy |
Abstract
Acyclic path profile is an abstraction of dynamic control flow paths of procedures and has been found to be useful in a wide spectrum of activities. Unfortunately, the runtime overhead of obtaining such a profile can be high, limiting its use in practice. In this paper, we present partitioned path profiling (P3) which runs K copies of the program in parallel, each with the same input but on a separate core, and collects the profile only for a subset of intra-procedural paths in each copy, thereby, distributing the overhead of profiling. P3 identifies ``profitable'' procedures and assigns disjoint subsets of paths of a profitable procedure to different copies for profiling. To obtain exact execution frequencies of a subset of paths, we design a new algorithm, called PSPP. All paths of an unprofitable procedure are assigned to the same copy. P3 uses the classic Ball-Larus algorithm for profiling unprofitable procedures. Further, P3 attempts to evenly distribute the profiling overhead across the copies. To the best of our knowledge, P3 is the first algorithm for parallel path profiling. We have applied P3 to profile several programs in the SPEC 2006 benchmark. Compared to sequential profiling, P3 substantially reduced the runtime overhead on these programs averaged across all benchmarks. The reduction was 23%, 43% and 56% on average for 2, 4 and 8 cores respectively. P3 also performed better than a coarse-grained approach that treats all procedures as unprofitable and distributes them across available cores. For 2 cores, the profiling overhead of P3 was on average 5% less compared to the coarse-grained approach across these programs. For 4 and 8 cores, it was respectively 18% and 25% less.
Item Type: | Conference Proceedings |
---|---|
Additional Information: | Copy right for this article belongs to the ASSOC COMPUTING MACHINERY, 1515 BROADWAY, NEW YORK, NY 10036-9998 USA |
Department/Centre: | Division of Electrical Sciences > Computer Science & Automation |
Date Deposited: | 22 Oct 2016 09:40 |
Last Modified: | 22 Oct 2016 09:40 |
URI: | http://eprints.iisc.ac.in/id/eprint/55062 |
Actions (login required)
View Item |