Towards automatic compilation for energy efficient iterative stencil computations

Zou, Yun, author; Rajopadhye, Sanjay, advisor; Strout, Michelle M., committee member; Anderson, Chuck W., committee member; Gao, Xinfeng, committee member

dc.contributor.author	Zou, Yun, author
dc.contributor.author	Rajopadhye, Sanjay, advisor
dc.contributor.author	Strout, Michelle M., committee member
dc.contributor.author	Anderson, Chuck W., committee member
dc.contributor.author	Gao, Xinfeng, committee member
dc.date.accessioned	2016-08-18T23:10:16Z
dc.date.available	2016-08-18T23:10:16Z
dc.date.issued	2016
dc.description.abstract	Today, energy has become a critical concern in all aspects of computing. In this thesis, we address the energy efficiency of an important class of programs called "Stencil Computations", which occur frequently in a wide variety of scientific applications. We target compute-intensive stencil computations and seek to automatically produce code that minimizes energy consumption. Our work addresses two main contributors to energy consumption, dynamic memory energy and static energy, which are proportional to the number of off-chip memory accesses and to the execution time, respectively. We first target dynamic energy consumption and propose an energy-efficient tiling and parallelization strategy called Flattened Multi-Pass Parallelization (FMPP), which seeks to minimize the total number of off-chip memory accesses without sacrificing execution time. Our strategy uses two-level tiling: it first partitions the iteration space into "passes", then tiles the passes and executes them in a "non-synchronized" or overlapped fashion. Producing such code is beyond the capability of current tiled code generators, because the schedules used are polynomials and are thus more general than multidimensional schedules. We present a parametric tiled code generation algorithm for the FMPP strategy for programs with parallelogram-shaped iteration spaces. Then, we seek to reduce static energy consumption by further improving the performance of the generated code. We found that existing production compilers fail to vectorize parametric tiled code efficiently, which is critical to the compiled program's performance. We propose a compilation method for parametrically tiled stencil computations that systematically vectorizes the loops with short vector intrinsics.
Our method targets the non-boundary full tiles, trades off register loads against register reorganization operations, enables vector register reuse within and across vectorized computations, and incorporates temporary buffering and memory padding to align memory accesses. We developed a semi-automatic code generation framework to support our memory-efficient strategy and our compilation method for vectorization. Our framework allows a number of optimization choices to be configured (e.g., the trade-off between data reorganization instructions and the number of aligned loads, and the tiling and parallelization strategy). We evaluate our strategy on several modern Intel architectures with a set of stencil benchmarks. Our experimental results show that our energy-efficient tiling and parallelization strategy significantly reduces dynamic memory energy consumption across platforms: by about 74% on an 8-core Xeon E5-2650 v2, 75% on a 6-core Xeon E5-2620 v2, and 67% on a 6-core Xeon E5-2620 v3. This leads to a reduction in the total energy consumption of the program of 2% to 14%. Our vectorized code also shows significant performance improvement over existing compilers. We get an average performance improvement of 34% for Jacobi 1D across all the platforms, and up to 40% for some 2D stencils. With the savings in both static energy and dynamic memory energy, we are able to reduce the total energy consumption by 20% on average for 2D stencils on the Xeon E5-2620 v3 platform. The tuning space for our experiments is fairly large (including both optimization choices and tile sizes), and exhaustively searching the whole space is extremely time-consuming. In our work, we also take a first step toward building an autotuner for our framework. We propose to use artificial neural networks to assist the tuning process, and present a study of performance tuning with the assistance of neural networks.
Our results show that an artificial neural network has great potential to predict performance accurately and can help reduce the search space significantly.
dc.format.medium	born digital
dc.format.medium	doctoral dissertations
dc.identifier	Zou_colostate_0053A_13712.pdf
dc.identifier.uri	http://hdl.handle.net/10217/176677
dc.identifier.uri	https://doi.org/10.25675/3.024157
dc.language	English
dc.language.iso	eng
dc.publisher	Colorado State University. Libraries
dc.relation.ispartof	2000-2019
dc.rights	Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright.
dc.subject	automatic parallelization
dc.subject	multi-level tiling
dc.subject	vectorization
dc.subject	auto-tuning
dc.subject	artificial neural network
dc.subject	parametric tiling
dc.title	Towards automatic compilation for energy efficient iterative stencil computations
dc.type	Text
dcterms.rights.dpla	This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
thesis.degree.discipline	Computer Science
thesis.degree.grantor	Colorado State University
thesis.degree.level	Doctoral
thesis.degree.name	Doctor of Philosophy (Ph.D.)

Files

Original bundle

Name: Zou_colostate_0053A_13712.pdf
Size: 2.34 MB
Format: Adobe Portable Document Format

Collections

2000-2019
Theses and Dissertations