Most applications in modern high-performance computing systems are data-intensive scientific computations. Thus, a considerable amount of energy is consumed by the I/O (Input/Output) subsystem, which performs I/O operations on terabytes of data. For better performance, each data file is split into stripes placed on distinct disks of a large disk array. The result is a complex system consisting of many disks. The energy consumed by each disk depends on its rotational speed: the disk power consumption is proportional to the square of the rotational speed. Thus, if we can perform some I/O operations at a lower rate without degrading the performance of the overall computation, we have an opportunity for energy savings.
High-performance computing applications are dominated by input-independent behaviour: the sequence of operations (including I/O operations) does not depend on the values of the input data. A compiler can therefore effectively determine the I/O system parameters at compile time:
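The quadratic relationship between rotational speed and power can be made concrete with a small sketch. The function name `relative_disk_power` and the speed levels below are hypothetical, chosen only for illustration; the model covers only the speed-dependent part of the power and ignores any constant baseline.

```python
def relative_disk_power(speed_fraction: float) -> float:
    """Power relative to full speed under the quadratic model P ~ speed^2.

    speed_fraction is the rotational speed divided by the maximum speed.
    """
    if not 0.0 < speed_fraction <= 1.0:
        raise ValueError("speed_fraction must be in (0, 1]")
    return speed_fraction ** 2


MAX_RPM = 15000  # hypothetical top speed of a multi-speed disk

# Hypothetical speed levels, for illustration only.
for rpm in (15000, 12000, 9000, 7500):
    fraction = rpm / MAX_RPM
    print(f"{rpm:>6} RPM: {relative_disk_power(fraction):.2f} of full power")
```

Under this model, even a moderate reduction in speed yields a disproportionately large reduction in power, which is why slowing down disks that are not on the critical path is attractive.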
Consider the following code fragment:
Since I/O operations can be performed concurrently with CPU operations, the time efficiency of the algorithm can be increased by inserting appropriate data-prefetching instructions at compilation time. Our example code would be transformed by the compiler as follows:
Here, the prefetch distance \(d\) (the time needed to cover the I/O latency, measured in iterations) can be calculated as follows: \[ d=\left\lceil\frac{T_d}{s+T_{pf}}\right\rceil, \] where
Here, \(d\) iterations of the \(j\) loop are required to hide the I/O latency, and \(b_1\) is the strip size used for strip mining. Execution of this optimised code is illustrated by the following timing diagram:
So far, we have optimised the time efficiency. However, in our example, \(d\) is much smaller than \(b_1\). By reducing the rotational speed of the disks, we can save disk energy and increase the prefetch distance without a significant increase in the computation time:
Here, the lower height of the red blocks indicates the reduced power consumption of the disks. We have halved the rotational speed and, thus, the power consumption has been reduced by a factor of four.
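The effect of a lower disk speed on the prefetch distance can be checked numerically with the formula for \(d\) given earlier. The sketch below reads \(T_d\) as the I/O latency to be hidden, \(s\) as the time of one loop iteration and \(T_{pf}\) as the overhead of issuing one prefetch; these readings, all numeric values, and the simplification that halving the rotational speed doubles the latency are assumptions made for illustration only.

```python
import math


def prefetch_distance(t_d: float, s: float, t_pf: float) -> int:
    """Prefetch distance d = ceil(T_d / (s + T_pf)), in loop iterations."""
    if t_d < 0 or s + t_pf <= 0:
        raise ValueError("need t_d >= 0 and s + t_pf > 0")
    return math.ceil(t_d / (s + t_pf))


# Hypothetical timings in microseconds.
T_D_FULL = 5200   # assumed I/O latency at full rotational speed
S = 450           # assumed time of one loop iteration
T_PF = 50         # assumed overhead of one prefetch call
B1 = 64           # hypothetical strip size b_1, in iterations

d_full = prefetch_distance(T_D_FULL, S, T_PF)
# Simplifying assumption: halving the speed doubles the latency.
d_half = prefetch_distance(2 * T_D_FULL, S, T_PF)

print(f"d at full speed: {d_full}")   # 11
print(f"d at half speed: {d_half}")   # 21
print(f"still within one strip: {d_half <= B1}")
```

With these illustrative numbers, the prefetch distance roughly doubles at half speed but remains well below \(b_1\), so the prefetches can still be issued early enough within a strip to hide the longer latency.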
Seung Woo Son and Mahmut Kandemir proposed a general integrated static compilation framework that can be applied to an array-based, loop-intensive program \( \cal{P} \) that consists of \( s \) loop nests. The original code of \( \cal{P}\) is transformed taking into account system parameters such as:
Wu-chun Feng (Editor): The Green Computing Book: Tackling Energy Efficiency at Large Scale.
CRC Press 2014, Print ISBN: 978-1-4398-1987-6, eBook ISBN: 978-1-4398-1988-3.
Chapter 2. Compiler-Driven Energy Efficiency. Mahmut Kandemir, Shekhar Srikantaiah.
Seung Woo Son and Mahmut Kandemir: Energy-aware data prefetching for multi-speed disks.
In Proceedings of the 3rd Conference on Computing Frontiers, pages 105–114, Ischia, Italy, May 2006.