7.3.4. Algorithm

Input format 1/4

The input to the algorithm contains:

  • The program code \( {\cal P} \) that is a parallel group of communicating loops.
  • The edges and vertices of IPCG of the input program code. (We denote \(V({\cal P}) \) by \(V \) and the vertices \(v_1,\ldots,v_n \) are sorted in the partial order determined by the intra-iteration edges. We denote the set of loop entries \( X({\cal P}) \) by \(H\).)
  • The parameter \( \delta \), that specifies the user tolerance for performance degradation, and
  • \(Q^{*}\) -- the upper limit on the number of iterations of phase 1 of the algorithm.

The algorithm computes, for each communication channel from \( x_i \) to \( x_j \), the optimal scaling factor \( k[i,j] \), such that scaling down the communication rate of the channels by the by the corresponding scaling factors does not increase the overall execution time more than \( (1+\delta) \) times.

Phase 1: Computing \( t_{\alpha}[i,j] \), \(q\), \(Q\), and \(T\) 2/4

The first phase of the algorithm computes the values of \( t_{\alpha}[i,j] \), \(q\), \(Q\), and \(T\), where:

  • \(t_{\alpha}[i,j] \) is the earliest time the vertex \(v_i\) at \(j\)th iteration can be reached,
  • The values of \(Q\),\(q\) and \(T\) satisfy the conditions: \(0\leq q \lt Q\), and, for all \(v_i\in H \), \(t_{\alpha}[i,Q]-t_{\alpha}[i,q]=T\). (We refer to the iterations \(q,q+1,\ldots,Q-1\) as the representative iterations: if each task \((i,j) \) is always performed in time \( \alpha(i,j) \), then each loop \( v_i\in H \) starts its \( \left(q+(Q-q)\cdot k\right)\)th iteration at time \( t_{\alpha}[i,q]+k\cdot T\) )

More formally, each \( t_{\alpha}[i,j] \) is the minimal value that satisfies the following expressions:

  1. \( \forall i\;:\; t_{\alpha}[i,0]=0\),
  2. \( \forall (k,i)\in E^{+}\;:\; t_{\alpha}[i,j]\geq t_{\alpha}[k,j]+\alpha(k,i) \),
  3. \( \forall (k,i)\in E^{-}\;:\; t_{\alpha}[i,j]\geq t_{\alpha}[k,j-1] \),

where

  • \( E^{+} \) is the set of intra-iteration edges, and
  • \( E^{-} \) is the set of inter-iteration edges.

The pseudo code of phase 1:

  • Set all \(t_\alpha[i,j]\) to \(0\); set \(Q=1\); set \(T=-1\)
  • repeat
    • for \(i=1\) to \(|V|\) do
      • for each intra-iteration edge \((v_i,v_j)\) where \(t_{\alpha}[j,Q]\lt t_{\alpha}[i,Q]+\alpha(i,j)\) do
        • \(t_{\alpha}[j,Q]=t_{\alpha}[i,Q]+\alpha(i,j)\)
    • \(Q=Q+1\)
    • /* recall that all inter-iteration edges are only control edges */
    • for each inter-iteration edge \((v_i,v_j)\) where \(t_{\alpha}[j,Q]\lt t_{\alpha}[i,Q-1]\) do
      • \(t_{\alpha}[j,Q]=t_{\alpha}[i,Q-1]\)
    • for \(q=Q-1\) downto \(0\) do
      • if \(\forall v_i,v_j\in H\;:\; t_{\alpha}[i,Q]-t_{\alpha}[i,q]=t_{\alpha}[j,Q]-t_{\alpha}[j,q]\) then
        • \(T=t_{\alpha}[i,Q]-t_{\alpha}[i,q]\) for some \(v_i\in H\)
        • break /* the values \(Q\), \(q\), and \(T\) have been found */
  • until \(Q\geq Q^{*}\) or \(T\geq 0\)
  • if \(T=-1\) then terminate with failure

Phase 2: Computing \( t_{\beta}[i,j] \) 3/4

The second phase of the algorithm is devoted to computing the worst-case time of the representative iterations. First we compute each \( t_{\beta}[i,j] \) that is minimum value that satisfies the following conditions:

  1. \( \forall i\;:\; t_{\beta}[i,q]=t_{\alpha}[i,q] \),
  2. \( \forall (k,i)\in E^{+}\;:\; t_{\beta}[i,j]\geq t_{\beta}[k,j]+\beta(k,i)\),
  3. \( \forall (k,i)\in E^{-}\;:\; t_{\beta}[i,j]\geq t_{\beta}[k,j-1] \),

The worst-case time of the representative iterations of loop \(v_i\in H\) is \(t_{\beta}[i,Q]-t_{\beta}[i,q]\).

The pseudo code of phase 2:

  • for \(r=q\) to \(Q\) do for each \(v_i\in V\) do \(t_{\beta}[i,r]=t_{\alpha}[i,r]\)
  • for \(r=q\) to \(Q\) do
    • for \(i=1\) to \(|V|\) do
      • for each intra-iteration edge \((v_i,v_j)\) where \(t_{\beta}[j,r]\lt t_{\beta}[i,r]+\beta(i,j)\) do
        • \(t_{\beta}[j,r]=t_{\beta}[i,r]+\beta(i,j)\)
    • for each inter-iteration control edge \((v_i,v_j)\) where \(t_{\beta}[j,r]\lt t_{\beta}[i,r]\) do
      • \(t_{\beta}[j,r]=t_{\beta}[i,r]\)

Phase 3: Computing \( t[i,j] \) and Determining \(k[i,j]\) 4/4

For power efficiency, we want to scale down the bit-rates of the communication channels. In the third phase of the algorithm, we try to find, for each connection channel between the connected loops \(x_i\) and \(x_j \), a maximal scaling factor \( k[i,j] \), under the constraint that the time of the representative iterations, for each \( x_i\in H \), does not exceed \( \max\{ (1+\delta)\cdot T, t_{\beta}[i,Q]-t_{\beta}[i,q] \} \). It is assumed that scaling factor can be selected from a discrete set of possible scaling levels. After each modification of scaling factor of any single channel, we recompute the variables \( t[i,j] \) in such a way that, for each \( v_i\in H \), the value \( t[i,Q]-t[i,q] \) is time estimation of the representative iterations of the loop \( v_i \) under the current configuration of scaling factors.

The pseudo code of phase 3:

  • set all \(t[i,j]\) to \(0\)
  • /* \([[i,j]]\) denotes the connection channel between loops \(x_i\) and \(x_j\) */
  • for each connection \([[i,j]]\) do set \(k[i,j]=1\) and \(g[i,j]=0\)
  • repeat
    • for each connection \([[i,j]]\) where \(g[i,j]=0\)
      • \(k'=k[i,j]\) /* backup \(k[i,j]\) */
      • increase \(k[i,j]\) to the next possible scaling level
      • for \(r=q\) to \(Q\) do for each \(v_i\in V\) do \(t[i,r]=t_{\alpha}[i,r]\)
      • for \(r=q\) to \(Q\) do
        • for \(i=1\) to \(|V|\) do
          • for each intra-iteration edge \((v_i,v_j)\) where \(t[j,r]\lt t[i,r]+\alpha(v_i,v_j)/k[\psi(v_i),\psi(v_j)]\)
            • \(t[j,r]= t[i,r]+\alpha(v_i,v_j)/k[\psi(v_i),\psi(v_j)]\)
        • for each inter-iteration control edge \((v_i,v_j)\) where \(t[j,r]\lt t[i,r]\) do
          • \(t[j,r]=t[i,r]\)
      • if \(\exists v_i\in H\;:\; t[i,Q]-t[i,q]\gt\max\{(1+\delta)\cdot T, t_{\beta}[Q,i]-t_{\beta}[q,i]\}\) then
        • \(k[i,j]=k'\) /* restore \(k[i,j]\) to previous value */
        • \(g[i,j]=1\) /* this connection cannot be scaled any further */
  • until \(g[i,j]=1\) for each connection \([[i,j]]\)




Projekt Cloud Computing – nowe technologie w ofercie dydaktycznej Politechniki Wrocławskiej (UDA.POKL.04.03.00-00-135/12)jest realizowany w ramach Programu Operacyjnego Kapitał Ludzki, Priorytet IV. Szkolnictwo wyższe i nauka, Działanie 4.3. Wzmocnienie potencjału dydaktycznego uczelni w obszarach kluczowych w kontekście celów Strategii Europa 2020, współfinansowanego ze środków Europejskiego Funduszu Społecznego i budżetu Państwa