Qualifier

Presenter Kyle M. Tarplee

The Papers

Task Scheduling

Heterogeneous Computing (HC) Systems

Steady-State Scheduling

Motivation

Steady-State Scheduling

Modeling

  • Focus only on the parts of the problem that dominate the schedule
  • Find a model with the fewest degrees of freedom (DOF), i.e., only model what you need to model
  • Assign portions of each task type to machine types
  • Neglect transient effects
  • Convert integer-valued decision variables to continuous ones
  • Temporarily ignore fine-grained allocation (e.g. time ordering)

Steady-State Scheduling

    Approach

    Paper #1

    Steady-State Scheduling on Heterogeneous Clusters: Why and How?

    Overview

    Platform Model

    Platform Model

  • Full overlap, single port model
  • A node can simultaneously compute, send to at most one neighbor, and receive from at most one neighbor
  • All parameters are assumed to be rational
  • Master-Slave Tasking

    Problem Description

    Master-Slave Tasking

    Example System

    Master-Slave Tasking

    Linear Program

    $$\underset{\alpha_i,\,s_{ij}\,\vert\, e_{ij} \in E}{\text{maximize}} \quad n_{task} = \sum_i \frac{\alpha_i}{w_i}$$ subject to:

    $\forall i$: $0 \leq \alpha_i \leq 1$ (computing time is a fraction)
    $\forall i,j \,\vert\, e_{ij} \in E$: $0 \leq s_{ij} \leq 1$ (communication time is a fraction)
    $\forall i$: $\sum_{j \,\vert\, e_{ij} \in E} s_{ij} \leq 1$ (total sending fraction)
    $\forall i$: $\sum_{j \,\vert\, e_{ji} \in E} s_{ji} \leq 1$ (total receiving fraction)

    Master-Slave Tasking

    Linear Program

    $\forall j \,\vert\, e_{jm} \in E$: $s_{jm} = 0$ (the master receives nothing)
    $\forall i \,\vert\, i \neq m$: $\sum_{j \,\vert\, e_{ji} \in E} \frac{s_{ji}}{c_{ji}} = \frac{\alpha_i}{w_i} + \sum_{j \,\vert\, e_{ij} \in E} \frac{s_{ij}}{c_{ij}}$ (tasks received = tasks processed + tasks forwarded)
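
    To make the LP concrete, here is a minimal sketch (not the paper's code) that builds and solves the master-slave steady-state LP with scipy.optimize.linprog on a small hypothetical tree platform; the node cycle times $w_i$, link costs $c_{ij}$, and the tree itself are made-up values.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical platform: master P0 feeds P1 and P2, and P1 forwards work to P3.
w = {0: 3.0, 1: 2.0, 2: 4.0, 3: 1.0}          # w_i: time for node i to process one task
c = {(0, 1): 1.0, (0, 2): 2.0, (1, 3): 1.0}   # c_ij: time to send one task over edge (i, j)
master = 0
nodes = sorted(w)
edges = sorted(c)

# Decision vector x = [alpha_0..alpha_3, s_01, s_02, s_13]; every entry is a time fraction.
a_idx = {i: k for k, i in enumerate(nodes)}
s_idx = {e: len(nodes) + k for k, e in enumerate(edges)}
n = len(nodes) + len(edges)

# Objective: maximize n_task = sum_i alpha_i / w_i (linprog minimizes, so negate).
cost = np.zeros(n)
for i in nodes:
    cost[a_idx[i]] = -1.0 / w[i]

# Single-port model: each node's total sending and total receiving fractions are <= 1.
A_ub, b_ub = [], []
for i in nodes:
    send_row, recv_row = np.zeros(n), np.zeros(n)
    for e in edges:
        if e[0] == i:
            send_row[s_idx[e]] = 1.0
        if e[1] == i:
            recv_row[s_idx[e]] = 1.0
    A_ub += [send_row, recv_row]
    b_ub += [1.0, 1.0]

# Conservation on every slave: tasks received = tasks processed + tasks forwarded.
# (The master has no incoming edges in this tree, so "master receives nothing" is automatic.)
A_eq, b_eq = [], []
for i in nodes:
    if i == master:
        continue
    row = np.zeros(n)
    for e in edges:
        if e[1] == i:
            row[s_idx[e]] += 1.0 / c[e]   # received
        if e[0] == i:
            row[s_idx[e]] -= 1.0 / c[e]   # forwarded
    row[a_idx[i]] -= 1.0 / w[i]           # processed
    A_eq.append(row)
    b_eq.append(0.0)

res = linprog(cost, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              A_eq=np.array(A_eq), b_eq=np.array(b_eq),
              bounds=[(0.0, 1.0)] * n, method="highs")
print("steady-state throughput n_task =", -res.fun)   # tasks per time unit
print("alpha =", res.x[:len(nodes)], "s =", res.x[len(nodes):])
```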

    Pipelined Scatter

    Problem Description

    Pipelined Scatter

    Example System

    Pipelined Scatter

    Linear Program

    $$\underset{TP,\,s_{ij},\,send(i,j,k)}{\text{maximize}} \quad TP$$ subject to:

    $\forall i,j \,\vert\, e_{ij} \in E$: $0 \leq s_{ij} \leq 1$ (communication time is a fraction)
    $\forall i$: $\sum_{j \,\vert\, e_{ij} \in E} s_{ij} \leq 1$ (total sending fraction)
    $\forall i$: $\sum_{j \,\vert\, e_{ji} \in E} s_{ji} \leq 1$ (total receiving fraction)

    Pipelined Scatter

    Linear Program

    $\forall i,j \,\vert\, e_{ij} \in E$: $s_{ij} = \sum_k send(i,j,k)\, c_{ij}$ (sending time is the sum of the per-target sending times)
    $\forall i,k \,\vert\, k \neq i$: $\sum_{j \,\vert\, e_{ji} \in E} send(j,i,k) = \sum_{j \,\vert\, e_{ij} \in E} send(i,j,k)$ (messages received = messages sent when $P_i$ is not the target)
    $\forall P_k \in \mathcal{P}_\text{target}$: $TP = \sum_{j \,\vert\, e_{jk} \in E} send(j,k,k)$ ($TP$ is the number of messages arriving at the target)
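
    As a sanity check of the formulation, the sketch below (again hypothetical, not from the paper) encodes the pipelined-scatter LP with scipy.optimize.linprog for a three-node chain with made-up link costs. It assumes the conservation constraint applies only to nodes other than the source $P_0$.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical chain P0 -> P1 -> P2; P0 is the source, P1 and P2 are the targets.
c = {(0, 1): 1.0, (1, 2): 2.0}     # c_ij: time to send one message over edge (i, j)
nodes = [0, 1, 2]
source, targets = 0, [1, 2]
edges = sorted(c)

# Decision vector: [TP, s_ij for each edge, send(i, j, k) for each edge and target k].
s_idx = {e: 1 + m for m, e in enumerate(edges)}
send_idx = {}
for e in edges:
    for k in targets:
        send_idx[(e, k)] = 1 + len(edges) + len(send_idx)
n = 1 + len(edges) + len(send_idx)

cost = np.zeros(n)
cost[0] = -1.0                     # maximize TP (linprog minimizes)

# Single-port model: total sending / receiving fraction per node is <= 1.
A_ub, b_ub = [], []
for i in nodes:
    send_row, recv_row = np.zeros(n), np.zeros(n)
    for e in edges:
        if e[0] == i:
            send_row[s_idx[e]] = 1.0
        if e[1] == i:
            recv_row[s_idx[e]] = 1.0
    A_ub += [send_row, recv_row]
    b_ub += [1.0, 1.0]

A_eq, b_eq = [], []
for e in edges:                    # s_ij = sum_k send(i, j, k) * c_ij
    row = np.zeros(n)
    row[s_idx[e]] = 1.0
    for k in targets:
        row[send_idx[(e, k)]] = -c[e]
    A_eq.append(row)
    b_eq.append(0.0)

# Conservation at every node that is neither the source nor the message's target:
# messages received for target k = messages forwarded toward target k (an assumption
# on the source exclusion; the slide only states the non-target condition).
for i in nodes:
    if i == source:
        continue
    for k in targets:
        if k == i:
            continue
        row = np.zeros(n)
        for e in edges:
            if e[1] == i:
                row[send_idx[(e, k)]] += 1.0   # received
            if e[0] == i:
                row[send_idx[(e, k)]] -= 1.0   # forwarded
        A_eq.append(row)
        b_eq.append(0.0)

for k in targets:                  # TP = messages arriving at each target
    row = np.zeros(n)
    row[0] = 1.0
    for e in edges:
        if e[1] == k:
            row[send_idx[(e, k)]] -= 1.0
    A_eq.append(row)
    b_eq.append(0.0)

bounds = [(0, None)] + [(0.0, 1.0)] * len(edges) + [(0, None)] * len(send_idx)
res = linprog(cost, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=bounds, method="highs")
print("steady-state scatter throughput TP =", -res.fun)   # 0.5 for these hypothetical costs
```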

    Determine Scheduling Period

    Constructing a Schedule

    Example

    Reference

    Example

    Problem and LP Solution

    Example

    Finding the Bipartite Graph

    Example

    Schedule

    Extensions

    Limitations of the Approach

    Critique

    Paper #2

    Linear Programming-Based Affinity Scheduling of Independent Tasks on Heterogeneous Computing Systems

    Overview

    Problem Description

    Baseline Mapping Heuristics

    Minimum Execution Time (MET): assign the task to the machine that is fastest for its task type.
    Required state: none. Issue: ignores waiting time at the machine.

    K-Percent Best (KPB): assign the task to the machine in $S_i^k$ with the earliest completion time, where $S_i^k$ is the set of the $k\%$ fastest machines for task type $i$.
    Required state: between MET and MCT. Issues: selecting $k$ is difficult, and $k$ is the same for each task type.

    Minimum Completion Time (MCT): assign the task to the machine with the earliest completion time.
    Required state: full. Issue: no consideration of future task arrivals.
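
    The selection rules themselves are one-liners. Below is a minimal, hypothetical sketch of the three baselines (not the paper's code); it assumes an expected-execution-time matrix `etc[i][j]` and machine ready times `ready[j]` rather than the paper's rate notation.

```python
import math

def met(etc, ready, i):
    """Minimum Execution Time: fastest machine for task type i, ignoring load."""
    return min(range(len(ready)), key=lambda j: etc[i][j])

def mct(etc, ready, i):
    """Minimum Completion Time: machine that finishes a type-i task the earliest."""
    return min(range(len(ready)), key=lambda j: ready[j] + etc[i][j])

def kpb(etc, ready, i, k):
    """K-Percent Best: MCT restricted to the k% fastest machines for type i."""
    order = sorted(range(len(ready)), key=lambda j: etc[i][j])
    subset = order[:max(1, math.ceil(len(order) * k / 100))]
    return min(subset, key=lambda j: ready[j] + etc[i][j])

etc = [[2.0, 5.0, 9.0],    # hypothetical expected execution times (rows: task types)
       [7.0, 3.0, 4.0]]
ready = [10.0, 1.0, 0.0]   # hypothetical machine ready times
print(met(etc, ready, 0), mct(etc, ready, 0), kpb(etc, ready, 0, k=50))
```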

    LPAS

    Nomenclature

    LPAS

    Allocation LP

    $$\underset{\lambda,\,\delta_{ij}}{\text{maximize}} \quad \lambda$$ subject to:

    $\forall i$: $\sum_j \delta_{ij} \mu_{ij} \geq \lambda \alpha_i$ (task arrival rate constraint)
    $\forall j$: $\sum_i \delta_{ij} \leq 1$ (machine allocation constraint)
    $\forall i,j$: $\delta_{ij} \geq 0$ (proportional allocations are nonnegative)
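
    A minimal sketch (not the paper's code) of the allocation LP in scipy.optimize.linprog, fed with the System A data from Example 1 below; for that input the solver recovers $\lambda^* \approx 1.0204$ and the $\delta^*$ shown there.

```python
import numpy as np
from scipy.optimize import linprog

alpha = np.array([2.45, 2.45])      # arrival rate of each task type (System A)
mu = np.array([[9.0, 5.0],          # mu[i][j]: execution rate of type i on machine j
               [2.0, 1.0]])
T, M = mu.shape

# Decision vector x = [lambda, delta_11, delta_12, delta_21, delta_22] (deltas row-major).
n = 1 + T * M
cost = np.zeros(n)
cost[0] = -1.0                      # maximize lambda (linprog minimizes)

A_ub, b_ub = [], []
for i in range(T):                  # lambda * alpha_i <= sum_j delta_ij * mu_ij
    row = np.zeros(n)
    row[0] = alpha[i]
    row[1 + i * M:1 + (i + 1) * M] = -mu[i]
    A_ub.append(row)
    b_ub.append(0.0)
for j in range(M):                  # sum_i delta_ij <= 1 (machine j is not oversubscribed)
    row = np.zeros(n)
    row[1 + j::M] = 1.0
    A_ub.append(row)
    b_ub.append(1.0)

res = linprog(cost, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(0, None)] * n, method="highs")
print("lambda* =", -res.fun)                    # ~1.0204 for this input
print("delta* =\n", res.x[1:].reshape(T, M))    # matches the delta* in Example 1
```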

    LPAS

    Allocation LP Solution

    LPAS

    Algorithm

    1. Pick the LP solution with the smallest number of zero entries in $\delta^*$ (least restrictive)
    2. Let $S_i = \{j \,:\, \delta^*_{ij} \neq 0\}$ be the set of assignable machines for task type $i$
    3. Assign each task of type $i$ to the machine in $S_i$ with the earliest completion time
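
    For completeness, a small hypothetical sketch of steps 2-3 (the dispatching rule): build $S_i$ from a $\delta^*$ solution and apply MCT inside that set. `delta_star`, `mu`, and `ready` are assumed inputs, with execution time taken as $1/\mu_{ij}$.

```python
import numpy as np

def lpas_assign(delta_star, mu, ready, i, eps=1e-9):
    """Assign a type-i task to the earliest-completing machine in S_i."""
    S_i = [j for j in range(mu.shape[1]) if delta_star[i, j] > eps]
    return min(S_i, key=lambda j: ready[j] + 1.0 / mu[i, j])

delta_star = np.array([[0.0, 0.5], [1.0, 0.5]])   # System A solution from Example 1 below
mu = np.array([[9.0, 5.0], [2.0, 1.0]])
ready = np.array([0.3, 0.0])                      # hypothetical machine ready times
print(lpas_assign(delta_star, mu, ready, i=0))    # task type 1 may only use machine 2
```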

    Example 1

    System A

    Given a HC system: $\alpha = \begin{pmatrix} 2.45 & 2.45 \end{pmatrix}$ and $\mu = \begin{pmatrix}9 & 5 \\ 2 & 1 \end{pmatrix}$

    Allocation LP gives: $\delta^* = \begin{pmatrix}0 & 0.5 \\ 1 & 0.5 \end{pmatrix}$ with task rates of $\delta^* \mu = \begin{pmatrix}0 & 2.5 \\ 2 & 0.5 \end{pmatrix}$ and $\lambda^* = \frac{2.5}{2.45} = 1.0204 > 1$

    Example 2

    Given a HC system: $\alpha = \begin{pmatrix} 2.45 & 2.45 \end{pmatrix}$ and $\mu = \begin{pmatrix}2.5 & 2.5 \\ 2.5 & 2.5 \end{pmatrix}$

    Allocation LP gives: $\delta^* = \begin{pmatrix}0 & 1 \\ 1 & 0 \end{pmatrix}$ or $\begin{pmatrix}1 & 0 \\ 0 & 1 \end{pmatrix}$ and $\lambda^* = \frac{2.5}{2.45} = 1.0204 > 1$

    Simulation

    Results

    System A

    System B

    $\alpha = \begin{pmatrix} 5 & 8 \end{pmatrix}$ and $\mu = \begin{pmatrix}8 & 3 \\ 4 & 10 \end{pmatrix}$

    Allocation LP gives: $\delta^* = \begin{pmatrix}0.8333 & 0 \\ 0.1667 & 1 \end{pmatrix}$

    System C1

    System D

    High task and high machine heterogeneity

    System E

    Low task and high machine heterogeneity

    MCT outperforms LPAS because there is effectively only one task type, and LPAS restricts which machines each task can run on when there is virtually no benefit to doing so

    System F1

    High task and low machine heterogeneity with $\alpha = \begin{pmatrix}4 & 8 & 10 & 10\end{pmatrix}$

    System F2

    High task and low machine heterogeneity with $\alpha = \begin{pmatrix}7 & 7 & 7 & 7\end{pmatrix}$

    System G

    Super low task and machine heterogeneity

    MCT performs slightly better than LPAS, possibly because the smaller machine sets are more restrictive without offering a large benefit

    System H

    Super high task and machine heterogeneity, high task to machine affinity

    System I1

    Same as system C1 but with hyper-exponential execution time distribution (twice the variance)

    Note that the LPAS upper bound is a typo in the paper

    System I2

    Same as system C1 but with constant execution time

    LPAS-2/k Heuristic

    LP-Static Heuristic

    System C2

    Guided-LPAS Heuristics

    Critique

    Future Directions (My Thoughts)

    Conclusion
