Colorado State University

Fort Collins, Colorado, USA

- problem statement
- optimization problem formulation
- algorithm description
- results

$$
\newcommand{\ETC}{\mathit{ETC}_{ij}}
\newcommand{\APC}{\mathit{APC}_{ij}}
\newcommand{\APCIdle}{\mathit{APC}_{\emptyset j}}
%solutions to the vector optimization problem
\newcommand{\MSLB}{\mathit{MS}_\mathit{DL}}
\newcommand{\ELB}{E_\mathit{DL}}
\newcommand{\evalat}[2]{\left. #1 \right|_{#2}}
\newcommand{\YND}{\mathcal{Y}_\mathit{ND}}
\newcommand{\MSmu}{\mathit{MS}(\boldsymbol{\mu})}
\newcommand{\Emu}{E(\boldsymbol{\mu})}
$$

- static scheduling
- single bag-of-tasks
- task assigned to only one machine (task indivisibility)
- machine runs one task at a time
- known deterministic execution times

- large number of heterogeneous tasks and machines
**goal:**maximize profit per time- minimize operating cost (energy)
- minimize makespan: process the next bag-of-tasks after this one

- work has been done in minimizing
- execution time
- energy consumption
- reliability

- our focus is on maximizing profit
- important for businesses
- combines the makespan, energy, and other cost factors

- contributions
- monetary model for provider and client HPC
- algorithm to efficiently find
- maximum profit schedule
- bounds on profit

- software as a service (SaaS) providers maximize profits by increasing revenue and controlling costs
- example: SaaS web-based video trans-coding
- charges customers per minute of video converted
- task execution time is well known due to repetitive tasks executed by the provider

**objective:**minimize cost of processing workload**and**process tasks as fast as possible

- energy and makespan optimization
- good for system operators
- does not provide a concrete decision space for automated schedulers
- need efficient algorithms for very large-scale systems

- let $p$ be the price (revenue) per bag-of-tasks
- let $c$ be the cost per unit of energy
- let $E$ be the energy consumed
- let $MS$ be the makespan
- profit per bag-of-tasks is $p - c E$
- profit per unit time (to be maximized) is
$\frac{p - c E}{\mathit{MS}} = $ $\frac{p}{\mathit{MS}}$ $- c$ $\frac{E}{\mathit{MS}}$
- revenue per unit time $-c$ times average power consumption

- computationally expensive to compute optimal solutions for
- minimizing makespan
- maximizing profit (function of makespan)

- need scalable and efficient algorithms to find good schedules
- proposed 3 phase algorithm:
- solve linear optimization problem assuming tasks are divisible
- round the solution
- assign tasks to machines

- simplifying approximation:
**each task is divisible among machines** - $T_i$ — number of tasks of type $i$
- $M_j$ — number of machines of type $j$
- $\ETC$ —
__e__stimated__t__ime to__c__ompute for a task of type $i$ running on a machine of type $j$ - $\mu_{ij}$ — number of tasks of type $i$ assigned to machines of type $j$
- matrix $\boldsymbol{\mu}$ is a resource allocation
- decision variable
- not binary but integer valued ($\mu_{ij} \gg 1$)

- finishing time of machine type $j$ is (lower bound) $$F_j = \frac{1}{M_j} \sum_i \mu_{ij} \ETC$$

- $\APC$ — average power consumption for a task of type $i$ running on a machine of type $j$
- $\ELB(\boldsymbol{\mu}) = \sum_i \sum_j \mu_{ij} \ETC \APC$
- in the paper $\ELB$ includes consideration of idle power

- let $P_\text{max}$ be the maximum average power consumption
- models long running average power consumption
- useful for modeling cooling capacity

- recall:
- $F_j$ is finishing time
- $\mu_{ij}$ is number of tasks

- objective and power constraint are non-linear
- ratios of decision variables, $\mu_{ij}$ and $\MSLB$
- constraints can be converted to use ratios of $\mu_{ij}$ and $\MSLB$
- variable substitution
- $z_{ij} \leftarrow \frac{\mu_{ij}}{\MSLB}$ is the average tasks per unit time
- $r \leftarrow \frac{1}{\MSLB}$ is the number of bag-of-tasks per unit time

- average power consumption becomes $\bar{P} = \sum_i \sum_j z_{ij} \ETC \APC$

$$\implies$$

- solve linear program and compute

$\mu_{ij} = \frac{z_{ij}}{r}$ and $\MSLB = \frac{1}{r}$ - rounding algorithm (per task type)
- rounds the number of tasks of each type assigned to each machine type
- find nearest integer solution while satisfying constraints

- local assignment algorithm (per machine type)
- assign tasks to individual machines
- greedy algorithm to minimize makespan

- recall:
- $z_{ij}$ is average tasks per unit time
- $r$ is number of bag-of-tasks per unit time

- upper bound
- many Pareto efficient solutions to the relaxation of the energy and makespan optimization problem
- maximum profit of all those solutions

- collapse $p$ and $c$ into a single intuitive parameter
- let $E_\text{min}$ be the energy of the minimum energy solution
- let profit ratio $\gamma = \frac{p}{c E_\text{min}}$
- $\gamma$ is unitless
- $\gamma \ge 0$ is realizable
- $\gamma > 1$ ⇒ positive profit is achievable

- recall:
- $p$ is price per bag-of-tasks
- $c$ is cost per unit of energy

- heterogeneous tasks and machines
- 11,000 tasks each is one of 30 task types
- 360 machines each is one of 9 machine types

‐ Pareto front lower bound

- recall: $\gamma = \frac{p}{c E_\text{min}}$ is the profit ratio

‐ Pareto front lower bound

▢ profit upper bound (infeasible)

- recall: $\gamma = \frac{p}{c E_\text{min}}$ is the profit ratio

‐ Pareto front lower bound

▢ profit upper bound (infeasible)

◆ profit lower bound (feasible)

- recall: $\gamma = \frac{p}{c E_\text{min}}$ is the profit ratio

‐ Pareto front lower bound

▢ profit upper bound (infeasible)

◆ profit lower bound (feasible)

| profit per unit time

- recall: $\gamma = \frac{p}{c E_\text{min}}$ is the profit ratio

‐ Pareto front lower bound

▢ profit upper bound (infeasible)

◆ profit lower bound (feasible)

| profit per unit time

-- power constraint (55 KW)

- profit ratio = 1.2
- 11,000 tasks
- Pareto front generation: 173 ms

- profit ratio = 1.2
- 11,000 tasks
- Pareto front generation: 173 ms
- single max profit solve: 2 ms
- if 1 million tasks: 70 ms

- 100 Monte Carlo runs sampling over the task types
- relative decrease in profit = $100 \frac{\mathit{profit}_\text{DL} - \mathit{profit}_\text{full}}{\mathit{profit}_\text{DL}}$
- width of glyphs represent the probability density

profit ratio = 1.2

profit ratio = 1.5

- machine type 1 and 2 are less power efficient
- use when price is high thus emphasizing makespan minimization

- millions of tasks and tens of thousands of machines
- scheduler execution times are sub-second
- online batch mode scheduling for uncertain
- execution times
- arrival rates

- schedule to cores instead of machines

- fast profit maximizing scheduling algorithm
- scalable to very large systems
- provides tight lower and upper bounds on the profit per unit time
- easily applied to online batch mode scheduling

/