In traditional algorithms analysis we are interested in the time and memory complexity of an algorithm, i.e., how much time and memory will it take to solve a concrete problem for a concrete input size for the worst input. However, some problems arising in real life do not subject to such analysis, since not all the data is known in advance. In On-Line Algorithm Analysis we are interested exactly in such types of problems. Our main goal in this part of the course will be tackling the problem of decision making between different types of clouds when we do not know our demand in advance. We are concentrating on cost-efficiency (and thus also energy efficiency) for choosing between on-demand public cloud and a private cloud put on own equipment. The former works in pay-as-you go fashion, the former has an initial cost of equipment purchase but is later cheaper in the per-hour scale.
We start from introducing the most basic problems from the On-Line Algorithms Analysis field.
The Ski-Rental problem
The Ski-Rental problem is considered the "Hellow World" of On-Line Algorithms Analysis. The problem is defined as follows: we start skiing as a new sport and on each day in the morning we have to make a decision whether to rent the skis for this day at the cost of 1 or to buy them forever at the cost of B. What we do not know in advance is the answer to the question of when will our skiing end due for example to breaking a leg.
An algorithm for solving the above problem is defined by just a single number: this number should state the number of days of renting after which we decide to buy the skis. In particular it may be equal to zero, meaning that we decide to buy the skis on the very first day. If we know in advance the number T of days we are going to spend skiing, the decision if and when to buy the skis is simple. First we observe that if we should buy the skis at latest on day T, it only makes sense to buy them on the first day, because it will not increase the buying cost and will reduce the rental cost to zero. After reducing the "if/when" decision to the "if" decision, we notce that if we only rent, the cost will be exactly T, and if we buy the skis, the cost will be B. Thus, we base our decision on whichever of these two costs is lower - if T is lower than B, we always rent, otherwise we buy on the first day. Thus, we have deficed what is called an optimal offline algorithm. It is "offline", as it knows the whole input (i.e., the number of days) in advance and it is optimal since we proved that it has the smallest cost of all offline algorithms (thus also all algorithms in general).
The feature which differs an on-line algorithm from an offline one is that it does not know the input (i.e., the number of days) in advance and because of this its final cost might be larger than the cost of the optimal offline algorithm. For and algorithm ALG, we define its competitive ratio as a minimal value C such that for all inputs T ALG(T) < C*OPT(T), where ALG(T) is the cost of ALG on input T, and OPT(T) is the cost of an optimal offline algorithm on input T. Since we are considering a minimization problem (we are minimizing the cost, not maximizing a gain), the competitive ratio will always be at least 1.
Lower bound for deterministic algorithms
In order to understand the problem better, we start from working on a lower-bound for any on-line algorithm. A lower bound is a number C (in some other problems it might be a function) which states that each on-line algorithm will have a competitive ratio at least C. This means that for each on-line algorithm ALG we will be able to choose input T such that ALG(T) > = C * OPT(T). In this scenario we take the role not of the algorithm designer but of someone who is trying to break down the algorithm by finding the worst possible input for it. Such an entity is called an adversary.
Consider a concrete algorithm which decides to buy the skis on day T (for an algorithm which never buys the skis, we will only use the fact that it rents the skis for at least 2B days, thus let T=2B for this algorithm). For such an algorithm we define the input to be T, which means that the algorithm will rent the skis for T-1 days and then buy them, thus generating a cost of T-1+B (for the always-rent algorithm it will have a total cost of exactly 2B). As explained previously, the optimal offline algorithm will in this scenario have a cost of min(T,B). A simple calculation leads to conclusion that for each algorithm its competitive ratio C is at least 2-1/B.
Deterministic algorithm
Now that we know that it is impossible to design an algorithm that will a competitive ratio much better than 2, we may try to design one that achieves exactly that, and to be even more concrete, has a competitive ratio of C=2-1/B. Consider an algorithm which tries to amortize the cost of buying the skis against the cost of first renting them for the same cost. Namely, it first rents the skis for exactly B-1 days and buys them on day B. Before a general analysis, let's see what happens in the supposedly worst-case scenario described in the lower-bound section. In that scenario, the adversary chose the process to end exactly on the day when the algorithm decides to buy the skis. In our current case it means that T=B which means that the optimum cost is exactly B (and is achievable in two manners) and the algorithm's cost is exactly B-1+B = 2B -1 which is exactly C times larger than the optimum cost.
However, in order to prove our claim that this algorithm is C competitive, we have to check all possibilities for T, not only one special case. If T < B, then both our algorithm and the optimal one will always rent the skis, both generating costs T. If T > B (inlcuding the case when the process does not end), then the optimal algorithm will buy the skis on the first day, generating cost B, while our algorithm will first rent them for B-1 days and then buy them, generating cost 2B - 1 which is exactly C times larger than the optimal cost.
The Spin-Block problem
After developing an algorithm with a matching lower bound, we might come to the conclusion that the problem is closed. However, there are still two generalizations that we want to consider. For the first one we want to slightly change the model but also show its application in computer science and not only in ski tourism. We introduce the problem of spin-block which is defined for a single-core system running many threads concurrently. When a thread which is currently being executed needs a resource that is not available yet, a problem arises whether to wait for this resource, i.e., spin this process, or to put the process to sleep, i.e., block it. Spinning a process through a unit of time has a unit cost but we are allowed to spin it for any fractions of a unit, which is the mentioned generalization. Blocking a process means putting all its data away in order to read it later and has a cost of B time units.
A randomized algorithm
The second generalization that we would like to introduce is randomization. We allow the algorithm to choose its buying time threshold not as a deterministic value but as a random variable. For an algorithm defined this way and a concrete input its cost will also be a random variable and we will define the competitive ratio as the ratio between the expected value of the algorithm cost and the optimal offline cost. Next, we are going to show that randomization allows to bypass lower bounds for deterministic algorithms. Consider a randomized algorithm which chooses its threshold for buying the skis (or blocking a process) uniformly at random from the interval [B/2, B]. Consider all inputs for this algorithm, i.e., all values of T after which the input sequence ends.
For B/2 > T both ALG and OPT will always rent and they will have the same cost. For T > B, the algorithm behaves exactly like for T=B, as it buys the skis at time T. So we consider the case when B >= T and T > B/2.
We compute the expected value of ALG as: E[ALG] = 4 - 5/4 * B/T - T/B. In the interval interesting for us this function has its maximum equal to 1.75 which is also the competitive ratio of this randomized algorithm.
Lower bounds for randomized algorithms
There exist techniques for proving lower bounds for randomized algorithms but they are beyond the scope of this course. An interested reader is advised to search for the Yao Min-Max Principle and its applications.
Allan Borodin, Ran El-Yaniv: Online Computation and Competitive Analysis
Cambridge University Press, 2005