Scalability can be defined "as the ability of a particular system to fit a problem as the scope of that problem increases (number of elements or objects, growing volumes of work and/or being susceptible to enlargement)". For example, a system's throughput can be increased by adding more software or hardware resources to cope with an increased workload. The ability to scale a system may depend on its design and on the data structures, algorithms, or communication mechanisms used to implement its components.

Load scalability means that a system is able to make good use of the available resources at different workload levels (i.e., avoiding excessive delay, unproductive consumption, or contention). Factors that harm load scalability include poor use of parallelism, inappropriate scheduling of shared resources, and excessive overheads. For example, a web server maintains a good service level when the number of threads executing HTTP requests is increased during a workload peak. If the workload is a video streaming service, volume is typically measured by the number of users watching a video at any given time; in this case, increasing the network bandwidth between the streaming servers and the user devices would address increases in the workload. Similarly, increasing CPU power would handle an increase in an insurance company's underwriting workload, which is measured by the size and complexity of the policies being processed.

Note that elasticity and scalability require a system to react not only to an increase in workload but also to a decrease (something that is often overlooked, and sometimes a harder problem to solve). Autoscaling depends on a monitoring agent that collects metric data at the operating system level. Scaling actions may be classified as vertical or horizontal scaling.
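The autoscaling loop described above can be sketched in Python. This is a minimal illustration, not a real autoscaler: the thresholds, instance limits, and the metric source are hypothetical assumptions, and the decision function implements horizontal scaling (adding or removing replicas), including the scale-in path that handles decreases in workload.

```python
import random  # stand-in metric source; a real monitoring agent queries the OS

# Hypothetical thresholds and limits; real values depend on the workload.
SCALE_OUT_THRESHOLD = 0.75   # add a replica above 75% average CPU utilization
SCALE_IN_THRESHOLD = 0.30    # remove a replica below 30% average CPU utilization
MIN_INSTANCES, MAX_INSTANCES = 1, 10

def collect_cpu_utilization() -> float:
    """Placeholder for the monitoring agent's OS-level metric collection."""
    return random.uniform(0.0, 1.0)

def autoscale(instances: int, cpu: float) -> int:
    """Horizontal-scaling decision: scale out on high load, scale in on low load."""
    if cpu > SCALE_OUT_THRESHOLD and instances < MAX_INSTANCES:
        return instances + 1          # scale out (workload increase)
    if cpu < SCALE_IN_THRESHOLD and instances > MIN_INSTANCES:
        return instances - 1          # scale in (the often-overlooked direction)
    return instances                  # utilization is within the acceptable band

if __name__ == "__main__":
    instances = 2
    for cpu in (0.9, 0.9, 0.5, 0.1, 0.1):   # simulated utilization samples
        instances = autoscale(instances, cpu)
        print(f"cpu={cpu:.0%} -> {instances} instance(s)")
```

Vertical scaling would instead change the capacity of a single instance (e.g., its CPU power, as in the underwriting example); the decision logic is analogous, but the action resizes one machine rather than changing the replica count.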