In this article, we look at DRS (Distributed Resource Scheduler) in detail: what it is, how it works behind the scenes, and which options are available in the DRS settings.
What is DRS?
The goal of DRS is not to balance the load perfectly across every host. Rather, DRS monitors resource demand and works to ensure that every VM gets the resources it is entitled to. When DRS determines that a better host exists for a VM, it makes a recommendation to move that VM.
The two primary functions of DRS are:
- Load balancing VMs when the cluster is imbalanced
- VM placement at power-on
Let’s take a closer look at how DRS achieves its goal of ensuring VMs are happy, with effective placement and
efficient load balancing.
Effective VM Placement
One of the first steps in ensuring good VM performance is to make sure that the VM gets all the resources it
needs as soon as it is powered on. DRS takes the demand of a VM into account when placing it, so that the VM
is not short of resources when it starts. A VM’s demand is the amount of resources it needs to run, and the
way DRS calculates this is described below.
DRS looks at the demand of every running VM in the cluster. VM demand is the amount of resources that the VM currently needs to run. For CPU, demand is calculated from the amount of CPU the VM is currently consuming. For memory, demand is calculated with the following formula.
VM memory demand = Function(Active memory used, Swapped, Shared) + 25% (idle consumed memory)
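The formula can be sketched in a few lines of Python. This is only an illustration: the article does not define `Function(...)`, so its result is passed in as the hypothetical parameter `base_demand_mb`, and the sketch assumes that idle consumed memory is consumed memory minus active memory.

```python
def vm_memory_demand_mb(base_demand_mb, consumed_mb, active_mb, idle_fraction=0.25):
    """Estimate VM memory demand in MB, following the formula above.

    base_demand_mb stands in for Function(Active, Swapped, Shared), which the
    article does not spell out; it is a hypothetical input, not a DRS API.
    """
    # Assumption: idle consumed memory = consumed - active, never negative.
    idle_consumed_mb = max(consumed_mb - active_mb, 0.0)
    return base_demand_mb + idle_fraction * idle_consumed_mb


# A VM with 6 GB consumed, 2 GB of it active, and a base term of 2 GB.
demand = vm_memory_demand_mb(base_demand_mb=2048, consumed_mb=6144, active_mb=2048)
print(demand)
```

The 25% term means that memory a VM holds but is not actively using still counts toward its demand, just at a reduced weight.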
Efficient Load Balancing
DRS uses a cluster-level balance metric to make load-balancing decisions. This balance metric is calculated
from the standard deviation of resource utilization data from hosts in the cluster. DRS runs its algorithm
once every 5 minutes (by default) to study imbalance in the cluster. In each round, if it needs to balance the
load, DRS uses vMotion to migrate running VMs from one ESXi host to another.
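As a rough illustration of a standard-deviation balance metric, the sketch below computes the spread of per-host utilization and compares it with a tolerance. The function names and the `threshold` value are assumptions made for this example; the article does not give the exact metric or thresholds DRS uses.

```python
from statistics import pstdev


def cluster_imbalance(host_utilization):
    """Population standard deviation of per-host utilization (0.0 to 1.0)."""
    return pstdev(host_utilization)


def needs_balancing(host_utilization, threshold):
    """True if the spread across hosts exceeds the tolerated imbalance.

    threshold is a hypothetical tolerance chosen for this example only.
    """
    return cluster_imbalance(host_utilization) > threshold


uneven = [0.9, 0.5, 0.1]    # one busy host, one nearly idle host
even = [0.55, 0.5, 0.45]    # hosts carry similar load

print(needs_balancing(uneven, threshold=0.2))
print(needs_balancing(even, threshold=0.2))
```

A cluster where all hosts carry the same load has a standard deviation of zero, and the value grows as load concentrates on a few hosts.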
Detecting VM Demand Changes
During each round, along with resource usage data, DRS also collects resource availability data from every
VM and host in the cluster. Data such as VM CPU average and VM CPU max over the last collection interval
depict the resource usage trend for a given VM. DRS then correlates the resource usage data with the
availability data and runs its load-balancing algorithm before taking any necessary vMotion actions to keep
the cluster balanced and to ensure that VMs are always getting the resources they need to run.
Cost Benefit Analysis
vMotion of live VMs comes with a performance cost, which depends on the size of the VM being migrated. If the
VM is large, it will use a lot of the current host’s and target host’s CPU and memory for vMotion. The benefit,
however, is in terms of performance for VMs on the source host, the migrated VM on the destination host, and
improved load balance across the cluster. The DRS algorithm constantly evaluates the cost and benefit of each
load balancing vMotion move.
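The cost-benefit check can be pictured as a simple filter over candidate moves. The sketch below is an assumption-heavy illustration: the `MoveCandidate` type, its fields, and the unitless scores are hypothetical, and the article does not describe how DRS actually estimates cost or benefit.

```python
from dataclasses import dataclass


@dataclass
class MoveCandidate:
    vm_name: str
    estimated_benefit: float  # hypothetical score: expected gain from the move
    estimated_cost: float     # hypothetical score: vMotion overhead, grows with VM size


def worthwhile_moves(candidates):
    """Keep only moves whose estimated benefit exceeds their estimated cost,
    ordered by net gain (largest first)."""
    kept = [c for c in candidates if c.estimated_benefit > c.estimated_cost]
    return sorted(kept, key=lambda c: c.estimated_benefit - c.estimated_cost, reverse=True)


candidates = [
    MoveCandidate("small-web-vm", estimated_benefit=8.0, estimated_cost=2.0),
    MoveCandidate("large-db-vm", estimated_benefit=5.0, estimated_cost=9.0),
    MoveCandidate("batch-vm", estimated_benefit=4.0, estimated_cost=3.0),
]

print([c.vm_name for c in worthwhile_moves(candidates)])
```

In this example the large VM is left in place because moving it would cost more than it gains.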
Factors That Affect DRS Behavior
In this section, we discuss some of the customizations and factors that affect DRS and how to use them for
best performance.
DRS Automation Levels
During initial placement and load balancing, DRS generates placement and vMotion recommendations,
respectively. DRS can apply these recommendations automatically, or you can apply them manually.
DRS has three levels of automation:
- Fully Automated – DRS applies both initial placement and load balancing recommendations automatically.
- Partially Automated – DRS applies recommendations only for initial placement.
- Manual – You must apply both initial placement and load balancing recommendations.
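The three levels can be summarized as a lookup of which recommendation kinds are applied without user action. This is a minimal sketch; the enum and the string labels are names chosen for this example, not vSphere identifiers.

```python
from enum import Enum


class AutomationLevel(Enum):
    MANUAL = 1
    PARTIALLY_AUTOMATED = 2
    FULLY_AUTOMATED = 3


# Recommendation kinds DRS applies on its own at each level, per the list above.
AUTO_APPLIED = {
    AutomationLevel.MANUAL: set(),
    AutomationLevel.PARTIALLY_AUTOMATED: {"initial_placement"},
    AutomationLevel.FULLY_AUTOMATED: {"initial_placement", "load_balancing"},
}


def is_applied_automatically(level, recommendation_kind):
    """True if a recommendation of this kind needs no manual approval."""
    return recommendation_kind in AUTO_APPLIED[level]


print(is_applied_automatically(AutomationLevel.PARTIALLY_AUTOMATED, "load_balancing"))
```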
DRS Aggression Levels (Migration Threshold)
The DRS aggression level controls the amount of imbalance that will be tolerated in the cluster. DRS has five
aggression levels ranging between 1 (most conservative) and 5 (most aggressive). The more aggressive the
level, the less DRS tolerates imbalance in the cluster. The more conservative, the more DRS tolerates imbalance.
As a result, you might see DRS initiate more migrations and generate a more even load distribution when you
increase the aggression level. By default, DRS aggression level is set to 3.
When DRS aggression is set to level 1, DRS will not load balance the VMs. DRS will only apply move
recommendations that must be taken either to satisfy hard constraints, such as affinity or anti-affinity rules, or
to evacuate VMs from a host entering maintenance or standby mode.
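One way to picture the aggression level is as a cutoff on recommendation priority. The sketch below assumes each recommendation carries a priority from 1 (mandatory) to 5 (smallest improvement), which the text above does not state; the recommendation names are made up for the example.

```python
def moves_to_apply(recommendations, aggression_level):
    """Return the names of recommendations that pass the aggression cutoff.

    recommendations: list of (name, priority) pairs, where priority 1 is a
    mandatory move and 5 offers the smallest improvement (assumed scale).
    """
    if not 1 <= aggression_level <= 5:
        raise ValueError("aggression level must be between 1 and 5")
    return [name for name, priority in recommendations if priority <= aggression_level]


recommendations = [
    ("evacuate-vm1-for-maintenance", 1),
    ("rebalance-vm2", 3),
    ("rebalance-vm3", 5),
]

print(moves_to_apply(recommendations, aggression_level=1))
print(moves_to_apply(recommendations, aggression_level=3))
```

Under this model, level 1 keeps only the mandatory moves, while level 5 also accepts moves that improve balance only slightly.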
VM Overrides
DRS automation levels and migration threshold are normally applied at the cluster level. In some cases, you
might require DRS to treat some VMs specially. For example, you might decide DRS should not consider a
specific VM when generating its recommendations, or you might decide DRS should not migrate that VM at all.
You can set VM overrides under Cluster -> Manage -> Settings -> VM Overrides. Here
you can set the automation or migration threshold for a VM to a value different than that at the cluster level, or
even disable them.
VM/Host Rules
Rules help define special conditions on VMs and/or hosts in a DRS cluster. Once a rule is set, DRS has to honor
it and make recommendations in accordance with the rule, alongside its placement and load balancing
logic.
There are different types of rules that can be set:
1. Keep Virtual Machines Together (VM-VM): This rule ensures that the VMs specified in the rule are always
running on the same host.
2. Separate Virtual Machines (VM-VM): This rule keeps the VMs in the rule always running on different
hosts.
3. Virtual Machines to Hosts (VM-Host): This type of rule is set on groups of one or more VMs and one or
more hosts. A host or a VM group can be created in the web client under Cluster -> Manage -> Settings -> VM/Host Groups.
In VM-Host rules there are sub-rules of type should and must. With these sub-rules, you
can specify if a VM group should/must, or should not/must not run on a host group. Sub-rules of type must
(mandatory) will always be honored by DRS under all circumstances. However, sub-rules of type should
(preferential) are dropped if DRS determines that the imbalance in the cluster is very high.
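The rule types above can be expressed as simple checks against a proposed placement. This is a sketch only: the function names and the `placement` dictionary (VM name to host name) are hypothetical, and it models just the mandatory "must run on" case for VM-Host rules.

```python
def violates_keep_together(placement, vms):
    """Keep-together rule: all listed VMs must share one host."""
    return len({placement[vm] for vm in vms}) > 1


def violates_separate(placement, vms):
    """Separate rule: no two listed VMs may share a host."""
    hosts = [placement[vm] for vm in vms]
    return len(set(hosts)) < len(hosts)


def violates_must_run_on(placement, vm_group, host_group):
    """VM-Host 'must run on' rule: every VM in the group sits on a host in the group."""
    return any(placement[vm] not in host_group for vm in vm_group)


placement = {"app1": "esx01", "app2": "esx01", "db1": "esx02", "db2": "esx03"}

print(violates_keep_together(placement, ["app1", "app2"]))
print(violates_separate(placement, ["db1", "db2"]))
print(violates_must_run_on(placement, ["db1", "db2"], {"esx02"}))
```

A "should" rule could reuse the same checks but treat a violation as a penalty rather than a hard failure.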
Reservation, Limit, and Shares
DRS provides many tools for you to customize your VMs and workloads according to specific use cases.
Reservation, limit, and shares are three such tools borrowed from ESXi's resource
management paradigm.
Reservation:
You might need to guarantee compute resources to some critical VMs in your clusters. This is often the case
when running applications that cannot tolerate any type of resource shortage, or when running an application
that is always expected to be up and serving requests from other parts of the infrastructure.
With the help of reservations, you can guarantee a specified amount of CPU or memory to your critical VMs.
Reservations can be made for an individual VM, or at the resource pool level. In a resource pool with several
VMs, a reservation guarantees resources collectively for all the VMs in the pool.
Limit:
In some cases, you might want to limit the resource usage of some VMs in your cluster, in order to prevent them
from consuming resources from other VMs in the cluster. This can be useful, for example, when you want to
ensure that when the load spikes in a non-critical VM, it does not end up consuming all the resources and
thereby starving other critical VMs in the cluster.
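Taken together, a reservation acts as a floor and a limit as a ceiling on what a VM is given. The sketch below is a simplified model under that assumption; the function name and units are hypothetical, and it ignores contention between VMs.

```python
def bounded_allocation(demand, reservation=0.0, limit=None):
    """Clamp a VM's demand between its reservation (floor) and limit (ceiling).

    Simplified model: real entitlement also depends on shares and on what
    other VMs in the cluster are demanding.
    """
    if limit is not None and limit < reservation:
        raise ValueError("limit must not be lower than reservation")
    allocation = max(demand, reservation)
    if limit is not None:
        allocation = min(allocation, limit)
    return allocation


# A critical VM keeps its reserved 1000 MHz even when it currently needs less.
print(bounded_allocation(demand=500, reservation=1000))
# A non-critical VM spiking to 3000 MHz is capped at its 2000 MHz limit.
print(bounded_allocation(demand=3000, limit=2000))
```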
Shares:
Shares provide you a way to prioritize resources for VMs when there is competition in the cluster. They can be
set at a VM or a resource pool level.
By default, a cluster has a resource pool hierarchy, with the root resource pool (the cluster itself) at the top, and
all VMs as its children. Shares are defined as numbers for all the sibling VMs under this root resource pool.
Shares are distributed equally, by default, on a per-resource basis (per-vCPU and per-unit of memory). This
means that by default, a VM with more configured resources will get more shares than a VM with fewer
resources. During resource contention, resources available at the root resource pool are shared among the
children based on their shares’ values.
DRS provides four types of shares for VMs and resource pools - Low, Normal, High, and Custom - to change
their priority compared to their siblings. Normal shares are typically 2x Low, and High shares are typically 2x
Normal. Custom can be used to set specific share values. When setting custom shares at a VM level, you need to
account for all the vCPUs and memory of that VM, since shares are assigned based on the amount of configured
resources of a VM.
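The effect of shares under contention can be sketched as a proportional split. The per-vCPU values below (500, 1000, 2000) are assumed here as the usual vSphere CPU defaults, consistent with the 2x ratios described above; the VM names and the capacity figure are made up for the example.

```python
# Assumed per-vCPU CPU share values for each level.
SHARES_PER_VCPU = {"low": 500, "normal": 1000, "high": 2000}


def cpu_shares(level, vcpus):
    """Total CPU shares for a VM: the per-vCPU value times its vCPU count."""
    return SHARES_PER_VCPU[level] * vcpus


def divide_by_shares(capacity_mhz, shares_by_vm):
    """Split contended capacity among sibling VMs in proportion to their shares."""
    total = sum(shares_by_vm.values())
    return {vm: capacity_mhz * shares / total for vm, shares in shares_by_vm.items()}


shares_by_vm = {
    "vm-a": cpu_shares("normal", vcpus=2),  # 2000 shares
    "vm-b": cpu_shares("high", vcpus=2),    # 4000 shares
    "vm-c": cpu_shares("low", vcpus=4),     # 2000 shares
}

print(divide_by_shares(8000, shares_by_vm))
```

Note that vm-c, with Low shares but four vCPUs, ends up with as many shares as vm-a, which is why custom share values need to account for the VM's configured size.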