Azure autoscaling best practices: an addition

Even if you use a scale-out and scale-in rule combination, you might still get flapping if the rules don't watch the same metrics.

AI-generated abstract painting, courtesy of 1SecondPainting [archived link]

One autoscaling best practice mentioned in the Microsoft Docs [archived Microsoft Docs link] is:

Always use a scale-out and scale-in rule combination that performs an increase and decrease

If you use only one part of the combination, autoscaling will only take action in a single direction (scale-out, or in) until it reaches the maximum, or minimum instance counts, as defined in the profile. This is not optimal, ideally you want your resource to scale up at times of high usage to ensure availability. Similarly, at times of low usage you want your resource to scale down, so you can realize cost savings.

What this paragraph doesn't say is that if you have a scale condition like:

Scale out
When  example-asp  (Average) CpuPercentage > 90     Increase count by 1
Or    example-asp  (Average) MemoryPercentage > 90  Increase count by 1

Scale in
When  example-asp  (Average) CpuPercentage < 45     Decrease count by 1

Then your example-asp might get stuck in a flapping state (the condition in which instances scale up and down rapidly and repetitively). This is because the App Service Plan (ASP) might meet both the memory-based scale-out condition and the CPU-based scale-in condition at the same time.
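
To make the mismatch concrete, here is a toy Python model of that rule set. This is just my illustration with made-up sample values, not how the autoscale engine is implemented:

# Toy model of the original rule set, not the real autoscale engine:
# scale-out rules are Or-combined, and the only scale-in rule looks at CPU.

def scale_out_triggered(cpu, memory):
    # Scale out: CpuPercentage > 90 Or MemoryPercentage > 90
    return cpu > 90 or memory > 90

def scale_in_triggered(cpu):
    # Scale in: CpuPercentage < 45
    return cpu < 45

# Hypothetical sample: low CPU, high memory.
cpu, memory = 30, 95
print(scale_out_triggered(cpu, memory))  # True  -> wants to add an instance
print(scale_in_triggered(cpu))           # True  -> wants to remove an instance
# Both directions are satisfied at once, which is the flapping scenario.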

The rest of the essay will tell the story of how I arrived at this discovery along with some insights so you can avoid spending one week with a Microsoft representative trying to diagnose the same behavior.

Work item

On the 17th of June, 2021, I picked up a task from the backlog which read:

Our App Service Plan is set to scale automatically, but that leads to some errors while the ASP is scaling: (it seems) mostly HTTP calls that call an instance which is starting or shutting down. Please investigate if it's better to disable autoscaling; and, if so, how many instances would we need?

The HTTP errors could easily have been caused by services hitting unhealthy instances, because the apps in the ASP didn't have health checks. But investigating the problem from the ticket led me to the core issue of this essay: Azure autoscaling is a black-box service that you can use but not perfectly tune, because you don't have all the information, only metrics, logs, correlations, and speculation.

With that in mind, I looked at the metrics of each individual app and of the ASP. In total, 6 apps showed the pattern of a stable instance being replaced by another stable instance, with each instance lasting about 4 days. All apps also had ephemeral instances (with a 1-2 day lifespan), a consequence of the sub-optimal scaling condition, which had the scale-in threshold set at 60 and the scale-out threshold at 80.

The average CPU utilization of the ASP was between 23% and 30%, while the average memory usage was between 63% and 78%.

ASP average CPU percentage by instance on June 17
ASP average memory percentage by instance on June 17

The scale condition was based on the CpuPercentage metric, which was fine, but the scale-in and scale-out thresholds were too close to each other (only 20 percentage points apart), which led to flapping and, possibly, to the HTTP errors.

After some tweaking, I arrived at the scale condition mentioned at the beginning of the essay (scale-in at 45, scale-out at 90), which eliminated the HTTP errors but not the flapping. With thresholds that far apart, flapping should no longer have occurred; instead, the ASP was stuck at 2 instances, never scaling in.

The autoscale engine decides whether to scale in based on the following formula, applied to each metric used in a scale-in rule:

projected metric value = current metric value x current instance count / instance count after scaling in

The reasoning behind this equation is simple: if a workload currently divided across N instances is to be moved to, say, N - 1 instances, then those N - 1 instances should be able to handle that workload without immediately triggering a scale-out.

The official Microsoft documentation [archived documentation] explains why the formula uses all metric types of a scaling condition's scale-in rules when determining when to scale in: Autoscale only takes a scale-in action if all of the scale-in rules are triggered.

For the ASP in question, the formula works out to: 30 (average CPU usage) x 2 (current instance count) / 1 (instance count after scaling in) = 60. That is 30 percentage points below the scale-out threshold I implemented, yet the ASP was stuck in endless flapping, which should only happen if the outcome of the equation is 90 or more.
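
Here is that estimation as a small Python sketch, following my reading of the docs; the function name and the comparison against the scale-out threshold are my framing, not Azure's code:

def projected_metric_value(current_value, current_instances, target_instances):
    # The same total load spread over fewer instances: the per-instance
    # average after scaling in.
    return current_value * current_instances / target_instances

# The ASP in question: ~30% average CPU across 2 instances, scaling in to 1.
projected = projected_metric_value(30, 2, 1)
print(projected)        # 60.0
print(projected >= 90)  # False -> scaling in should not trigger a scale-out,
                        # so the engine should have allowed it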

After exhausting the available information on the subject (the official docs, a StackOverflow post [archived version], and a blog post [archived version]), I decided to contact Microsoft.

Microsoft Support ticket

A Microsoft representative from the App Service team proposed two solutions:

  1. Switch the App Service Plan to the P1V2 tier: same costs but newer hardware with slightly more computing resources.
  2. Remove the MemoryPercentage rule as it might interfere with the CPU one. Alternatively, add a scale-in MemoryPercentage rule, which I didn't do because memory usage was fluctuating more than CPU usage, thus making the task of setting memory scale-in and scale-out thresholds harder.

He also recommended lowering the scale-out threshold to CpuPercentage > 80 because that's what they usually recommend to their clients.

I chose to ignore his recommendation for two reasons:

  1. The way CPU usage increases in this ASP (rare spikes that jump from 30% straight to 90% - 95% in under 1 minute) means that lowering the scale-out threshold by 10 percentage points would have no positive impact and one negative impact, namely:
  2. Lowering the scale-out threshold would require lowering the scale-in threshold too (for example, to 40%). That means tolerating 5 more percentage points of idle capacity, because a lower scale-in threshold lets instances sit at lower utilization before being removed. This is not necessarily bad (say, if your workload fluctuates a lot), but it's not optimal for the ASP in question. The scale-in threshold has to come down because, with the recommended scale-out threshold of 80 and the current scale-in threshold of 45, the ASP could never scale in to 1 instance. If CPU usage averaged across 2 instances is 44% (slightly below the current scale-in threshold), the Azure autoscaler performs the following calculation: 44 x 2 / 1 = 88. The result is above the recommended scale-out threshold of 80, so the autoscaler records a flapping event and refrains from scaling in, because otherwise it would have to scale out again immediately afterwards. The sketch after this list shows the arithmetic.
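
Here is the arithmetic behind that second point as a Python sketch, again assuming (my assumption, based on the formula above) that the projected value is compared against the scale-out threshold:

def projected_metric_value(current_value, current_instances, target_instances):
    # Per-instance average after moving the same load to fewer instances.
    return current_value * current_instances / target_instances

def max_scale_in_threshold(scale_out_threshold, current_instances, target_instances):
    # Highest scale-in threshold that still lets the projected value stay
    # below the scale-out threshold after scaling in.
    return scale_out_threshold * target_instances / current_instances

# With the recommended scale-out threshold of 80 and a 2 -> 1 scale-in:
print(max_scale_in_threshold(80, 2, 1))  # 40.0 -> scale-in threshold must be <= 40

# Keeping the scale-in threshold at 45 blocks the 2 -> 1 transition:
print(projected_metric_value(44, 2, 1))  # 88.0 -> above 80, flapping event, no scale-in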

But don't dwell on why I ignored his recommendation for now; it will come in useful at the end of the essay as context for the questions that remain unanswered. For now, let's focus on his solutions.

The first solution didn't eliminate the flapping, but removing the MemoryPercentage rule did the trick. We have a winner:

Scale out
When  example-asp  (Average) CpuPercentage > 90     Increase count by 1

Scale in
When  example-asp  (Average) CpuPercentage < 45     Decrease count by 1

The representative didn't explain why or how a scale-out rule might interfere with another scale-out rule, even after I pointed out that the Or in the scale condition syntax makes it seem like the scale-out rules are not connected to each other.

Then my friend Yannis, who has no familiarity with Azure, gave what I think to be the right explanation: sometimes the ASP might get stuck in flapping because both a memory-based scale-out condition and a CPU-based scale-in condition are met at the same time.

This might be why, if you set two scale-in conditions, one is prefixed by When and the other by And, while if you set two scale-out conditions, one is prefixed by When and the other by Or:

Scale out
When  example-asp  (Average) CpuPercentage > 90     Increase count by 1
Or    example-asp  (Average) MemoryPercentage > 90  Increase count by 1

Scale in
When  example-asp  (Average) CpuPercentage < 45     Decrease count by 1
And   example-asp  (Average) MemoryPercentage < 45  Decrease count by 1
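
Extending the earlier toy model (same caveat: an illustration of the When/Or versus When/And semantics, not the real engine), the And on the scale-in side removes the conflicting signal:

# Scale-out rules are Or-combined; scale-in rules are And-combined.

def scale_out_triggered(cpu, memory):
    return cpu > 90 or memory > 90   # any scale-out rule is enough

def scale_in_triggered(cpu, memory):
    return cpu < 45 and memory < 45  # every scale-in rule must hold

# The same hypothetical sample that caused trouble before: low CPU, high memory.
cpu, memory = 30, 95
print(scale_out_triggered(cpu, memory))  # True  -> scale out
print(scale_in_triggered(cpu, memory))   # False -> memory is not below 45
# Only one direction can fire, so the conflicting signals disappear.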

Now the issue is closed:

ASP average CPU percentage by instance on July 17
ASP average memory percentage by instance on July 17

But still, all of this is speculation, and the only reason I'm explaining myself at length is to give context for the questions I am still left with.

Questions

In short: how does Azure autoscaling work, precisely?

Thank you, Yannis and Larisa, for helping me write this.