Kubernetes autoscaling, explained

What is Kubernetes autoscaling? This automation capability means you don't have to manually provision and scale down resources as demand changes. It also prevents needless spending. Here's how it works.

As the prefix “auto” suggests, autoscaling in Kubernetes is one of the container orchestration platform’s major automation capabilities. It simplifies resource management that would otherwise require intensive human effort to achieve.

“Autoscaling allows you to dynamically adjust to demand without intervention from the individuals in charge of operating the cluster,” explains Lilith Cohen, senior cloud consultant at Mission. “[It’s] one of Kubernetes’ most powerful features!”

What is autoscaling in Kubernetes?

Autoscaling is an important concept in cloud automation overall. Without autoscaling, you’re manually provisioning (and later scaling down) resources every time conditions change, and you’re less likely to be operating with optimal resource utilization and cloud spending. Either you’re always running at, and paying for, peak capacity to ensure availability, or your services fail during peak demand because they don’t have enough resources available to handle the spikes. Neither scenario is likely to lead to a happy CEO or happy customers.

“Kubernetes autoscaling helps optimize resource usage and costs by automatically scaling a cluster up and down in line with demand,” says Fei Huang, CSO at NeuVector. “If a service in production experiences greater load during certain times of the day, for example, Kubernetes can dynamically and automatically increase the cluster nodes and deployed pods as necessary to handle that change in demand. When load decreases, Kubernetes can then adjust back to fewer nodes and pods, conserving on resources and spending.”

How does Kubernetes autoscaling work? Two levels

Different cloud platforms have different native autoscaling features. Kubernetes enables autoscaling at the cluster/node level as well as at the pod level, two different but fundamentally connected layers of Kubernetes architecture.

“They’re both important and often interrelated,” says Andrew Sullivan, senior principal technical marketing manager, cloud platforms at Red Hat. “You can’t scale the application if there’s no capacity remaining in the cluster. It’s not unusual for conversations to fracture and tangent at that point into capacity management philosophies.”

These philosophical discussions become weightier in hybrid cloud and multi-cloud environments. Both approaches offer more flexibility when it comes to capacity management, but you still need to ensure you’re utilizing your available resources in a way that meets your performance requirements while staying within your budget. Autoscaling is one of the embedded technical capabilities that makes this balance more attainable.

“For example, when deployed to a hyperscaler, each instance costs money, but once deployed, it’s a fixed cost whether you’re using one percent or 100 percent of the resources,” Sullivan says. “As a result, you want to utilize each node as fully as possible. This is different than on-premises, where many infrastructure admins like to have extra capacity already available so that there is as little delay as possible between a node failing and workload recovering.”

Autoscaling not only prevents capacity-related failures by ensuring that your application always has the infrastructure resources it needs; it also keeps you from paying around the clock for resources you need only some of the time. The savings are greatest for applications that experience spikes and lulls in demand.
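To make the cost argument concrete, here is a small back-of-the-envelope sketch. Every number in it is hypothetical (the hourly demand profile and the pods-per-node figure are invented for illustration); it simply compares the node-hours you would pay for if you provisioned for the peak all day against the node-hours you would pay for if capacity followed demand.

```python
import math

# Hypothetical demand over one day: pods needed in each hour (illustrative only).
HOURLY_PODS_NEEDED = [2, 2, 2, 2, 3, 4, 6, 9, 12, 12, 11, 10,
                      10, 11, 12, 12, 10, 8, 6, 5, 4, 3, 2, 2]
PODS_PER_NODE = 4  # assumed packing density, not a Kubernetes default


def nodes_for(pods):
    """Smallest number of nodes that can hold the given number of pods."""
    return math.ceil(pods / PODS_PER_NODE)


def static_node_hours(demand):
    """Node-hours paid for when capacity is fixed at the peak hour's needs."""
    return nodes_for(max(demand)) * len(demand)


def autoscaled_node_hours(demand):
    """Node-hours paid for when the node count follows demand hour by hour."""
    return sum(nodes_for(pods) for pods in demand)


if __name__ == "__main__":
    print("static:", static_node_hours(HOURLY_PODS_NEEDED))
    print("autoscaled:", autoscaled_node_hours(HOURLY_PODS_NEEDED))
```

With this invented profile, fixed peak capacity costs 72 node-hours per day while capacity that tracks demand costs 48. Real savings depend on how spiky the workload is and how quickly nodes can be added and removed.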

3 autoscaling methods for Kubernetes

There are actually three autoscaling features for Kubernetes: Horizontal Pod Autoscaler, Vertical Pod Autoscaler, and Cluster Autoscaler. Let’s take a closer look at each and what they do.

Horizontal Pod Autoscaler: "Scaling out"

Horizontal scaling, sometimes referred to as “scaling out,” allows Kubernetes administrators to automatically increase or decrease the number of running pods as an application’s usage changes.

Cohen shares a more detailed view of how it works: With a Horizontal Pod Autoscaler, “a cluster operator declares their target usage for metrics, such as CPU or memory utilization, as well [as] their desired maximum and minimum number of replicas,” Cohen says. “The cluster will then reconcile the number of replicas accordingly, and scale up or down the number of running pods based on their current usage and the desired target.”

(Kubernetes’ documentation offers a deep dive on the technical nuts and bolts of HPA.)
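The reconciliation Cohen describes can be sketched in a few lines. The core rule below follows the formula given in the Kubernetes documentation (desired replicas = ceil(current replicas × current metric ÷ target metric)), with a tolerance band and the operator's minimum and maximum applied. The function and parameter names are our own, and the sketch leaves out things the real controller handles, such as missing metrics, pods that are not yet ready, and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas, tolerance=0.1):
    """Simplified HPA rule: scale replicas in proportion to metric / target."""
    ratio = current_metric / target_metric
    # Skip scaling when usage is already close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Never go outside the operator's declared bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 replicas averaging 90% CPU against a 60% target.
    print(desired_replicas(4, 90, 60, min_replicas=2, max_replicas=10))
```

In the usage example, four replicas running at 90 percent CPU against a 60 percent target come out to six replicas. A reading of 62 percent would fall inside the tolerance band and leave the count unchanged.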

Vertical Pod Autoscaler: "Scaling up"

There’s some “which is better?” debate about horizontal scaling versus vertical scaling. The latter, sometimes referred to as “scaling up,” refers to adding more resources (such as CPU or memory) to an existing machine. The VPA applies the same principle to pods in Kubernetes, adjusting the CPU and memory they request. It’s not so much that one type of scaling is “better” than the other. Rather, they serve different purposes and they’re both useful in different circumstances.

“While scaling horizontally is typically considered a best practice, there are some services that you may want to run in your cluster where this is either not possible or not ideal due to some constraint,” Cohen says. “A Vertical Pod Autoscaler allows you to scale a given service vertically within a cluster. The cluster operator declares their target usage for metrics, such as CPU or memory utilization, similarly to a Horizontal Pod Autoscaler. The cluster will then reconcile the size of the service’s pod or pods based on their current usage and the desired target.”
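As a rough illustration of vertical sizing, the sketch below picks a CPU request so that the highest observed usage sits at a chosen share of the request, then clamps the result to minimum and maximum bounds. This is a deliberate simplification that mirrors Cohen's description. It is not the actual VPA recommender, which derives its recommendations from a history of observed usage. The `target_utilization_percent` parameter and the function name are hypothetical; the bounds are loosely modeled on the minimum and maximum allowed values a VPA resource policy can set.

```python
import math


def recommended_request_millicores(usage_samples, target_utilization_percent,
                                   min_allowed, max_allowed):
    """Size a CPU request so peak observed usage lands at the target utilization.

    All CPU values are in millicores (1000 = one full core).
    """
    peak = max(usage_samples)
    # Example: a 400m peak at an 80% target needs a 500m request.
    raw = math.ceil(peak * 100 / target_utilization_percent)
    # Keep the recommendation inside the operator's bounds.
    return max(min_allowed, min(max_allowed, raw))


if __name__ == "__main__":
    print(recommended_request_millicores([200, 350, 400], 80, 100, 1000))
```

Note the contrast with horizontal scaling: here the number of pods stays the same and the size of each pod changes.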

Cluster Autoscaler

HPA and VPA essentially make sure that all of the services running in your cluster can dynamically handle demand while not over-provisioning during slower usage periods. That’s a good thing.

But there’s now another issue that needs to be addressed: “What happens when load is at a peak and the nodes in the cluster are getting too overloaded with all the newly scaled pods?” Cohen says.

This is where the Cluster Autoscaler goes to work: As the name indicates, it’s what allows for the autoscaling of the cluster itself, increasing and decreasing the number of nodes available for your pods to run on. (A node in Kubernetes lingo is a physical or virtual machine.)

“Based on current utilization, the Cluster Autoscaler will reach out to a cloud provider’s API and scale up or down the number of nodes attached to the cluster accordingly, and the pods will rebalance themselves across the cluster,” Cohen says.

Huang from NeuVector notes that the Cluster Autoscaler and Vertical Pod Autoscaler are essentially a package deal that goes by the name Kubernetes Autoscaler. (They share a GitHub repo, too.) Cluster Autoscaler ensures that all of your pods have a home (i.e., a node on which to run), but also that there are no unused nodes (i.e., unused running machines). VPA, to recap, automatically adjusts the CPU and memory requests made by pods as demand fluctuates.

“This [combination] makes it very simple to automate scaling for container workloads and the Kubernetes environment,” Huang says.
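The two decisions the Cluster Autoscaler makes, adding nodes when pods have no home and removing nodes that sit mostly idle, can be sketched as follows. This is a toy model under stated assumptions: all nodes are the same size, only CPU requests are considered, and the 0.5 scale-down threshold is an illustrative value. The function names are ours. The real Cluster Autoscaler also checks, among other things, that the pods on a candidate node can be rescheduled elsewhere before it removes that node.

```python
def nodes_to_add(pending_pod_cpu, node_cpu):
    """Count identical new nodes needed for pending pods (first-fit decreasing)."""
    free = []  # remaining CPU on each newly added node
    for request in sorted(pending_pod_cpu, reverse=True):
        if request > node_cpu:
            raise ValueError("pod does not fit on a node of this size")
        for i, remaining in enumerate(free):
            if request <= remaining:
                free[i] = remaining - request
                break
        else:
            # No existing new node has room, so provision another one.
            free.append(node_cpu - request)
    return len(free)


def scale_down_candidates(requested_cpu_by_node, node_cpu, threshold=0.5):
    """Names of nodes whose requested CPU is below the threshold share of capacity."""
    return sorted(name for name, requested in requested_cpu_by_node.items()
                  if requested / node_cpu < threshold)


if __name__ == "__main__":
    # Five pending pods (CPU requests in millicores) and 4000m nodes.
    print(nodes_to_add([1500, 1000, 500, 500, 2500], node_cpu=4000))
    print(scale_down_candidates(
        {"node-a": 3000, "node-b": 800, "node-c": 2000}, node_cpu=4000))
```

In the usage example, the five pending pods fit on two new nodes, and only `node-b` falls below the scale-down threshold.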

How to reap maximum benefit

As Red Hat’s Sullivan noted, both layers of autoscaling (cluster/node and pod) are important and related. Moreover, the three autoscaling methods can be used in concert for maximum benefit, with one caveat: HPA and VPA generally should not both act on the same CPU or memory metric for the same workload.

“In many cases, a combination of all three autoscaling methods will be used in a given environment to ensure that services can run in a stable way while at peak load, and while keeping costs to a minimum during times of lower demand,” Cohen says.

Kevin Casey writes about technology and business for a variety of publications. He won an Azbee Award, given by the American Society of Business Publication Editors, for his InformationWeek.com story, "Are You Too Old For IT?" He's a former community choice honoree in the Small Business Influencer Awards.
