Application Scaling in Kubernetes¶
One of the primary advantages of running applications on Kubernetes is the ability to automatically scale them in response to real-world demand. Proper scaling ensures that your application has the resources it needs to be responsive and available, while also optimizing costs by not over-provisioning resources.
The Contain Platform provides the foundational components necessary to implement both horizontal and vertical scaling for your applications.
Further Reading
For a more comprehensive deep-dive into this topic, please refer to the official Kubernetes Autoscaling Workloads documentation.
Scaling Concepts¶
There are two primary ways to scale an application in Kubernetes:
- Horizontal Scaling: This involves increasing or decreasing the number of running instances (pods) for your application. If your application's load increases, you add more pods; if the load decreases, you remove pods. This is the most common scaling strategy for stateless applications (see the example command after this list).
- Vertical Scaling: This involves increasing or decreasing the resources (CPU and memory) allocated to the existing pods. If a pod is consistently using 100% of its allocated CPU, you can give it more CPU. This is often used for stateful applications, such as databases, that cannot easily scale horizontally.
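For example, horizontal scaling can be done by hand before any autoscaler is involved. The command below is a minimal sketch, using the my-webapp Deployment and my-app namespace from the examples later on this page:

kubectl scale deployment my-webapp --replicas=5 -n my-app

The autoscalers described in the rest of this page automate exactly this kind of decision based on observed metrics.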
The Role of the Metrics Server¶
To make intelligent scaling decisions, Kubernetes needs to know how much resource (CPU and memory) your application's pods are currently consuming. This is the job of the Kubernetes Metrics Server.
The Contain Platform includes the Metrics Server as a standard component in every cluster. It collects resource usage data from every pod and exposes it through the Kubernetes Metrics API, making it available for autoscalers to use.
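Once the Metrics Server is running, you can confirm that per-pod metrics are available with kubectl top (the namespace below is illustrative):

kubectl top pods -n my-app

If this command returns CPU and memory figures for your pods, the autoscalers have the data they need.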
Horizontal Scaling with the Horizontal Pod Autoscaler (HPA)¶
The most common way to automatically scale your application is with a
HorizontalPodAutoscaler (HPA). The HPA automatically adjusts the number of
replicas in a Deployment, StatefulSet, or other scalable resource based on
observed CPU or memory usage.
How It Works¶
The HPA controller periodically queries the Metrics Server for the resource utilization of the pods it's targeting. It then compares the current utilization to the target you've defined and calculates the optimal number of replicas.
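Roughly speaking, the controller computes desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization). For example, if 4 pods are averaging 100% CPU utilization against an 80% target, the HPA scales to ceil(4 * 100 / 80) = 5 replicas.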
Example: Scaling on CPU Utilization¶
Let's say you have a Deployment for your web application and you want to
ensure the pods' average CPU usage stays around 80%. If the average CPU usage
exceeds this target, the HPA will add more pods. If it drops well below, it will
remove pods.
You would define an HPA resource like this:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-webapp-hpa
  namespace: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-webapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
- scaleTargetRef: Points to the Deployment we want to scale.
- minReplicas / maxReplicas: Defines the lower and upper bounds for the number of pods. The HPA will never scale below 2 or above 10 pods.
- metrics: Defines the metric to scale on. In this case, it targets an average CPU utilization of 80% across all pods.
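The same policy can also be created imperatively and inspected afterwards. This is a sketch assuming the my-webapp Deployment from the manifest above:

kubectl autoscale deployment my-webapp --name=my-webapp-hpa --min=2 --max=10 --cpu-percent=80 -n my-app
kubectl get hpa my-webapp-hpa -n my-app

The TARGETS column of kubectl get hpa shows the current average utilization against the 80% target.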
Set Resource Requests
For the HPA to work effectively, you must set CPU resource requests on
your Pods' containers. The utilization percentage is calculated based on the
requested amount (e.g., 80% of a 500m CPU request is 400m). Without
requests, the HPA cannot calculate utilization and will not function.
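As a sketch, the relevant part of the my-webapp Deployment's pod template might look like this (the container name, image, and request values are illustrative); with a 500m CPU request, the 80% target corresponds to 400m of actual usage per pod:

spec:
  template:
    spec:
      containers:
      - name: my-webapp                              # illustrative container name
        image: registry.example.com/my-webapp:1.0    # illustrative image
        resources:
          requests:
            cpu: 500m      # basis for the HPA's utilization calculation
            memory: 256Mi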
Vertical Scaling with the Vertical Pod Autoscaler (VPA)¶
While HPA changes the number of pods, the VerticalPodAutoscaler (VPA)
adjusts the CPU and memory resource requests of the pods themselves. This
helps you "right-size" your pods, ensuring they have the resources they need
without being over-provisioned.
How It Works¶
The VPA analyzes the historical resource consumption of your pods. Based on this
analysis, it can recommend or automatically apply updated resource requests
and limits to your pod specifications.
Using VPA for Recommendations (Recommended)¶
The safest and most common way to use VPA is in "recommendation mode." In this
mode, the VPA does not automatically change your pods' resources. Instead, it
provides a recommendation that you can review and apply manually. This is a
great tool for determining the optimal resource requests for your
application.
Here is an example of a VPA in recommendation mode:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-webapp-vpa
  namespace: my-app
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-webapp
  updatePolicy:
    updateMode: "Off"
- targetRef: Points to the Deployment we want to analyze.
- updatePolicy.updateMode: "Off": This is the key field. It tells the VPA to only generate recommendations and not to apply any changes.
After a while, you can check the VPA's recommendations with kubectl describe vpa my-webapp-vpa -n my-app.
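The recommendations are also stored in the VPA object's status. As an illustration only (all values below are hypothetical), the relevant portion of kubectl get vpa my-webapp-vpa -n my-app -o yaml would look something like:

status:
  recommendation:
    containerRecommendations:
    - containerName: my-webapp   # illustrative container name
      lowerBound:
        cpu: 150m
        memory: 200Mi
      target:                    # the values to use as resource requests
        cpu: 250m
        memory: 256Mi
      upperBound:
        cpu: 1000m
        memory: 1Gi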
Automatic Updates¶
The VPA can also be configured with an updateMode of "Auto". In this mode,
it will automatically update the pod template with its recommended resource
requests. To apply these changes, the VPA must restart the pods, which can cause
a brief service interruption. This mode should be used with caution and is best
suited for applications that can handle rolling restarts gracefully.
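A minimal sketch of an Auto-mode VPA follows, with a resource policy that bounds how far the automatic updates can go (the bounds shown are illustrative):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-webapp-vpa
  namespace: my-app
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-webapp
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: "*"    # apply to all containers in the pod
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2000m
        memory: 2Gi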
Combining HPA and VPA: The Best of Both Worlds¶
You can use HPA and VPA together to achieve a highly efficient scaling strategy.
- Use VPA in "Off" mode: Deploy a VPA with updateMode: "Off" for your application. Let it run for a period to gather data and generate stable recommendations for CPU and memory requests.
- Apply the Recommendations: Manually update your Deployment's pod template with the resource requests recommended by the VPA (see the example command after this list).
- Use HPA for Horizontal Scaling: With your pods now "right-sized," create an HPA to scale the number of replicas horizontally based on CPU or memory utilization.
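For the second step, the recommended requests can be applied to the Deployment's pod template with kubectl set resources. The container name and values below are illustrative, standing in for the VPA's target recommendation:

kubectl set resources deployment my-webapp -n my-app \
  -c my-webapp --requests=cpu=250m,memory=256Mi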
Conflict Warning
You cannot use an HPA and a VPA in "Auto" mode on the same resource
for the same metric (CPU or memory). The two autoscalers would conflict,
leading to undesirable behavior. The recommended pattern is always to use
VPA in "Off" mode to inform the requests used by the HPA.