Using the Vertical Pod Autoscaler (VPA) in Kubernetes

We manage a Kubernetes cluster with a mix of deployments, including static sites, web applications, and utilities. Some of these deployments need to be single-replica, as they don't support horizontal scaling, so the Horizontal Pod Autoscaler (HPA) can't help us right-size them. We've recently been exploring the Vertical Pod Autoscaler (VPA) as an alternative, which (almost!) leverages the new InPlace Pod Vertical Scaling feature available in Kubernetes 1.27+.

VPA Overview

The Vertical Pod Autoscaler (VPA) automatically adjusts the CPU and memory requests and limits for your pods based on their actual usage. This is particularly useful for workloads that cannot be horizontally scaled, such as single-replica deployments. The VPA consists of three main components:

  1. Recommender: Monitors resource usage and provides recommendations for CPU and memory requests and limits.
  2. Updater: Updates the resource requests and limits of pods based on the recommendations from the Recommender.
  3. Admission Controller: Intercepts pod creation and updates to apply the recommended resource requests and limits.

Requests and limits

In Kubernetes, resource requests and limits are used to manage the resources allocated to containers within pods.

  • Requests: The minimum amount of CPU and memory that a container is guaranteed to have. The Kubernetes scheduler uses these values to determine which node to place the pod on.
  • Limits: The maximum amount of CPU and memory that a container can use. A container that exceeds its CPU limit is throttled; one that exceeds its memory limit is terminated (OOM-killed).
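
For reference, this is what manually specified requests and limits look like on a container. It's a minimal sketch; the pod name and image are placeholders:

apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
    - name: app
      image: nginx:1.27  # placeholder image
      resources:
        requests:
          cpu: "100m"     # guaranteed minimum; used by the scheduler
          memory: "128Mi"
        limits:
          cpu: "500m"     # exceeding this throttles the container
          memory: "256Mi" # exceeding this gets the container OOM-killed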

Problem: If resource requests and limits are not set appropriately, it can lead to inefficient resource usage, application instability, and poor performance. It's difficult to predict the exact resource requirements for applications, especially when workloads vary over time. We were hoping the VPA could help us manage this automatically.

InPlace Pod Vertical Scaling

With the introduction of InPlace Pod Vertical Scaling in Kubernetes 1.27, the VPA can now update the resource requests and limits of running pods without needing to restart them. This is a significant improvement, as it reduces downtime and allows for more seamless resource management. However, there are still some limitations to be aware of:

  • Not all workloads support in-place updates. Some applications may require a restart to apply new resource settings.
  • The VPA may not be able to update resources if the node does not have sufficient available resources.
  • InPlace Pod Vertical Scaling is still an alpha feature and may not be suitable for production environments.
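
To experiment with it, the InPlacePodVerticalScaling feature gate must be enabled on the cluster. As a minimal sketch, a local kind cluster can switch it on at creation time (the filename is a placeholder, and this assumes a kind node image whose Kubernetes version ships the gate):

# kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  InPlacePodVerticalScaling: true

kind create cluster --config kind-config.yaml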

VPA Modes

The VPA supports several update modes; the three relevant to us are:

  • Off: The VPA only provides recommendations but does not update any pods.
  • Initial: The VPA sets the resource requests and limits for pods when they are first created, but does not update them afterwards. This mode is useful for workloads that do not support in-place updates but still benefit from initial resource tuning.
  • InPlaceOrRecreate: The VPA attempts to update the resource requests and limits of running pods in place. If in-place updates are not supported by the workload, it will recreate the pod with the new settings.

Why use VPA?

In our case, we have several deployments that need to run as single replicas. These include:

  • Static sites served by Nginx
  • Web applications that do not support horizontal scaling
  • Utility services that perform background tasks

These deployments often have varying resource requirements based on the workload, making it challenging to set appropriate resource requests and limits manually. The VPA gives us recommendations for these values based on actual usage, ensuring that our applications have the resources they need to run efficiently without over-provisioning.

VPA Approach

Our approach was to move through the VPA modes as follows:

  1. Start with the VPA in Off mode to gather resource usage data and get recommendations.
  2. Use the recommendations to set initial resource requests and limits for the deployments.
  3. Move to Initial mode to apply the recommended resource requests and limits when pods are created.

Installing the VPA

Installing the VPA (its custom resource definitions and controller components) involves cloning the autoscaler repository and running the installation script:

git clone https://github.com/kubernetes/autoscaler/
cd autoscaler/vertical-pod-autoscaler/
./hack/vpa-up.sh
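
After the script completes, the VPA's three components run as deployments in the kube-system namespace, so a quick sanity check is to filter the pod list:

kubectl get pods -n kube-system | grep vpa

This should show the vpa-admission-controller, vpa-recommender and vpa-updater pods in a Running state.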

Initial VPA Configuration

An example VPA configuration is shown below. It starts the VPA in Off mode, and uses minAllowed and maxAllowed bounds to prevent the VPA from recommending values that are too low or too high.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
  namespace: default
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind:       Deployment
    name:       my-app
  updatePolicy:
    updateMode: "Off"  # Start in Off mode to gather data
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: "100m"
          memory: "128Mi"
        maxAllowed:
          cpu: "1"
          memory: "1Gi"

Monitoring and Adjusting

After deploying the VPA, we monitored the recommendations provided by the Recommender component.

To view the recommendations, we used the following command:
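
kubectl describe vpa my-app-vpa -n default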

The output includes the Recommender's target, lower-bound and upper-bound values for CPU and memory, based on observed usage. We used these recommendations to set the initial resource requests and limits for our deployments. Once we were satisfied with the initial settings, we switched the VPA to Initial mode:

  updatePolicy:
    updateMode: "Initial"

What's Next?

We are currently evaluating the performance of our deployments with the VPA in Initial mode. If we find that our workloads support in-place updates and the VPA is providing accurate recommendations, we may consider moving to InPlaceOrRecreate mode in the future.

However, the VPA's minReplicas setting (which defaults to 2) currently prevents us from using this mode, as the updater will not evict or recreate pods belonging to workloads with fewer replicas. Our options are to either:

  • Reduce minReplicas to 1 and accept the risk of downtime during pod recreation (if in-place updates are not possible), as sketched below.
  • Keep using Initial mode and manually adjust resources as needed.
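
If we take the first option, the per-object override would look something like the sketch below. The minReplicas field on the VPA object overrides the updater's global default, and the InPlaceOrRecreate mode assumes a VPA release recent enough to support it:

  updatePolicy:
    updateMode: "InPlaceOrRecreate"
    minReplicas: 1  # allow the updater to act on our single-replica deployments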

Ultimately, re-creating a pod would only be necessary if the node does not have sufficient resources to accommodate the new requests and limits. In this case, the pod would presumably be a candidate for throttling or eviction anyway.