When deploying applications on Kubernetes, one of the biggest challenges is keeping them running smoothly under varying workloads. Traffic may be light one hour and spike the next, and adjusting resources by hand every time is not practical. That's where Kubernetes autoscaling comes in: it automatically adjusts your app's resources based on the workload, so your app stays responsive while also saving costs.
In Kubernetes, there are two main types of autoscaling for Pods:
- Horizontal Pod Autoscaling (HPA)
- Vertical Pod Autoscaling (VPA)
Horizontal Pod Autoscaler (HPA)
The Horizontal Pod Autoscaler automatically increases or decreases the number of Pods in a Deployment, ReplicaSet, or StatefulSet based on observed metrics (such as CPU or memory utilization).
Think of HPA as scaling out or in:
- If your app is getting more traffic → HPA adds more Pods.
- If traffic goes down → HPA removes Pods to save resources.
How Does HPA Work?
- HPA continuously watches metrics (like CPU, memory, or even custom metrics).
- When a threshold is crossed (say CPU > 70%), it increases the number of Pods.
- When usage drops, it scales Pods down.
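The controller's core calculation is documented in the Kubernetes HPA docs: it scales the replica count in proportion to how far the current metric is from the target. A simplified sketch (ignoring the tolerance band, stabilization windows, and readiness handling the real controller applies):

```python
import math

def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float) -> int:
    """Simplified HPA formula:
    desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric).
    The real controller also applies a tolerance band and clamps the
    result between minReplicas and maxReplicas."""
    return math.ceil(current_replicas * (current_utilization / target_utilization))

# 4 Pods averaging 90% CPU against a 70% target -> scale out
print(desired_replicas(4, 90, 70))  # 6
# 4 Pods averaging 35% CPU against a 70% target -> scale in
print(desired_replicas(4, 35, 70))  # 2
```

This is why HPA settles near the target: once the ratio of current to target utilization is close to 1, the desired replica count stops changing.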
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```
Here, the HPA keeps average CPU utilization around the 70% target, scaling the Deployment between 2 and 10 Pods.
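HPA is not limited to CPU: the `autoscaling/v2` API lets you list several metrics, and the controller computes a desired replica count for each and uses the highest. A hypothetical variant of the manifest above that also scales on memory:

```yaml
# Hypothetical variant: scale myapp on CPU and memory utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
```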
Vertical Pod Autoscaler (VPA)
The Vertical Pod Autoscaler automatically adjusts the CPU and memory requests/limits of containers in Pods. Think of VPA as scaling up or down resources inside a Pod.
- If a Pod needs more memory → VPA increases its memory allocation.
- If it is over-provisioned → VPA reduces the allocated resources to save cost.
How Does VPA Work?
- VPA monitors historical and current resource usage.
- It recommends or applies updated CPU/memory values.
- Pods may be evicted and recreated to apply the new values, depending on the update mode.
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Auto"
```
Here, VPA automatically adjusts CPU/memory for myapp Pods.
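VPA supports several update modes: "Off" only records recommendations, "Initial" applies them when Pods are created, and "Auto" evicts and recreates running Pods. A sketch of a more conservative setup for the same hypothetical myapp Deployment, which only records recommendations and bounds the values VPA may suggest:

```yaml
# Sketch: recommendation-only VPA with bounded resource values.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Off"        # record recommendations only; never evict Pods
  resourcePolicy:
    containerPolicies:
    - containerName: "*"     # apply to all containers in the Pod
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: "2"
        memory: 2Gi
```

Running in "Off" mode first is a common way to review what VPA would recommend (visible in the object's status) before letting it restart Pods.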
HPA vs VPA – Key Differences
| Feature | HPA (Horizontal) | VPA (Vertical) |
|---|---|---|
| What it scales | Number of Pods | CPU/Memory per Pod |
| Scaling direction | Scale out/in (Pods count) | Scale up/down (resources) |
| Best for | Handling traffic spikes | Optimizing resource usage |
| Pod restarts required? | No | Yes (for new resources) |
Autoscaling in Kubernetes ensures:
- High availability (apps scale up when needed).
- Cost efficiency (scale down when idle).
- Resilience (apps can handle unpredictable traffic).
As a rule of thumb:
- Use HPA when your app faces variable traffic loads.
- Use VPA when you want to fine-tune resource allocation automatically.
Conclusion
With Kubernetes autoscaling, your apps adjust to demand automatically. HPA adds or removes pods, while VPA tunes resources for each pod. Together, they keep your app reliable, cost-efficient, and ready for any traffic spike.
