How to Monitor and Troubleshoot Multi-Tenant Kubernetes Clusters with Capsule and Prometheus

Riya

Monitoring and troubleshooting Kubernetes clusters become more complex in a multi-tenant setup. When multiple teams or applications share a cluster, ensuring that each tenant operates efficiently without impacting others is crucial. Capsule, a Kubernetes multi-tenancy operator, helps manage tenant isolation, while Prometheus, a popular monitoring tool, enables real-time visibility into cluster performance. In this blog, we’ll explore how to monitor and troubleshoot multi-tenant Kubernetes clusters using Capsule and Prometheus. We’ll cover real-world scenarios, provide code examples, and share best practices.

🚀 Why Monitoring is Critical in Multi-Tenant Kubernetes Clusters

In a multi-tenant Kubernetes cluster, the following challenges make monitoring essential:

Resource contention – A single tenant consuming excessive resources can degrade the performance of others.
Performance bottlenecks – High latency or failed deployments can affect tenant services.
Access issues – Misconfigured roles or permissions can prevent tenants from accessing their resources.
Scaling problems – Auto-scaling might fail if tenants exceed quotas or node limits.

To avoid these issues, you need a monitoring solution that provides:

Real-time visibility into tenant performance
Granular metrics for CPU, memory, and network usage
Alerting and notification for early issue detection
Isolation of tenant-specific metrics for easier troubleshooting

Capsule helps isolate tenants, while Prometheus collects and stores the performance data that you can query, alert on, and visualize.

🛠️ How Capsule Enhances Multi-Tenant Monitoring

Capsule creates logical boundaries around tenants by:

Grouping one or more namespaces under each tenant.
Enforcing quotas and resource limits per tenant.
Isolating network traffic using policies.

By combining Capsule with Prometheus, you can monitor each tenant separately and ensure that resource consumption stays within defined limits.

📊 How Prometheus Works in Kubernetes

Prometheus is an open-source monitoring and alerting system that collects and processes metrics from Kubernetes resources. It works by:

Scraping HTTP endpoints for metrics.
Storing collected metrics in a time-series database.
Providing a query language (PromQL) to analyze data.
Generating alerts based on defined rules.

Prometheus integrates well with Kubernetes components like the kubelet, cAdvisor, and exporters (like Node Exporter and Kafka Exporter).
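To make the scraping step more concrete, here is a minimal Python sketch of what a scraper reads from a metrics endpoint. The payload, the metric names in it, and the parse_samples helper are all illustrative assumptions; this is not how Prometheus itself parses metrics, and it ignores timestamps, escaped characters, and other details of the text exposition format.

```python
import re

# Hypothetical payload in the Prometheus text exposition format,
# similar to what a /metrics endpoint might return.
SAMPLE_PAYLOAD = """\
# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{namespace="analytics-team-namespace",code="200"} 1027
http_requests_total{namespace="analytics-team-namespace",code="500"} 3
process_open_fds 12
"""

# metric_name, optional {labels}, then the sample value
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(payload):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # this sketch simply ignores lines it cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_samples(SAMPLE_PAYLOAD):
        print(name, labels, value)
```

Each parsed sample corresponds to one point in a time series; the namespace label is what later lets you filter metrics per tenant.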

🔎 Step-by-Step Guide: Setting Up Capsule and Prometheus for Monitoring

1. Install Capsule

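Start by installing Capsule using Helm. The chart goes into the capsule-system namespace, which is where the verification command looks for its pods:

helm repo add clastix https://clastix.github.io/charts
helm repo update
helm install capsule clastix/capsule -n capsule-system --create-namespace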

Verify the installation:

kubectl get pods -n capsule-system

2. Create a Capsule Tenant

Create a tenant named analytics-team:

apiVersion: capsule.clastix.io/v1beta1
kind: Tenant
metadata:
  name: analytics-team
spec:
  owners:
    - kind: User
      name: "dev1@example.com"
  # In the v1beta1 API, the namespace quota lives under namespaceOptions
  namespaceOptions:
    quota: 2
  nodeSelector:
    kubernetes.io/os: linux
  storageClasses:
    allowed:
      - standard

Apply the configuration:

kubectl apply -f tenant.yaml

3. Install Prometheus

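Install Prometheus using the kube-prometheus-stack Helm chart, which also bundles the Prometheus Operator, Alertmanager, and Grafana. The release goes into the monitoring namespace, the same namespace used by the ServiceMonitor and PrometheusRule manifests in the later steps:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack -n monitoring --create-namespace

Verify that Prometheus is running:

kubectl get pods -n monitoring | grep prometheus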

4. Set Up Prometheus to Monitor Capsule Tenants

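Create a ServiceMonitor that selects Services in the tenant's namespace. A ServiceMonitor matches labels on Services, so the Service that exposes your workload's metrics must carry the capsule.clastix.io/tenant: analytics-team label and have a port named http. The release: prometheus label lets the Prometheus instance from the Helm release named prometheus discover the ServiceMonitor: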

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: capsule-monitor
  namespace: monitoring
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      capsule.clastix.io/tenant: analytics-team
  namespaceSelector:
    matchNames:
      - analytics-team-namespace
  endpoints:
    - port: http
      interval: 30s

Apply the configuration:

kubectl apply -f service-monitor.yaml

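This tells Prometheus to scrape every matching Service in analytics-team-namespace. The namespace itself must already exist, created by the tenant owner as one of the analytics-team tenant's namespaces.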

5. Monitor Resource Usage with PromQL

Use PromQL to query tenant-specific metrics:

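CPU usage per tenant, in cores (the container!="" matcher drops the pod-level cAdvisor aggregates, which would otherwise be counted twice):

sum(rate(container_cpu_usage_seconds_total{namespace="analytics-team-namespace", container!=""}[5m]))

Memory usage per tenant, in bytes:

sum(container_memory_usage_bytes{namespace="analytics-team-namespace", container!=""})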

Network I/O per tenant:

sum(rate(container_network_receive_bytes_total{namespace="analytics-team-namespace"}[5m])) +
sum(rate(container_network_transmit_bytes_total{namespace="analytics-team-namespace"}[5m]))
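A Capsule tenant can own more than one namespace, so a per-tenant query often needs to cover several of them. The following Python sketch builds such a query and the URL for running it against the Prometheus HTTP API instant-query endpoint (/api/v1/query). The helper names and the localhost:9090 address are assumptions for illustration only.

```python
from urllib.parse import urlencode


def namespace_matcher(namespaces):
    """Build a PromQL label matcher covering one or more namespaces."""
    if not namespaces:
        raise ValueError("at least one namespace is required")
    if len(namespaces) == 1:
        return 'namespace="%s"' % namespaces[0]
    # =~ is a fully anchored regex match in PromQL, so a|b matches exactly a or b.
    # Namespace names are DNS labels, so they contain no regex metacharacters.
    return 'namespace=~"%s"' % "|".join(namespaces)


def tenant_cpu_query(namespaces, window="5m", extra=()):
    """CPU cores used across all of a tenant's namespaces."""
    matchers = [namespace_matcher(namespaces)] + list(extra)
    return "sum(rate(container_cpu_usage_seconds_total{%s}[%s]))" % (
        ",".join(matchers),
        window,
    )


def instant_query_url(base_url, query):
    """URL for the Prometheus HTTP API instant-query endpoint."""
    return "%s/api/v1/query?%s" % (base_url.rstrip("/"), urlencode({"query": query}))


if __name__ == "__main__":
    # Hypothetical tenant that owns two namespaces.
    query = tenant_cpu_query(["analytics-team-dev", "analytics-team-prod"])
    print(query)
    print(instant_query_url("http://localhost:9090", query))
```

The same matcher can be reused for the memory and network queries, so a dashboard or script only needs the tenant's namespace list to produce all three.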

6. Create Alerts for Critical Issues

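Create a PrometheusRule that fires when the tenant's namespace uses more than 0.9 CPU cores for one minute. The threshold is an absolute number of cores, not a percentage; to alert on 90% of a quota, divide the usage by the quota before comparing. The release: prometheus label is what the default kube-prometheus-stack rule selector looks for:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: tenant-cpu-alert
  namespace: monitoring
  labels:
    release: prometheus
spec:
  groups:
  - name: tenant-cpu-usage
    rules:
    - alert: HighTenantCPUUsage
      # Fires above 0.9 cores; container!="" avoids double-counting pod-level aggregates
      expr: sum(rate(container_cpu_usage_seconds_total{namespace="analytics-team-namespace", container!=""}[5m])) > 0.9
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: "High CPU usage for tenant analytics-team"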

Apply the configuration:

kubectl apply -f alerting-rule.yaml
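To see what the alert expression actually measures, here is a small worked example in Python. It is a simplified stand-in for PromQL's rate(), which additionally handles counter resets and extrapolation. The sample values and the 2-core quota are made up for illustration.

```python
def cpu_cores_used(first_sample, last_sample):
    """Average cores used between two (timestamp_seconds, cpu_seconds_total) samples."""
    (t0, v0), (t1, v1) = first_sample, last_sample
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    # CPU-seconds consumed per wall-clock second equals cores in use.
    return (v1 - v0) / (t1 - t0)


THRESHOLD_CORES = 0.9    # the value compared against in the alert expression
QUOTA_CORES = 2.0        # hypothetical tenant CPU quota

# Hypothetical counter readings taken 5 minutes (300 s) apart.
start = (0.0, 1200.0)
end = (300.0, 1500.0)

cores = cpu_cores_used(start, end)    # (1500 - 1200) / 300
would_fire = cores > THRESHOLD_CORES
quota_fraction = cores / QUOTA_CORES

if __name__ == "__main__":
    print("cores used:", cores)
    print("alert condition met:", would_fire)
    print("fraction of quota:", quota_fraction)
```

In this example the condition is met even though the tenant is only at half of its assumed quota, which is why an absolute-cores threshold and a percentage-of-quota threshold should not be confused.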

7. Create a Grafana Dashboard for Visualization

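The kube-prometheus-stack chart installed in step 3 already includes Grafana, so no separate installation is needed. With a release named prometheus, the Grafana Service is called prometheus-grafana.

Expose Grafana (adjust -n if you installed the chart in a namespace other than monitoring):

kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80

Access Grafana at http://localhost:3000 using the chart's default login (admin/prom-operator). Change this password for anything beyond a local test.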

👉 Import a Kubernetes dashboard and filter metrics by namespace (analytics-team-namespace) to isolate tenant data.

🚨 Troubleshooting Common Issues

1. No Data in Prometheus

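Ensure that the ServiceMonitor selector matches the labels on the Service (not on the pods), and that the endpoint port name matches a named port on that Service.
Check whether Prometheus has discovered the targets and can reach them (adjust -n if you installed the chart in a namespace other than monitoring):

kubectl port-forward -n monitoring svc/prometheus-operated 9090
kubectl logs -n monitoring -l app.kubernetes.io/name=prometheus -c prometheus

With the port-forward running, open http://localhost:9090/targets to see each target's scrape status and last error.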
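When a target never shows up, the cause is often a label or port-name mismatch. The sketch below mirrors the matchLabels rule (every key/value pair in the selector must be present on the Service) so you can reason about it offline. The Service dictionary is a hypothetical example, and the helper names are not part of any Kubernetes or Prometheus library.

```python
def selector_matches(match_labels, service_labels):
    """True when every key/value in match_labels is present on the Service."""
    return all(service_labels.get(key) == value for key, value in match_labels.items())


def has_named_port(service_ports, port_name):
    """True when the Service exposes a port with the given name."""
    return any(port.get("name") == port_name for port in service_ports)


# Selector and endpoint port as written in the ServiceMonitor from step 4.
MATCH_LABELS = {"capsule.clastix.io/tenant": "analytics-team"}
ENDPOINT_PORT = "http"

# Hypothetical Service in the tenant namespace.
service = {
    "labels": {"app": "report-api", "capsule.clastix.io/tenant": "analytics-team"},
    "ports": [{"name": "metrics", "port": 8080}],
}

labels_ok = selector_matches(MATCH_LABELS, service["labels"])
port_ok = has_named_port(service["ports"], ENDPOINT_PORT)

if __name__ == "__main__":
    print("labels match:", labels_ok)
    print("named port found:", port_ok)
```

Here the labels match but the port is named metrics rather than http, so the ServiceMonitor would select the Service and still produce no scrape target.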

2. High Latency or Resource Exhaustion

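Adjust resource quotas in the Capsule Tenant spec to prevent one tenant from starving the others. In the v1beta1 API, quotas are listed under resourceQuotas:

spec:
  resourceQuotas:
    scope: Tenant
    items:
      - hard:
          limits.cpu: "2"
          limits.memory: 4Gi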
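To judge how close a tenant is to its quota, you can compare current usage with the hard limit. This Python sketch parses a subset of Kubernetes quantity strings (plain numbers, the m milli suffix, and common binary and decimal suffixes) and computes the utilization. The usage figures are invented, and the parser is a simplification rather than the full Kubernetes quantity grammar.

```python
# Multipliers for the suffixes this sketch understands.
SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_quantity(quantity):
    """Convert a quantity such as "500m", "2" or "4Gi" to a float."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return float(quantity[:-1]) / 1000  # milli-units, e.g. 500m = 0.5 CPU
    # Two-letter binary suffixes are checked before one-letter decimal ones.
    for suffix in ("Ki", "Mi", "Gi", "Ti", "k", "M", "G", "T"):
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * SUFFIXES[suffix]
    return float(quantity)


def utilization(used, hard):
    """Fraction of the hard limit currently in use."""
    return parse_quantity(used) / parse_quantity(hard)


if __name__ == "__main__":
    # Hypothetical usage compared with the quota values from the snippet above.
    print("cpu:", utilization("1500m", "2"))
    print("memory:", utilization("3Gi", "4Gi"))
```

A utilization close to 1.0 means new pods in the tenant's namespaces are likely to be rejected by the quota, which can look like a failed deployment or a scaling problem.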

3. Network Connectivity Issues

Check the network policies applied to the tenant's namespace:

kubectl describe networkpolicy -n analytics-team-namespace

🏆 Best Practices for Monitoring Multi-Tenant Clusters

Define separate namespaces for tenants – This simplifies monitoring and resource management.
Use Prometheus ServiceMonitor for tenant-specific metrics – Avoid scraping the entire cluster for performance reasons.
Set up alerts for critical resource consumption – Notify early to prevent outages.
Visualize performance with Grafana – Dashboards simplify troubleshooting and capacity planning.

🌍 Real-World Use Case

A SaaS company hosts multiple customer environments on a shared Kubernetes cluster. After introducing Capsule and Prometheus:

Each customer gets a dedicated tenant and namespace.
Prometheus scrapes tenant-specific metrics.
Alerts notify when CPU/memory usage exceeds limits.
Grafana dashboards provide real-time insights into tenant performance.

This setup improves monitoring accuracy, simplifies troubleshooting, and ensures fair resource distribution among tenants.

🎯 Conclusion

Monitoring multi-tenant Kubernetes clusters is essential for performance, security, and scalability. Capsule streamlines tenant management, while Prometheus provides real-time insights and alerting. By combining the two, you can create a secure, scalable, and efficient Kubernetes environment that meets the demands of modern cloud-native applications.

That’s it for now. I hope this article gave you some useful insights on the topic. Please feel free to drop a comment, question or suggestion.

Riya

Riya is a DevOps Engineer with a passion for new technologies. She is a programmer by heart trying to learn something about everything. On a personal front, she loves traveling, listening to music, and binge-watching web series.
