Monitoring and troubleshooting Kubernetes clusters become more complex in a multi-tenant setup. When multiple teams or applications share a cluster, ensuring that each tenant operates efficiently without impacting others is crucial. Capsule, a Kubernetes multi-tenancy operator, helps manage tenant isolation, while Prometheus, a popular monitoring tool, enables real-time visibility into cluster performance. In this blog, we’ll explore how to monitor and troubleshoot multi-tenant Kubernetes clusters using Capsule and Prometheus. We’ll cover real-world scenarios, provide code examples, and share best practices.
🚀 Why Monitoring is Critical in Multi-Tenant Kubernetes Clusters
In a multi-tenant Kubernetes cluster, the following challenges make monitoring essential:
- Resource contention – A single tenant consuming excessive resources can degrade the performance of others.
- Performance bottlenecks – High latency or failed deployments can affect tenant services.
- Access issues – Misconfigured roles or permissions can prevent tenants from accessing their resources.
- Scaling problems – Auto-scaling might fail if tenants exceed quotas or node limits.
To avoid these issues, you need a monitoring solution that provides:
- Real-time visibility into tenant performance
- Granular metrics for CPU, memory, and network usage
- Alerting and notification for early issue detection
- Isolation of tenant-specific metrics for easier troubleshooting
Capsule helps isolate tenants, while Prometheus collects the performance data that Grafana later visualizes.
🛠️ How Capsule Enhances Multi-Tenant Monitoring
Capsule creates logical boundaries around tenants by:
- Grouping one or more namespaces under each tenant.
- Enforcing quotas and resource limits per tenant.
- Isolating network traffic using policies.
By combining Capsule with Prometheus, you can monitor each tenant separately and ensure that resource consumption stays within defined limits.
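As a rough illustration of those boundaries: Capsule labels each namespace it manages with the owning tenant's name, so a label selector is enough to list a tenant's namespaces. The sketch below assumes the capsule.clastix.io/tenant label key (the same key used in the ServiceMonitor later in this post). The tenant_namespaces helper and the KUBECTL override are hypothetical names introduced only for this example.

```shell
# Hypothetical helper: list the namespaces that belong to one Capsule tenant.
# Assumes tenant namespaces carry the label capsule.clastix.io/tenant=<tenant>.
KUBECTL="${KUBECTL:-kubectl}"   # set KUBECTL=echo to print the command instead of running it

tenant_namespaces() {
  tenant="$1"
  "$KUBECTL" get namespaces -l "capsule.clastix.io/tenant=${tenant}"
}
```

Calling tenant_namespaces analytics-team should list only the namespaces owned by that tenant, which is also a quick way to confirm which namespaces your PromQL queries need to cover.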
📊 How Prometheus Works in Kubernetes

Prometheus is an open-source monitoring and alerting system that collects and processes metrics from Kubernetes resources. It works by:
- Scraping HTTP endpoints for metrics.
- Storing collected metrics in a time-series database.
- Providing a query language (PromQL) to analyze data.
- Generating alerts based on defined rules.
Prometheus integrates well with Kubernetes components like the kubelet, cAdvisor, and exporters (like Node Exporter and Kafka Exporter).
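To make the scrape step more concrete, here is a small sketch of what a scraped payload looks like in the Prometheus text exposition format, and how its samples could be summed with awk. The sample values and the helper names (sum_metric, sample_scrape) are made up for illustration. Prometheus itself does this parsing and aggregation for you.

```shell
# Sum every sample of one metric in Prometheus text exposition format.
# Lines look like: metric_name{label="value",...} <number>
# Assumes no timestamps and no spaces inside label values,
# so the sample value is always the last field.
sum_metric() {
  metric="$1"
  awk -v m="$metric" '
    /^#/ { next }                      # skip HELP and TYPE comment lines
    {
      name = $1
      sub(/[{].*/, "", name)           # strip the {labels} part
      if (name == m) total += $NF      # last field is the sample value
    }
    END { printf "%g\n", total }
  '
}

# Made-up payload, similar to what an exporter returns on its /metrics endpoint.
sample_scrape() {
  cat <<'EOF'
# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{code="200",namespace="analytics-team-namespace"} 120
http_requests_total{code="500",namespace="analytics-team-namespace"} 3
process_open_fds 17
EOF
}
```

Running sample_scrape | sum_metric http_requests_total adds the two labeled series together, which is roughly what a PromQL sum() over the same metric does at query time.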
🔎 Step-by-Step Guide: Setting Up Capsule and Prometheus for Monitoring
1. Install Capsule
Start by installing Capsule using Helm:
helm repo add clastix https://clastix.github.io/charts
helm install capsule clastix/capsule -n capsule-system --create-namespace
Verify the installation:
kubectl get pods -n capsule-system
2. Create a Capsule Tenant
Create a tenant named analytics-team:
apiVersion: capsule.clastix.io/v1beta1
kind: Tenant
metadata:
  name: analytics-team
spec:
  owners:
    - kind: User
      name: "dev1@example.com"
  namespaceOptions:
    quota: 2
  nodeSelector:
    kubernetes.io/os: linux
  storageClasses:
    allowed:
      - standard
Apply the configuration:
kubectl apply -f tenant.yaml
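Creating the tenant does not create any namespaces by itself, and the later steps assume a namespace called analytics-team-namespace exists. One way to create it is to impersonate the tenant owner so that Capsule assigns the namespace to the tenant. This is only a sketch: the group capsule.clastix.io is an assumption that must match the user group configured for your Capsule installation, your own account needs impersonation rights, and create_tenant_namespace is a hypothetical helper name.

```shell
KUBECTL="${KUBECTL:-kubectl}"   # set KUBECTL=echo for a dry run

# Create a namespace while impersonating the tenant owner, so that Capsule
# attaches the new namespace to the owner's tenant.
# The default group below is an assumption; pass a third argument to override it.
create_tenant_namespace() {
  owner="$1"
  namespace="$2"
  group="${3:-capsule.clastix.io}"
  "$KUBECTL" create namespace "$namespace" --as "$owner" --as-group "$group"
}
```

For this post that would be create_tenant_namespace dev1@example.com analytics-team-namespace, followed by kubectl get tenant analytics-team to confirm the tenant now reports the namespace.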
3. Install Prometheus
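Install Prometheus using Helm, placing it in the monitoring namespace that the ServiceMonitor and PrometheusRule in the following steps use:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack -n monitoring --create-namespace
Verify that Prometheus is running:
kubectl get pods -n monitoring | grep prometheus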
4. Set Up Prometheus to Monitor Capsule Tenants
Create a ServiceMonitor that scrapes Services in the tenant namespace. The selector matches Service labels (not pod labels), so the tenant's Services must carry the capsule.clastix.io/tenant: analytics-team label and expose a port named http:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: capsule-monitor
  namespace: monitoring
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      capsule.clastix.io/tenant: analytics-team
  namespaceSelector:
    matchNames:
      - analytics-team-namespace
  endpoints:
    - port: http
      interval: 30s
Apply the configuration:
kubectl apply -f service-monitor.yaml
This tells Prometheus to scrape metrics from the analytics-team tenant’s namespace.
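Before relying on the ServiceMonitor, it can help to confirm that at least one Service actually matches its selector. The sketch below lists the Services in a namespace that carry a given label. matching_services and the KUBECTL override are hypothetical names used only in this example.

```shell
KUBECTL="${KUBECTL:-kubectl}"   # set KUBECTL=echo for a dry run

# List Services in a namespace that carry a given label, with all labels shown,
# so they can be compared against the ServiceMonitor's selector.matchLabels.
matching_services() {
  namespace="$1"
  selector="$2"
  "$KUBECTL" get services -n "$namespace" -l "$selector" --show-labels
}
```

For the tenant in this post, matching_services analytics-team-namespace capsule.clastix.io/tenant=analytics-team should return at least one Service. An empty result means the ServiceMonitor has nothing to scrape.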
5. Monitor Resource Usage with PromQL
Use PromQL to query tenant-specific metrics:
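CPU usage per tenant (container!="" drops the pod-level aggregate series that cAdvisor also exports, which would otherwise be counted twice):
sum(rate(container_cpu_usage_seconds_total{namespace="analytics-team-namespace", container!=""}[5m]))
Memory usage per tenant:
sum(container_memory_usage_bytes{namespace="analytics-team-namespace", container!=""})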
Network I/O per tenant:
sum(rate(container_network_receive_bytes_total{namespace="analytics-team-namespace"}[5m])) +
sum(rate(container_network_transmit_bytes_total{namespace="analytics-team-namespace"}[5m]))
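These queries can be typed into the Prometheus UI, but they can also be sent to the Prometheus HTTP API (/api/v1/query), which is handy for scripts. The sketch below assumes Prometheus has been port-forwarded to http://localhost:9090. PROM_URL, net_rx_query, and prom_query are hypothetical names chosen for this example.

```shell
PROM_URL="${PROM_URL:-http://localhost:9090}"   # assumed: Prometheus is reachable locally

# Build the PromQL expression for a namespace's inbound network traffic.
net_rx_query() {
  printf 'sum(rate(container_network_receive_bytes_total{namespace="%s"}[5m]))' "$1"
}

# Run an instant query against the Prometheus HTTP API and print the JSON reply.
prom_query() {
  curl -sG "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}
```

A typical call would be prom_query "$(net_rx_query analytics-team-namespace)". With the Prometheus Operator, a port-forward such as kubectl port-forward -n monitoring svc/prometheus-operated 9090 is one way to make the API reachable, assuming that Service exists in your installation.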
6. Create Alerts for Critical Issues
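Create a PrometheusRule that fires when the tenant's CPU usage stays above 0.9 cores (90% of a single core) for one minute. The release: prometheus label lets the Prometheus instance installed by the Helm chart pick up the rule:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: tenant-cpu-alert
  namespace: monitoring
  labels:
    release: prometheus
spec:
  groups:
    - name: tenant-cpu-usage
      rules:
        - alert: HighTenantCPUUsage
          # container!="" skips the pod-level aggregate series to avoid double counting
          expr: sum(rate(container_cpu_usage_seconds_total{namespace="analytics-team-namespace", container!=""}[5m])) > 0.9
          for: 1m
          labels:
            severity: critical
          annotations:
            summary: "High CPU usage for tenant analytics-team"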
Apply the configuration:
kubectl apply -f alerting-rule.yaml
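The rule above compares against an absolute value of 0.9 CPU cores. If you would rather alert at a percentage of the tenant's CPU quota, the threshold has to be derived from that quota. The sketch below shows the arithmetic for Kubernetes CPU quantities written either in cores ("2") or millicores ("500m"). cpu_to_cores and cpu_alert_threshold are hypothetical helpers, not part of Capsule or Prometheus.

```shell
# Convert a Kubernetes CPU quantity ("500m" or "2") to cores.
cpu_to_cores() {
  case "$1" in
    *m) awk -v v="${1%m}" 'BEGIN { printf "%g\n", v / 1000 }' ;;
    *)  awk -v v="$1"     'BEGIN { printf "%g\n", v }' ;;
  esac
}

# Alert threshold in cores for a CPU quota and a percentage (default: 90).
cpu_alert_threshold() {
  cores="$(cpu_to_cores "$1")"
  awk -v c="$cores" -v p="${2:-90}" 'BEGIN { printf "%g\n", c * p / 100 }'
}
```

For a tenant with a CPU quota of "2", cpu_alert_threshold 2 gives 1.8, so the expression would compare against 1.8 instead of 0.9.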
7. Create a Grafana Dashboard for Visualization
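The kube-prometheus-stack chart installed in step 3 already bundles Grafana, so no separate installation is needed.
Expose Grafana (the Service is named after the Helm release; replace monitoring with the namespace that holds the release if yours differs):
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
Access Grafana at http://localhost:3000 using the chart's default login (admin/prom-operator).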
👉 Import a Kubernetes dashboard and filter metrics by namespace (analytics-team-namespace) to isolate tenant data.
🚨 Troubleshooting Common Issues
1. No Data in Prometheus
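- Ensure that the ServiceMonitor selector matches the labels on the target Service (not the pod labels).
- Check whether Prometheus can reach the target endpoints by inspecting its logs (replace monitoring with the namespace where Prometheus runs):
kubectl logs -f -n monitoring -l app.kubernetes.io/name=prometheus -c prometheus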
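Prometheus also reports the state of every scrape target through its HTTP API (/api/v1/targets), which is often quicker to read than the logs. The sketch below counts targets by health without needing extra tools. It assumes the API is reachable on http://localhost:9090 and that the JSON is compact, as Prometheus emits it. count_target_health is a hypothetical helper name.

```shell
# Count scrape targets by health ("up", "down", "unknown") from the JSON
# returned by the Prometheus /api/v1/targets endpoint, read on stdin.
count_target_health() {
  grep -o '"health":"[a-z]*"' |
    sed 's/"health":"\([a-z]*\)"/\1/' |
    sort | uniq -c |
    awk '{ print $2 "=" $1 }'
}
```

Usage would look like curl -s http://localhost:9090/api/v1/targets | count_target_health. Any down entries point to targets that were discovered but could not be scraped. If the tenant's targets are missing entirely, the ServiceMonitor selector is the more likely culprit.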
2. High Latency or Resource Exhaustion
- Adjust resource quotas using Capsule to prevent resource starvation. In the Tenant spec, quotas are listed under resourceQuotas.items:
spec:
  resourceQuotas:
    items:
      - hard:
          cpu: "2"
          memory: 4Gi
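Capsule enforces tenant quotas through regular ResourceQuota objects in each tenant namespace, so the usual kubectl commands show how close a tenant is to its limits. A minimal sketch follows. quota_usage and the KUBECTL override are hypothetical names for this example.

```shell
KUBECTL="${KUBECTL:-kubectl}"   # set KUBECTL=echo for a dry run

# Show used versus hard limits for every ResourceQuota in a tenant namespace.
quota_usage() {
  "$KUBECTL" describe resourcequota -n "$1"
}
```

Running quota_usage analytics-team-namespace prints a Used and Hard column per resource. Values close to the hard limit usually explain pods stuck in Pending or rejected deployments.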
3. Network Connectivity Issues
- Check network policies applied to tenant namespaces:
kubectl describe networkpolicy -n analytics-team-namespace
🏆 Best Practices for Monitoring Multi-Tenant Clusters
- Define separate namespaces for tenants – This simplifies monitoring and resource management.
- Use Prometheus ServiceMonitor for tenant-specific metrics – Avoid scraping the entire cluster for performance reasons.
- Set up alerts for critical resource consumption – Notify early to prevent outages.
- Visualize performance with Grafana – Dashboards simplify troubleshooting and capacity planning.
🌍 Real-World Use Case
A SaaS company hosts multiple customer environments on a shared Kubernetes cluster. After introducing Capsule and Prometheus:
- Each customer gets a dedicated tenant and namespace.
- Prometheus scrapes tenant-specific metrics.
- Alerts notify when CPU/memory usage exceeds limits.
- Grafana dashboards provide real-time insights into tenant performance.
This setup improves monitoring accuracy, simplifies troubleshooting, and ensures fair resource distribution among tenants.
🎯 Conclusion
Monitoring multi-tenant Kubernetes clusters is essential for performance, security, and scalability. Capsule streamlines tenant management, while Prometheus provides real-time insights and alerting. By combining the two, you can create a secure, scalable, and efficient Kubernetes environment that meets the demands of modern cloud-native applications.
That’s it for now. I hope this article gave you some useful insights on the topic. Please feel free to drop a comment, question or suggestion.