NashTech Blog

gRPC Server Monitoring in Python with OpenTelemetry

Monitoring a gRPC server’s performance and health is essential for any high-performance application. In this blog, we will dive into setting up OpenTelemetry for a gRPC server in Python to capture traces and metrics. We’ll cover the steps for implementing instrumentation using OpenTelemetry and gRPC-specific libraries, list essential metrics and traces, and demonstrate how to visualize this data on a Grafana dashboard.

Prerequisites

You’ll need the following tools installed to follow along:

  1. Python (version 3.10 or higher recommended)
  2. OpenTelemetry Libraries for Python
  3. Jaeger (for distributed tracing)
  4. Prometheus (for metrics)
  5. Grafana (for building the dashboard)

1. Setting Up the Basic gRPC Server

In this guide, we’ll build a simple gRPC server that responds with a greeting message. Our example will use a service defined in a Protobuf file (hello.proto) and implement the server using gRPC and OpenTelemetry libraries.

Step 1: Define the gRPC Service in Protobuf

Create a .proto file to define your gRPC service and message structure:

// hello.proto
syntax = "proto3";

service Greeter {
  rpc SayHello (HelloRequest) returns (HelloReply);
}

message HelloRequest {
  string name = 1;
}

message HelloReply {
  string message = 1;
}

Step 2: Compile the Protobuf File

Generate the Python classes from the Protobuf file by running the following command (this requires the grpcio-tools package, installable with pip install grpcio-tools):

python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. hello.proto

This will create hello_pb2.py and hello_pb2_grpc.py, which we’ll use in the server code.

2. Setting Up OpenTelemetry for gRPC Instrumentation

Now, let’s integrate OpenTelemetry to capture traces and metrics on our gRPC server. OpenTelemetry’s gRPC instrumentation library makes it straightforward to collect tracing and metric data for each RPC.

Step 1: Install OpenTelemetry Libraries

To get started, install the required OpenTelemetry and gRPC libraries:

pip install opentelemetry-api opentelemetry-sdk opentelemetry-instrumentation-grpc opentelemetry-instrumentation-system-metrics opentelemetry-exporter-jaeger opentelemetry-exporter-prometheus python-dotenv

3. Implementing the gRPC Server with OpenTelemetry Tracing

Tracing allows us to understand how requests flow through the system and helps troubleshoot latency or bottlenecks in the application. We’ll use Jaeger to export these traces.

Instrument the gRPC Server

Now we’ll modify the gRPC server to incorporate OpenTelemetry’s gRPC instrumentation.

# server.py
import grpc
import logging
import os
from concurrent import futures
from dotenv import load_dotenv
from opentelemetry import trace, metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.instrumentation.system_metrics import SystemMetricsInstrumentor
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from prometheus_client import start_http_server
from opentelemetry.instrumentation.grpc import GrpcInstrumentorServer

import hello_pb2_grpc
import hello_pb2

# Load environment variables
load_dotenv()
jaeger_host = os.getenv("JAEGER_HOST", "localhost")
jaeger_port = int(os.getenv("JAEGER_PORT", 6831))
prometheus_port = int(os.getenv("PROMETHEUS_PORT", 8000))

# Set up tracing and metrics
def setup_tracing_and_metrics():
    resource = Resource.create({SERVICE_NAME: "greeter_service"})

    # Jaeger exporter
    tracer_provider = TracerProvider(resource=resource)
    jaeger_exporter = JaegerExporter(
        agent_host_name=jaeger_host,
        agent_port=jaeger_port
    )
    tracer_provider.add_span_processor(BatchSpanProcessor(jaeger_exporter))
    trace.set_tracer_provider(tracer_provider)
    tracer = trace.get_tracer(__name__)

    # Prometheus exporter: PrometheusMetricReader is pull-based (metrics are
    # collected when Prometheus scrapes), so it is passed directly to the
    # MeterProvider without a periodic exporting reader
    prometheus_reader = PrometheusMetricReader()
    meter_provider = MeterProvider(resource=resource, metric_readers=[prometheus_reader])
    metrics.set_meter_provider(meter_provider)

    # Start Prometheus HTTP server (exposes /metrics on prometheus_port)
    start_http_server(prometheus_port)

    # Configure and instrument system metrics
    configuration = {
        "system.memory.usage": ["used", "free", "cached"],
        "system.cpu.time": ["idle", "user", "system", "irq"],
        "system.network.io": ["transmit", "receive"],
        "process.runtime.memory": ["rss", "vms"],
        "process.runtime.cpu.time": ["user", "system"],
        "process.runtime.context_switches": ["involuntary", "voluntary"],
    }
    SystemMetricsInstrumentor(config=configuration).instrument()

    return tracer

# Instrument the gRPC server
grpc_server_instrumentor = GrpcInstrumentorServer()
grpc_server_instrumentor.instrument()

class Greeter(hello_pb2_grpc.GreeterServicer):
    def SayHello(self, request, context):
        with tracer.start_as_current_span("SayHello"):
            return hello_pb2.HelloReply(message=f"Hello, {request.name}!")

def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    hello_pb2_grpc.add_GreeterServicer_to_server(Greeter(), server)
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)

    # Initialize tracing and metrics
    tracer = setup_tracing_and_metrics()

    serve()

4. Customize Metrics (Optional)

You can customize which metrics to capture by providing a configuration dictionary. For example:

configuration = {
    "system.cpu.utilization": ["idle", "user", "system"],
    "system.memory.usage": ["used", "free"],
    "system.disk.io": ["read", "write"],
    "process.runtime.cpu.utilization": None,
    "process.runtime.gc_count": None,
}

SystemMetricsInstrumentor(config=configuration).instrument()

Key Metrics and Traces for gRPC Server Monitoring

Here are essential metrics and traces to help you fully understand the performance of a gRPC server:

  1. Total Requests: Track the total number of gRPC calls received by each method.
  2. Request Latency: Measure the time taken to handle requests.
  3. Error Count: Track errors within the gRPC server to detect anomalies.
  4. Active Traces: Capture active traces per request to understand request flows and debug issues.
  5. CPU Usage: Monitor CPU consumption per request to detect resource bottlenecks.
  6. Memory Usage: Measure memory consumed per request to optimize memory management.
  7. Response Size: Record the size of responses sent to clients for traffic insights.
  8. Request Processing Time: Track the average time to process requests and identify latency sources.
  9. Request Rate: Measure the rate of incoming requests over time for load monitoring.
  10. Concurrency: Monitor the number of concurrent requests being processed to manage resource limits.

References: https://opentelemetry-python-contrib.readthedocs.io/

Conclusion

By setting up OpenTelemetry for a gRPC server in Python, you can build robust observability for your service. This guide shows how to capture crucial metrics and traces, and visualize them in Grafana, helping you monitor and troubleshoot the gRPC server in real time. With this setup, you’re well-equipped to optimize your server’s performance and reliability.

aayushsrivastava11