Monitoring infrastructure is crucial for maintaining performance, reliability, and security in any system. Predefined metrics are essential, but custom metrics let you tailor monitoring to the specific needs of your applications and infrastructure. Telegraf, an open-source agent from InfluxData, collects, processes, and writes metrics. This post covers best practices and examples for collecting custom metrics with Telegraf.
Introduction to Telegraf

Telegraf is an extensible, plugin-based agent that supports various input and output plugins. This flexibility makes it suitable for a wide range of use cases, including collecting custom metrics from applications, services, and infrastructure components.
Best Practices for Custom Metrics Collection
1. Identify Key Metrics
Before you start collecting custom metrics, identify the key metrics that are critical for your application’s performance and health. These might include application-specific metrics, business metrics, or infrastructure metrics not covered by default Telegraf plugins.
2. Standardize Metric Naming
Adopt a consistent naming convention for your custom metrics. A standardized naming convention makes it easier to manage and interpret metrics, especially when dealing with a large number of them. Consider including details such as the application name, component, and metric type in the metric name.
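One way to keep names consistent is to build them with a small helper instead of typing them by hand. The sketch below is only an illustration: the function name metric_name and the app_component_metric layout are assumptions made for this post, not something Telegraf requires.

```python
import re


def metric_name(app: str, component: str, metric: str) -> str:
    """Build a lowercase, underscore-separated metric name (hypothetical convention)."""
    parts = []
    for part in (app, component, metric):
        # Collapse every run of characters other than a-z and 0-9 into one underscore
        cleaned = re.sub(r"[^a-z0-9]+", "_", part.strip().lower()).strip("_")
        if not cleaned:
            raise ValueError(f"empty name component: {part!r}")
        parts.append(cleaned)
    return "_".join(parts)


print(metric_name("Shop", "checkout-api", "response time ms"))
```

Routing every metric through one function like this means a change to the convention happens in a single place.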
3. Use Tags for Dimensions
Telegraf supports tagging metrics, which allows you to add dimensions to your data. Tags are useful for filtering and grouping metrics. Use tags to include metadata such as hostnames, environments (e.g., production, staging), or regions.
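To make the idea of tags concrete, here is a minimal sketch that formats a metric with tags as InfluxDB line protocol, the text format Telegraf reads when data_format is set to "influx". The helper names (escape_tag, format_field, to_line) and the sample measurement are assumptions for illustration, and only boolean, integer, and float fields are handled.

```python
def escape_tag(text: str) -> str:
    """Escape commas, equals signs, and spaces, which are special in tag keys and values."""
    return text.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def format_field(value) -> str:
    """Render a field value; integers carry an 'i' suffix in line protocol."""
    if isinstance(value, bool):  # check bool first, because bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def to_line(measurement: str, tags: dict, fields: dict) -> str:
    """Format one metric as a line: measurement,tag=value,... field=value,..."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    name = measurement.replace(",", "\\,").replace(" ", "\\ ")
    tag_part = "".join(
        f",{escape_tag(key)}={escape_tag(str(val))}" for key, val in sorted(tags.items())
    )
    field_part = ",".join(f"{escape_tag(key)}={format_field(val)}" for key, val in fields.items())
    return f"{name}{tag_part} {field_part}"


print(to_line(
    "api_requests",
    {"host": "web-01", "environment": "production", "region": "us east"},
    {"count": 42, "latency_ms": 12.5},
))
```

Tags are sorted by key here, which keeps the output stable and makes two lines for the same series easy to compare.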
4. Optimize Collection Frequency
Choose an appropriate collection interval for your custom metrics. While high-frequency collection provides more granular data, it can also lead to increased resource usage and storage costs. Balance the need for detail with the overhead of data collection.
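A quick back-of-the-envelope calculation helps when weighing detail against cost. The sketch below estimates how many data points are written per day for a given number of series and collection interval. The figure of 500 series is a made-up example, and the function name is an assumption for illustration.

```python
SECONDS_PER_DAY = 86_400


def points_per_day(series_count: int, interval_seconds: int) -> int:
    """Data points written per day for a number of series at a fixed collection interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return series_count * (SECONDS_PER_DAY // interval_seconds)


for interval in (1, 10, 60):
    print(f"{interval:>2}s interval: {points_per_day(500, interval):,} points/day")
```

For 500 series, moving from a 1-second to a 60-second interval cuts the daily volume from 43.2 million points to 720,000.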
5. Minimize Performance Overhead
Ensure that the process of collecting custom metrics does not adversely affect the performance of your applications. Collect metrics asynchronously and avoid heavy computations or blocking operations during metric collection.
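One common way to keep metric collection off the request path is to queue metrics in memory and let a background thread send them. The following is a minimal sketch under that assumption: the class name AsyncMetricSink and its methods are hypothetical, and the send callable stands in for whatever actually delivers the metric (for example, a StatsD client).

```python
import queue
import threading


class AsyncMetricSink:
    """Buffers metrics in memory and hands them to a sender on a background thread."""

    def __init__(self, send, max_pending: int = 10_000):
        self._send = send
        self._queue = queue.Queue(maxsize=max_pending)
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def record(self, line: str) -> bool:
        """Enqueue without blocking; drop the metric if the buffer is full."""
        try:
            self._queue.put_nowait(line)
            return True
        except queue.Full:
            return False

    def _drain(self) -> None:
        while True:
            line = self._queue.get()
            try:
                self._send(line)
            except Exception:
                pass  # a failed send must never crash the worker thread
            finally:
                self._queue.task_done()

    def flush(self) -> None:
        """Block until every queued metric has been handed to the sender."""
        self._queue.join()


sent = []
sink = AsyncMetricSink(send=sent.append)
sink.record("transactions.count:1|c")
sink.flush()
print(sent)
```

Dropping metrics when the buffer is full is a deliberate choice in this sketch: losing a data point is usually preferable to slowing down the application.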
6. Leverage Existing Plugins
Telegraf comes with a wide range of input plugins that can be customized to collect specific metrics. Before creating a custom input plugin, check if existing plugins can be extended or configured to meet your needs.
Examples of Custom Metrics Collection
Example 1: Collecting Application Metrics
Suppose you have a web application, and you want to monitor custom metrics such as the number of active users and the response time of a specific API endpoint.
1. Expose Metrics via HTTP Endpoint
Modify your application to expose metrics via an HTTP endpoint. For example, in a Python Flask application, you can use the prometheus_flask_exporter library to expose metrics.
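from flask import Flask
from prometheus_client import Gauge
from prometheus_flask_exporter import PrometheusMetrics
app = Flask(__name__)
# PrometheusMetrics registers a /metrics endpoint on the app and records
# request counts and durations for every route, so no extra route is needed.
metrics = PrometheusMetrics(app)
# Custom gauge for the number of active users; set it from your own session logic,
# for example active_users.set(current_count).
active_users = Gauge('app_active_users', 'Number of currently active users')
@app.route('/api/endpoint')
def api_endpoint():
    # Your logic here
    return 'Hello, World!'
if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)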
2. Configure Telegraf HTTP Input Plugin
Configure Telegraf to collect metrics from the HTTP endpoint exposed by your application.
[[inputs.http]]
urls = ["http://localhost:5000/metrics"]
method = "GET"
data_format = "prometheus"
name_override = "custom_application_metrics"
Example 2: Collecting Business Metrics
Suppose you want to monitor business metrics such as the number of transactions processed and the total revenue generated.
1. Collect Metrics in Application
Modify your application to collect business metrics. For example, in a Python application, you can use the statsd library to collect and send metrics.
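import statsd
# Initialize StatsD client
client = statsd.StatsClient('localhost', 8125)
# Increment the transaction count
client.incr('transactions.count')
# Add this transaction's revenue to the running total.
# delta=True sends a relative change; without it, the gauge would be
# overwritten with the latest transaction's amount.
client.gauge('transactions.revenue', 100.50, delta=True)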
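It can help to see what actually travels over the wire. StatsD metrics are short text datagrams of the form name:value|type, usually sent over UDP. The sketch below builds such packets with the standard library only. The function names are assumptions for illustration, and this is not how the statsd library is implemented internally.

```python
import socket


def statsd_packet(name: str, value, metric_type: str) -> bytes:
    """Format one metric as a StatsD datagram: <name>:<value>|<type>."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send(packet: bytes, host: str = "localhost", port: int = 8125) -> None:
    """Send a single datagram over UDP (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))


counter = statsd_packet("transactions.count", 1, "c")               # counter increment
gauge_set = statsd_packet("transactions.revenue", 100.5, "g")       # set the gauge to 100.5
gauge_delta = statsd_packet("transactions.revenue", "+100.5", "g")  # add 100.5 to the gauge

for packet in (counter, gauge_set, gauge_delta):
    print(packet.decode("ascii"))

# send(counter)  # uncomment once Telegraf is listening on port 8125
```

A gauge value with a leading sign is treated as a relative change, which is the form to use when the gauge should accumulate a running total.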
2. Configure Telegraf StatsD Input Plugin
Configure Telegraf to listen for the StatsD metrics your application sends to port 8125.
[[inputs.statsd]]
service_address = ":8125"
delete_gauges = false
delete_counters = false
delete_sets = false
delete_timings = false
percentiles = [90]
metric_separator = "_"
datadog_extensions = true
allowed_pending_messages = 10000
percentile_limit = 1000
Example 3: Collecting System Metrics
Suppose you want to collect custom system metrics, such as the number of running processes and the disk I/O operations.
1. Create a Custom Script
Create a custom script to collect system metrics. For example, the following Bash script reports the number of processes and the transfer rate of the first disk device listed by iostat.
#!/bin/bash
# Count all processes, excluding the ps header line (Linux procps syntax)
process_count=$(ps -e --no-headers | wc -l)
# Read the device name and transfers per second (tps) from the first device row of iostat
read -r device tps < <(iostat -d | awk 'NR==4 {print $1, $2}')
# Output the metrics in InfluxDB line protocol, which the exec plugin parses
echo "custom_process_count value=${process_count}i"
echo "custom_disk_io,device=${device} tps=${tps}"
2. Configure Telegraf Exec Input Plugin
Configure Telegraf to execute the custom script and collect metrics.
[[inputs.exec]]
commands = ["/path/to/custom_metrics.sh"]
data_format = "influx"
interval = "60s"
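The exec plugin runs any executable that prints metrics, so the script does not have to be Bash. Here is a minimal Python sketch that could be listed in commands in the same way. The measurement name custom_disk_usage and the function name are assumptions for illustration, and disk usage is only a stand-in metric, since Telegraf's built-in disk plugin already reports it.

```python
#!/usr/bin/env python3
import shutil


def disk_usage_line(path: str = "/") -> str:
    """Return disk usage for a path as one line of InfluxDB line protocol."""
    usage = shutil.disk_usage(path)
    used_percent = 100.0 * usage.used / usage.total
    # Escape the characters that are special in tag values
    tag_path = path.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")
    return (
        f"custom_disk_usage,path={tag_path} "
        f"used_bytes={usage.used}i,free_bytes={usage.free}i,used_percent={used_percent:.2f}"
    )


if __name__ == "__main__":
    print(disk_usage_line("/"))
```

Whatever language you choose, keep the script fast: the exec plugin runs it on every collection interval.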
Example 4: Monitoring Database Performance
Suppose you have a PostgreSQL database, and you want to monitor custom metrics such as the number of active connections, the number of disk blocks read, and the number of tuples inserted.
1. Query Metrics from Database
Create a SQL query to retrieve custom metrics from the PostgreSQL database.
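SELECT
    datname,
    numbackends AS active_connections,
    blks_read AS disk_blocks_read,
    tup_inserted AS tuples_inserted
FROM
    pg_stat_database
WHERE
    datname = 'your_database';
2. Configure Telegraf PostgreSQL Extensible Input Plugin
The standard postgresql input plugin collects a fixed set of statistics and does not accept a custom query. To run your own SQL, use the postgresql_extensible input plugin, which takes one or more query blocks.
[[inputs.postgresql_extensible]]
  address = "host=localhost user=postgres password=your_password dbname=your_database sslmode=disable"
  [[inputs.postgresql_extensible.query]]
    sqlquery = "SELECT datname, numbackends AS active_connections, blks_read AS disk_blocks_read, tup_inserted AS tuples_inserted FROM pg_stat_database WHERE datname = 'your_database'"
    withdbname = false
    # Columns listed here are stored as tags; all other columns become fields
    tagvalue = "datname"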
Conclusion
Custom metrics collection with Telegraf enables you to monitor the specific aspects of your applications and infrastructure that matter most to you. By following best practices such as identifying key metrics, standardizing metric naming, using tags for dimensions, optimizing collection frequency, and minimizing performance overhead, you can effectively gather and utilize custom metrics.
Telegraf’s flexibility and extensive plugin ecosystem make it an ideal choice for collecting custom metrics from various sources. Whether you are monitoring application performance, business metrics, system metrics, or database performance, Telegraf provides the tools you need to build a robust and scalable monitoring solution tailored to your unique requirements.
I hope this post gave you some useful insights. Feel free to leave any comments, questions, or suggestions. Thank you!