Hello Readers! This blog shows how to monitor compute infrastructure with the Cloud Ops Agent. As cloud environments grow more complex, robust monitoring tools become essential for keeping the infrastructure stable, secure, and performing well. One such tool is the Cloud Ops Agent, which collects metrics and logs from your VMs and gives you insight into the health and efficiency of your cloud-based compute resources.
Cloud Ops Agent:
The Cloud Ops Agent is a specialized monitoring tool that works seamlessly within cloud environments. It acts as an intermediary between your cloud resources and the monitoring system, collecting data and metrics from various sources, and then relaying that information to the monitoring platform. By integrating the Cloud Ops Agent into your cloud infrastructure, you gain visibility into critical aspects of your resources, such as CPU and memory utilization, network traffic, disk performance, and much more.
The following metrics can be visualized for the compute infrastructure:
- CPU Utilisation
- Memory Utilisation
- Network Traffic
- Disk Space Utilisation
- Disk throughput
- Disk IOPS
- Logs by top severity
- New connections with VMs/External/Google
Steps to Monitor Compute Infrastructure with the Cloud Ops Agent
Step 1: Enabling VM Manager on the Google Cloud Platform
The VM Manager offered by Google Cloud is a service that enables you to efficiently install, update, and manage software across a large number of VM instances, ensuring seamless scalability. Follow these steps to enable VM Manager:
- Navigate to the Google Cloud Console: https://console.cloud.google.com/
- Select your Project: Choose the project where you want to set up monitoring.
- Go to Compute Engine: Click on “Compute Engine” from the left-hand side menu.
- Enable VM Manager: In the Compute Engine dashboard, click on “VM Manager” and then “OS Configuration Management”. Click on “Enable VM Manager” to enable VM Manager for the project.
- Click on Confirm.
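If you prefer the command line to the console, the same setup can be sketched with gcloud. This is a minimal sketch and not necessarily everything the console button does: it assumes that enabling the OS Config API and setting the project-wide `enable-osconfig` metadata key is sufficient for your project. The `run` helper, the `enable_vm_manager` function, and the project ID are illustrative names, not part of gcloud.

```shell
# run: print the command when DRY_RUN=1, otherwise execute it.
# (Hypothetical helper so the commands can be reviewed before they run.)
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# enable_vm_manager PROJECT_ID
# Assumption: PROJECT_ID is a project you are allowed to administer.
enable_vm_manager() {
  local project_id="$1"
  # Turn on the OS Config API that VM Manager relies on.
  run gcloud services enable osconfig.googleapis.com --project="$project_id"
  # Set project-wide metadata so the OS Config agent is active on the VMs.
  run gcloud compute project-info add-metadata \
    --project="$project_id" \
    --metadata=enable-osconfig=TRUE
}

# Example (prints the two commands without changing anything):
#   DRY_RUN=1; enable_vm_manager my-project-id
```

Running it once with `DRY_RUN=1` lets you read the exact commands before applying them to a real project.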
Step 2: Installing Ops Agent on a Fleet of VMs
The Ops Agent collects monitoring data and sends it to Google Cloud Monitoring. With it, we can gather system and service metrics from our running virtual machines (VMs) and visualize them in the Google Cloud Console. To streamline the rollout, we will create an Ops Agent policy for our Google Cloud project. This policy governs both existing and new VMs in the project that match its OS filter, ensuring the agent is installed correctly and, optionally, auto-upgraded, all with a single command.
To install Ops Agent on a fleet of VMs, follow these steps:
- Open the Google Cloud Console and launch Cloud Shell by clicking its icon in the top-right corner.
- Run the following command in Cloud Shell to create an Ops Agent policy for the VMs:
$ gcloud beta compute instances \
ops-agents policies create ops-agents-policy-safe-rollout \
--agent-rules="type=ops-agent,version=current-major,package-state=installed,enable-autoupgrade=true" \
--os-types=short-name=debian,version=10
- Once the policy is created, verify it by listing the Ops Agent policies with the following command:
$ gcloud beta compute instances ops-agents policies list
- To check the status of the Ops Agent on a particular VM, SSH into that VM and run the following command:
$ sudo systemctl status google-cloud-ops-agent"*"
- We can also confirm from the Google Cloud Console that the Ops Agent is installed on the VMs.
The policy applies at the level of the Google Cloud project and governs existing and new VMs in that project that match its OS filter (Debian 10 in the command above). So, there is no need to install the agent again on matching VMs created later.
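To repeat the per-VM status check across several machines, the SSH step can be looped. The sketch below is only an illustration: `run` and `check_ops_agent_fleet` are hypothetical helper names, and it assumes your account can SSH into each VM through `gcloud compute ssh`. It reuses the `systemctl status` command from the steps above.

```shell
# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# check_ops_agent_fleet
# Reads "NAME ZONE" pairs on stdin, one VM per line, and runs the
# status command from the guide on each VM over SSH.
check_ops_agent_fleet() {
  local name zone
  while read -r name zone; do
    # Skip blank lines in the input.
    [ -n "$name" ] || continue
    # </dev/null keeps ssh from swallowing the remaining input lines.
    run gcloud compute ssh "$name" --zone="$zone" \
      --command='sudo systemctl status "google-cloud-ops-agent*" --no-pager' \
      < /dev/null
  done
}

# Example: feed it every VM in the current project.
#   gcloud compute instances list --format="value(name,zone.basename())" \
#     | check_ops_agent_fleet
```

Setting `DRY_RUN=1` first prints one `gcloud compute ssh` command per VM, which is a quick way to confirm the instance list before connecting anywhere.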
Monitoring Computing Infrastructure Performance Reports
With VM Manager and the Ops Agent configured, we can now access the compute infrastructure performance reports:
- Navigate to the Google Cloud Console: https://console.cloud.google.com/
- Select Monitoring: Click on “Monitoring” from the left-hand side menu.
- Explore Metrics: In the Monitoring dashboard, go to VM instances and select a VM to explore the metrics the Ops Agent collects from it (in our case, VMs running Apigee).
- The metrics we can visualize here are the ones listed at the start of this blog: CPU, memory, and disk space utilisation, network traffic, disk throughput and IOPS, logs by top severity, and new connections.
- Download Reports: Metric charts can also be downloaded as PNG images or CSV files.
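The same data can also be pulled outside the console. The sketch below calls the Cloud Monitoring API `timeSeries` list endpoint with curl. Treat it as a starting point under assumptions: `run` and `fetch_agent_metric` are hypothetical helper names, the token is expected in an `ACCESS_TOKEN` variable (for example from `gcloud auth print-access-token`), and the metric type in the example, `agent.googleapis.com/memory/percent_used`, should be checked against the metrics your agent version actually reports.

```shell
# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# fetch_agent_metric PROJECT_ID METRIC_TYPE START_TIME END_TIME
# Times are RFC 3339 timestamps, e.g. 2024-01-01T00:00:00Z.
# Assumption: ACCESS_TOKEN holds a valid OAuth token for the project.
fetch_agent_metric() {
  local project_id="$1" metric_type="$2" start="$3" end="$4"
  # -G sends the --data-urlencode pairs as URL-encoded query parameters.
  run curl -sS -G \
    "https://monitoring.googleapis.com/v3/projects/${project_id}/timeSeries" \
    -H "Authorization: Bearer ${ACCESS_TOKEN:-}" \
    --data-urlencode "filter=metric.type=\"${metric_type}\"" \
    --data-urlencode "interval.startTime=${start}" \
    --data-urlencode "interval.endTime=${end}"
}

# Example: one hour of memory utilisation as JSON.
#   ACCESS_TOKEN="$(gcloud auth print-access-token)"
#   fetch_agent_metric my-project-id agent.googleapis.com/memory/percent_used \
#     2024-01-01T00:00:00Z 2024-01-01T01:00:00Z
```

The response is JSON, so it can be saved to a file or piped into other tooling when the PNG and CSV downloads are not enough.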
Conclusion
This blog provided a detailed implementation guide for monitoring compute infrastructure with the Cloud Ops Agent. By installing the agent and leveraging Google Cloud Monitoring, we can proactively monitor and optimize our compute infrastructure for better performance and reliability. Regularly reviewing these reports helps identify potential bottlenecks and improve the overall efficiency of our deployment. If this blog helped you, please like it and share it with anyone who might find it useful.