Google native Logging & Observability

In this article we walk through Google Cloud's logging and monitoring tools hands-on and build a dashboard showing web server traffic, availability, latency and saturation in five simple steps. As a prerequisite for this exercise, we assume the following infrastructure is available in Google Cloud:

A Debian-based virtual machine.

Apache2 installed.

A custom index.html of your choice, created and deployed.

The five steps are:

  1. Install the Google Ops Agent.
  2. Customize the Ops Agent and Apache configuration.
  3. Create queries using Logs Explorer.
  4. Create a custom dashboard, "Golden Signals", using Metrics Explorer.
  5. Review the dashboard.

Step 1: Ops Agent installation: The Ops Agent is the primary agent for collecting telemetry from Compute Engine instances. Refer to "Installing the Ops Agent" in the official docs. The following two commands need to be run on each VM you want to monitor.

curl -sSO https://dl.google.com/cloudagents/add-google-cloud-ops-agent-repo.sh
sudo bash add-google-cloud-ops-agent-repo.sh --also-install        

Step 2: Configure Ops Agent & Apache: By default, the Ops Agent collects CPU, memory, disk usage and a set of other system metrics. For this exercise, we also need the agent to pick up Apache logs. This is configured in the /etc/google-cloud-ops-agent/config.yaml file.
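The exact configuration used in this exercise is not reproduced here, so the following is only a minimal sketch based on the Ops Agent Apache quickstart listed in the references. It assumes the standard apache metrics receiver and the apache_access and apache_error logging receivers; the pipeline name apache is just a label, and the STAGED_CONFIG variable is introduced for illustration. The script stages the file in a temporary location so it can be reviewed before it is copied into place.

```shell
#!/usr/bin/env bash
# Stage an Ops Agent config that collects Apache metrics and logs.
# Assumption: receiver types follow the Ops Agent Apache integration docs;
# the article's own config is not shown, so treat this as a starting point.
STAGED_CONFIG="$(mktemp)"

cat > "$STAGED_CONFIG" <<'EOF'
metrics:
  receivers:
    apache:
      type: apache
  service:
    pipelines:
      apache:
        receivers:
          - apache
logging:
  receivers:
    apache_access:
      type: apache_access
    apache_error:
      type: apache_error
  service:
    pipelines:
      apache:
        receivers:
          - apache_access
          - apache_error
EOF

echo "Staged config at: $STAGED_CONFIG"
echo "After review: sudo cp \"$STAGED_CONFIG\" /etc/google-cloud-ops-agent/config.yaml"
```

The apache metrics receiver depends on Apache's mod_status endpoint being reachable locally, so the workload/apache.traffic metric used later will likely stay empty if mod_status is disabled.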

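To see the time taken for each web request (latency), we need to add a parameter to the Apache log format. Here is the line to modify in the /etc/apache2/apache2.conf file. The trailing %D field logs the time taken to serve the request in microseconds; the s after it is a literal character.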

LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\" \"%Ds\"" combined        

Restart both the Ops Agent and the Apache web server to apply the changes, then check that both services are running.

sudo systemctl restart google-cloud-ops-agent
sudo systemctl restart apache2
sudo systemctl status google-cloud-ops-agent"*"
sudo systemctl status apache2
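Once Apache has been restarted, each new access log line should end with an extra quoted field holding the %D value (microseconds) followed by the literal s from the LogFormat line above. The sketch below pulls that value out of a log line as a quick local check. The function name extract_latency_us and the sample log line are made up for illustration.

```shell
#!/usr/bin/env bash
# Print the numeric part of the last quoted field of each access log line,
# e.g. a line ending in "1834s" yields 1834 (microseconds).
extract_latency_us() {
  # Split on double quotes: the last quoted field is the second-to-last piece.
  awk -F'"' '{ v = $(NF-1); sub(/s$/, "", v); print v }'
}

# Made-up sample line in the modified "combined" format.
sample='203.0.113.7 - - [10/Dec/2023:10:15:32 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "curl/7.88.1" "1834s"'
printf '%s\n' "$sample" | extract_latency_us

# On the VM (default Debian log path):
#   sudo tail -n 5 /var/log/apache2/access.log | extract_latency_us
```

Because the logged values are in microseconds, a 250 ms reference corresponds to 250000 in these units.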

Step 3: Logs Explorer: Logs Explorer helps you retrieve, view and analyze log data. Refer to the official documentation for details on using it. We are going to create and save four different queries in this section.

a.) All HTTP hits log: Click on Queries and select the active project, then click on VM Instances and notice the criteria being added to the query pane. To obtain all access logs, click on one of the HTTP request log entries and select the field logName, which adds another criterion to the query pane. Refer to query criteria #1 below. Click on "Create metric" and save it as "gce_requests_total".

Figure: Logs Explorer view for generating the first query (all HTTP requests)

Similarly, create three more queries in Logs Explorer using the criteria listed below.

log query criteria #1 (all http requests)
resource.labels.project_id="boreal-analog-406006"
resource.type="gce_instance"
logName="projects/boreal-analog-406006/logs/apache_access"

log query criteria #2  (good requests)
resource.labels.project_id="boreal-analog-406006"
resource.type="gce_instance"
httpRequest.status=200
logName="projects/boreal-analog-406006/logs/apache_access"

log query criteria #3 (bad requests)
resource.labels.project_id="boreal-analog-406006"
resource.type="gce_instance"
-httpRequest.status=200
logName="projects/boreal-analog-406006/logs/apache_access"

log query criteria #4 (latency)
logName="projects/boreal-analog-406006/logs/apache_access"
resource.labels.project_id="boreal-analog-406006"
resource.type="gce_instance"
(Field Name: jsonPayload.gzip_ratio)        
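The three counter-style queries can also be turned into log-based metrics from the command line instead of the console. This is a sketch, assuming the gcloud CLI is installed and authenticated. gce_requests_total and gce_good_requests are the names used in this article, while gce_bad_requests, the two helper functions and the your-project-id placeholder are assumptions. The latency metric is left out because it is built on a field value (jsonPayload.gzip_ratio) rather than a plain count and needs additional metric configuration.

```shell
#!/usr/bin/env bash
# Placeholder project ID; override with: PROJECT_ID=... before running.
PROJECT_ID="${PROJECT_ID:-your-project-id}"

# Build the Apache access log filter, with an optional extra clause.
access_log_filter() {
  printf 'resource.labels.project_id="%s" AND resource.type="gce_instance" AND logName="projects/%s/logs/apache_access"' \
    "$PROJECT_ID" "$PROJECT_ID"
  if [ -n "${1:-}" ]; then
    printf ' AND %s' "$1"
  fi
  printf '\n'
}

# Create the three counter metrics. Not called automatically:
# review the printed filters first, then run create_counter_metrics.
create_counter_metrics() {
  gcloud logging metrics create gce_requests_total \
    --project="$PROJECT_ID" \
    --description="All Apache access log entries" \
    --log-filter="$(access_log_filter)"

  gcloud logging metrics create gce_good_requests \
    --project="$PROJECT_ID" \
    --description="Apache requests with status 200" \
    --log-filter="$(access_log_filter 'httpRequest.status=200')"

  # gce_bad_requests is a hypothetical name; the article does not name this metric.
  gcloud logging metrics create gce_bad_requests \
    --project="$PROJECT_ID" \
    --description="Apache requests with a status other than 200" \
    --log-filter="$(access_log_filter 'NOT httpRequest.status=200')"
}

# Print the filters for review.
access_log_filter
access_log_filter 'httpRequest.status=200'
access_log_filter 'NOT httpRequest.status=200'
```

Keeping the filter in one helper means the project ID and log name only have to be changed in one place when the queries are reused in another project.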

Step 4: Golden Signals (custom dashboard): Now that all the queries are created, it's time to build the dashboard. This can be done using Metrics Explorer. We will create four views.

First view: a line chart based on a standard, readily available metric. Choose Select a metric > VM Instance > Apache > workload/apache.traffic.

Second view: a chart of good web requests. Choose Select a metric > VM Instance > Log-based metrics > logging/user/gce_good_requests.

Third view: "availability", which shows good requests against the total number of requests in the same chart. Use Metrics Explorer to add two queries to the same view: good requests and total requests. In the right pane, under the threshold for the y-axis, choose 90% as the reference for this sample.
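The ratio behind the availability view can be cross-checked on the VM directly from the access log. This is a minimal sketch, assuming the status code is the ninth whitespace-separated field (as in the combined log format) and treating only status 200 as a good request, as in query criteria #2. The function name availability_pct and the sample lines are illustrative.

```shell
#!/usr/bin/env bash
# Read access log lines on stdin and print the share of status-200 requests.
availability_pct() {
  awk '
    { total++; if ($9 == 200) good++ }
    END {
      if (total == 0) { print "no requests"; exit }
      printf "%.1f\n", 100 * good / total
    }'
}

# Made-up sample: 3 of 4 requests return 200, so this prints 75.0.
printf '%s\n' \
  '203.0.113.7 - - [10/Dec/2023:10:15:32 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/7.88.1" "1834s"' \
  '203.0.113.7 - - [10/Dec/2023:10:15:33 +0000] "GET /a HTTP/1.1" 200 512 "-" "curl/7.88.1" "1502s"' \
  '203.0.113.8 - - [10/Dec/2023:10:15:34 +0000] "GET /missing HTTP/1.1" 404 196 "-" "curl/7.88.1" "911s"' \
  '203.0.113.9 - - [10/Dec/2023:10:15:35 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/7.88.1" "1720s"' \
  | availability_pct

# On the VM (default Debian log path):
#   sudo cat /var/log/apache2/access.log | availability_pct
```

A result below 90 corresponds to the chart dropping under the 90% reference line chosen for this sample.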

Fourth view: "latency". Choose gce_latency from the log-based metrics and add a threshold of 250 ms as the reference.

Here is an example of metrics views showing all custom-built queries.

Step 5: Review Dashboard: We have created four views in our dashboard, showing traffic, availability, good vs. error requests and latency. Together they give an Ops engineer insight into the health, performance and reliability of their systems. These metrics serve as early warning indicators, enabling teams to detect and mitigate issues before they impact users or business operations.

I hope this article helps you understand the basics of logging and monitoring using Google native tools.


Ref:

Quickstart: Collect logs from Apache with the Ops Agent | Cloud Logging | Google Cloud

https://cloud.google.com/stackdriver/docs/solutions/agents/ops-agent
