Monitoring your applications is a requirement in the cloud native world, but getting access to useful metrics is often difficult. For some teams, metrics become most useful at the end of the development lifecycle, mostly during day-2 operations (post-deployment). At that point, your operations teams can use the metrics to monitor your applications and to better understand what is happening.
Yet development teams work at the very beginning of the development lifecycle. Good, meaningful metrics require some iterations and feedback. In reality, developers care most about features, tests, and… log traces! Developers dream of having good log traces because then they can properly debug their applications.
So, how can you improve this experience?
Turning application logs into useful metrics
Promtail is an agent which ships the contents of local logs to a log aggregator. The magic happens when, besides shipping logs, it can also parse them. That allows you to configure transformations which can be translated into meaningful metrics.
To get the picture, think of a log trace. For example:
You could parse the log traces that contain the expression [ERROR] and create a counter metric out of them, to be displayed in your favorite monitoring system, such as Grafana and Prometheus.
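To make the idea concrete, here is a minimal Python sketch of what "turning log lines into a counter" means. It is not how Promtail is implemented, and the sample log lines and the name `error_total` are made up purely for illustration.

```python
import re

# Hypothetical log lines, for illustration only (not taken from a real system).
SAMPLE_LOGS = [
    "2021-01-01T10:00:00Z [INFO] request served",
    "2021-01-01T10:00:01Z [ERROR] upstream not reachable",
    "2021-01-01T10:00:02Z [ERROR] upstream not reachable",
]

# Match the literal text [ERROR]; the brackets must be escaped in a regex.
ERROR_PATTERN = re.compile(r"\[ERROR\]")


def count_errors(lines):
    """Return how many lines contain the literal text [ERROR]."""
    return sum(1 for line in lines if ERROR_PATTERN.search(line))


# "error_total" is a hypothetical metric name.
error_total = count_errors(SAMPLE_LOGS)
print(f"error_total {error_total}")
```

A log shipper that can parse lines does essentially this on the fly, and then exposes the resulting number so that a monitoring system can scrape it.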
A typical developer scenario for logs and metrics
One of our customers came to us with a question. They had configured OPA (Open Policy Agent) for their passthrough server as described in the documentation. Unfortunately, the OPA service was not running when they deployed the Gloo Edge resource (AuthConfig).
The event threw an error in the ExtAuth pod logs:
The customer wanted everything covered by metrics. But when they discovered the issue, no metric existed to catch this scenario. Fixing that involved opening a ticket and waiting for the engineering team to work on it. This delay can be avoided with the technique you will see in the next section.
Using Gloo Edge to build metrics from logs
The goal of this workshop is to catch ExtAuth (one of the Gloo Edge components) log traces so that they can be converted into metrics and exposed by a Grafana dashboard. Let’s walk through a common scenario and solution.
Your architecture will look like this:
data:image/s3,"s3://crabby-images/326b8/326b81f0ef09c5933573b206a667beafaa91bc58" alt=""
Your Admin user is in charge of applying Gloo Edge resources, while your DevOps team is in charge of monitoring the infrastructure. You know that the error is logged when you apply an AuthConfig resource. In that resource, you specify the address of the OPA service, which is not reachable.
Set up the environment
For this workshop you need to have:
- A Kubernetes cluster
- Helm
- A Gloo Edge Enterprise license key
Install Gloo Edge with a LICENSE_KEY:
First, let’s reproduce the error. Create an application (e.g., httpbin):
And expose it through Gloo Edge:
Create a file for the AuthConfig manifest so that you can then apply it and delete it easily:
Now, apply it:
Since the service opa.opa.svc.cluster.local:9191 does not exist, an error is thrown in the ExtAuth pod.
Let’s verify it:
You will see the following error:
NOTE: If you do not see the error, wait a bit and try again. Propagating the events can take a bit of time.
Now, you need to catch this error with Promtail, parse it into a metric, and let Prometheus scrape the metric.
First, delete the AuthConfig that triggered the error so that you start from a clean state:
Deploy Prometheus and Grafana. Promtail depends on Prometheus because of the ServiceMonitor resource, which is why Promtail must be installed afterwards.
Finally, install Promtail. In the configuration, notice the regex that catches the log traces you are targeting:
Once you have Grafana ready, access it through your browser:
You can use these credentials:
Create a dashboard with the following specifications:
- Query:
data:image/s3,"s3://crabby-images/e8fc9/e8fc9c53d558a22fa36632b6df000a52036cbed9" alt=""
- Panel configuration (the most relevant part is this):
data:image/s3,"s3://crabby-images/43610/4361063662906fe33691f601183c0aeb8d727a3c" alt=""
You now have the whole integration ready.
Set the dashboard to show the last 5 minutes and you should see the panel:
data:image/s3,"s3://crabby-images/aa6d7/aa6d7261a04be7076dde06819231843f2ba78a51" alt=""
Testing your configuration
Let’s verify that the dashboard shows an error once it occurs.
Apply the AuthConfig resource:
Port-forward to access Promtail’s exposed metrics:
And check the metrics in your browser: http://127.0.0.1:9080/metrics
You should be able to see your new metric:
data:image/s3,"s3://crabby-images/b496d/b496d4e521f6bd1abd30e642c0cdc9cda7a3a259" alt=""
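If you would rather check the endpoint's output programmatically than read it in the browser, the following Python sketch filters Prometheus text-format output by metric name prefix. The sample text and the metric name `promtail_custom_extauth_error_total` are hypothetical. The sketch assumes Promtail's default `promtail_custom_` prefix for metrics created in pipeline stages, and that sample lines carry no trailing timestamp.

```python
# Hypothetical scrape output, for illustration only.
SAMPLE_METRICS = """\
# HELP promtail_custom_extauth_error_total Hypothetical counter of ExtAuth errors
# TYPE promtail_custom_extauth_error_total counter
promtail_custom_extauth_error_total{namespace="gloo-system"} 3
# TYPE promtail_read_lines_total counter
promtail_read_lines_total{path="/var/log/pods/example.log"} 120
"""


def find_samples(metrics_text, name_prefix):
    """Return {series: value} for series whose name starts with name_prefix."""
    samples = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and HELP/TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated token (no timestamp assumed).
        series, _, value = line.rpartition(" ")
        if series.startswith(name_prefix):
            samples[series] = float(value)
    return samples


for series, value in find_samples(SAMPLE_METRICS, "promtail_custom_").items():
    print(series, value)
```

To run it against the live endpoint, you could replace `SAMPLE_METRICS` with the body fetched from http://127.0.0.1:9080/metrics while the port-forward is active.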
If you go to your new Grafana dashboard, you should see the error:
data:image/s3,"s3://crabby-images/3fbdb/3fbdb433d19334f6fd8ea5615f70e7b987a43388" alt=""
Once you deploy the Authz service, there will be no more errors in the logs. After the event is propagated, the dashboard turns to No Error.
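A counter only ever goes up, so a panel that flips back to No Error is presumably looking at how much the counter grew inside the selected time window rather than at its absolute value. The Python sketch below illustrates that reasoning under stated assumptions: the sample data, the function names, and the mapping to the "Error" / "No Error" labels are all hypothetical, and counter resets are ignored.

```python
def counter_increase(samples, window_seconds, now):
    """Growth of a counter within the window ending at `now`.

    `samples` is a list of (timestamp_seconds, value) sorted by timestamp.
    Counter resets are not handled in this sketch.
    """
    in_window = [v for t, v in samples if now - window_seconds <= t <= now]
    if len(in_window) < 2:
        return 0.0
    return float(in_window[-1] - in_window[0])


def panel_status(increase):
    """Map the growth of the error counter to a panel label."""
    return "Error" if increase > 0 else "No Error"


# Hypothetical scraped samples: the counter grows from 1 to 3, then stays flat.
SAMPLES = [(0, 1), (100, 1), (200, 3), (400, 3), (700, 3)]
FIVE_MINUTES = 300

# While errors are still being logged, the window contains growth.
print(panel_status(counter_increase(SAMPLES, FIVE_MINUTES, now=400)))
# Once errors stop, the window eventually contains only flat samples.
print(panel_status(counter_increase(SAMPLES, FIVE_MINUTES, now=700)))
```

In Prometheus itself, this kind of windowed growth is what functions such as `increase()` and `rate()` compute, and they also account for counter resets.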
Creating an alert to handle this error is left to the reader.
Level up for more advanced metrics use cases
The case above is straightforward. A more complex scenario is to leverage Header Transformations and then Enrich Access Logs. The goal is to get that processed data into the logs. Once it is there, you can create metrics out of it.
One use case is analytics: understanding how your system is used based on the requests it receives.
In an e-commerce system:
- How many times a day is Product A (my-e-commerce.com/products/product-a) visited compared to Product B (my-e-commerce.com/products/product-b)?
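As a rough illustration of that kind of analysis, the Python sketch below counts visits per product from access-log lines. The log format and sample lines are assumptions made for this example and do not reflect the actual format of any proxy's access logs.

```python
import re
from collections import Counter

# Hypothetical access-log lines; the format is an assumption for this sketch.
SAMPLE_ACCESS_LOGS = [
    "GET /products/product-a 200",
    "GET /products/product-b 200",
    "GET /products/product-a 200",
    "GET /cart 200",
]

# Capture the product identifier that follows /products/ in the request path.
PRODUCT_PATTERN = re.compile(r"/products/(?P<product>[\w-]+)")


def visits_per_product(lines):
    """Count how many log lines reference each product path."""
    visits = Counter()
    for line in lines:
        match = PRODUCT_PATTERN.search(line)
        if match:
            visits[match.group("product")] += 1
    return visits


visits = visits_per_product(SAMPLE_ACCESS_LOGS)
print(visits["product-a"], visits["product-b"])
```

In a log-to-metrics pipeline, the captured product name would typically become a label on a counter, so that the comparison can be made directly in the dashboard.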
An analytics example with Istio and an e-commerce website will be published shortly on the official Grafana blog. Stay tuned.
Thanks to Neeraj Poddar for his feedback!