How to Monitor Kubernetes K3s Using Telegraf and InfluxDB Cloud
Lightweight Kubernetes, known as K3s, is a Kubernetes distribution with about half the memory footprint of a standard installation.
Do you need to monitor your nodes running K3s to know the status of your cluster? Do you also need to know how your pods perform, the resources they consume, as well as network traffic? In this article, I will show you how to monitor K3s with Telegraf and InfluxDB Cloud.
I run a blog and a few other resources on Kubernetes. Specifically, these run in a cluster of three nodes in DigitalOcean that run K3s, and I use Telegraf and InfluxDB to monitor everything.
I’m going to demonstrate how to monitor the cluster to make sure that everything is running as expected, and how to spot problems when it is not.
To monitor the cluster I use two components:
InfluxDB Cloud: It’s ideal to do monitoring from the outside because if we do it from the inside and the node goes down, then so does the monitoring solution, and that doesn’t make any sense. You can get a free InfluxDB account here: https://cloud2.influxdata.com/signup/
Telegraf: We install it from a Helm chart, specifically this one, because it has no Docker engine support, which a cluster running K3s doesn’t need.
Let’s do it…
The first thing we must do is create an account in InfluxDB Cloud. Next, we go to the Data section, click on Buckets, and then on Create Bucket.
Name the bucket and click on Create.
This is what our list of buckets should look like. After successfully creating the bucket, we create an access token to be able to write data to that bucket. To do that we go to the Tokens tab.
In this section, we click on Generate Token and choose the Read/Write Token option.
We specify a name, choose the bucket we want to associate with this token, and click on Save.
Once this is done, the new token appears in the token list.
To finish this part, we are going to need our Org ID and the URL to point our Telegraf to.
The Org ID is the email you used to sign up for InfluxDB Cloud. I get the URL from the address bar. In my case, when I set up my InfluxDB Cloud account, I chose the western United States. So my URL looks like this:
https://us-west-2-1.aws.cloud2.influxdata.com
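Before moving on to the cluster, it can be worth confirming that the URL, bucket, and token work together. The following is a minimal sketch, not part of the original setup: it writes a single test point through the InfluxDB v2 write endpoint (/api/v2/write). The function names, the INFLUX_* variables, and the setup_check measurement are placeholders I made up for illustration; substitute your own values. If your organization name contains characters that are not safe in a URL, it would need to be URL-encoded first.

```shell
#!/usr/bin/env bash
# Sketch: send one test point to InfluxDB Cloud to confirm URL, org, bucket and token.
# All INFLUX_* values below are placeholders (assumptions), not real credentials.

INFLUX_URL="${INFLUX_URL:-https://us-west-2-1.aws.cloud2.influxdata.com}"
INFLUX_ORG="${INFLUX_ORG:-you@example.com}"
INFLUX_BUCKET="${INFLUX_BUCKET:-kubernetes}"
INFLUX_TOKEN="${INFLUX_TOKEN:-replace-with-your-token}"

# Build the write endpoint URL from a base URL, an org and a bucket.
influx_write_url() {
  printf '%s/api/v2/write?org=%s&bucket=%s&precision=s\n' "$1" "$2" "$3"
}

# POST one line-protocol point and print the HTTP status code.
# A 204 response means the point was accepted.
influx_write_test_point() {
  curl --silent --output /dev/null --write-out '%{http_code}\n' \
    --request POST "$(influx_write_url "$INFLUX_URL" "$INFLUX_ORG" "$INFLUX_BUCKET")" \
    --header "Authorization: Token ${INFLUX_TOKEN}" \
    --header "Content-Type: text/plain; charset=utf-8" \
    --data-binary "setup_check,source=manual value=1"
}

# Usage (uncomment once the variables are set):
# influx_write_test_point
```

A 401 or 403 status usually points to a wrong token or a token without write access to the bucket, and a 404 to a wrong bucket or organization name.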
Now that we have configured InfluxDB Cloud, we need to configure the nodes.
As I mentioned above, we are going to use a Helm Chart. I modified this Helm Chart to adapt to K3s, because by default it tries to monitor Docker, which isn’t used in this Kubernetes distribution.
If you don’t have Helm installed, you can install it by running this command:
$ curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash
Once installed, download the values.yaml file here.
You can also grab the raw file and download it directly to the master node with wget.
$ wget https://raw.githubusercontent.com/xe-nvdk/awesome-helm-charts/main/telegraf-ds-k3s/values.yaml
Now, we have to modify this file a bit. We need to open it and modify the Output section. By default the file looks like this:
…
## Exposed telegraf configuration
## ref: https://docs.influxdata.com/telegraf/v1.13/administration/configuration/
config:
  # global_tags:
  #   cluster: "mycluster"
  agent:
    interval: "10s"
    round_interval: true
    metric_batch_size: 1000
    metric_buffer_limit: 10000
    collection_jitter: "0s"
    flush_interval: "10s"
    flush_jitter: "0s"
    precision: ""
    debug: false
    quiet: false
    logfile: ""
    hostname: "$HOSTNAME"
    omit_hostname: false
  outputs:
    - influxdb:
        urls:
          - "http://influxdb.monitoring.svc:8086"
        database: "telegraf"
        retention_policy: ""
        timeout: "5s"
        username: ""
        password: ""
        user_agent: "telegraf"
        insecure_skip_verify: false
  monitor_self: false
But since we are going to use InfluxDB Cloud, we must make some adjustments. The modified version will look something like this:
…
## Exposed telegraf configuration
## ref: https://docs.influxdata.com/telegraf/v1.13/administration/configuration/
config:
  # global_tags:
  #   cluster: "mycluster"
  agent:
    interval: "1m"
    round_interval: true
    metric_batch_size: 1000
    metric_buffer_limit: 10000
    collection_jitter: "0s"
    flush_interval: "10s"
    flush_jitter: "0s"
    precision: ""
    debug: false
    quiet: false
    logfile: ""
    hostname: "$HOSTNAME"
    omit_hostname: false
  outputs:
    - influxdb_v2:
        urls:
          - "https://us-west-2-1.aws.cloud2.influxdata.com"
        bucket: "kubernetes"
        organization: "[email protected]"
        token: "WIX6Fy-v10zUIag_dslfjasfljadsflasdfjasdlñjfasdlkñfj=="
        timeout: "5s"
        insecure_skip_verify: false
  monitor_self: false
If you need to adjust other settings, such as the collection interval, change the corresponding value in the agent section. For example, I don’t need the data every 10 seconds, so I changed interval to 1 minute.
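A quick check of the edited file can catch a forgotten field before the chart is installed. The sketch below is only an illustration: check_values is a hypothetical helper name, and it greps for the expected keys rather than parsing the YAML, so it will not catch indentation mistakes.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check values.yaml before installing the chart.
# check_values is a hypothetical helper; it greps the file, it does not parse YAML.

check_values() {
  local file="$1"
  # The InfluxDB Cloud output plugin must be present.
  grep -q 'influxdb_v2:' "$file" || { echo "missing influxdb_v2 output"; return 1; }
  # The default in-cluster InfluxDB URL should have been replaced.
  if grep -q 'influxdb.monitoring.svc' "$file"; then
    echo "default in-cluster InfluxDB URL still present"
    return 1
  fi
  # bucket, organization and token must each have a non-empty value.
  local key
  for key in bucket organization token; do
    grep -Eq "^[[:space:]]*${key}:[[:space:]]*\"?[^\"[:space:]]" "$file" \
      || { echo "missing or empty ${key}"; return 1; }
  done
  echo "values.yaml looks ready"
}

# Usage:
# check_values values.yaml
```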
Now we come to the moment of truth! We are going to install the Helm Chart and see if everything works as expected. Depending on your K3s configuration, you might need to pass the cluster configuration as a KUBECONFIG environment variable.
$ export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
Once that’s done, we’re going to add the Awesome-Helm-Charts repo.
$ helm repo add awesome-helm-charts https://xe-nvdk.github.io/awesome-helm-charts/
Then we update the content of the repos that we configured.
$ helm repo update
Finally, we’ll install the chart, passing it the configuration we just modified in the values.yaml file.
$ helm upgrade --install telegraf-ds-k3s -f values.yaml awesome-helm-charts/telegraf-ds-k3s
The terminal should return something similar to this:
Release "telegraf-ds-k3s" does not exist. Installing it now.
NAME: telegraf-ds-k3s
LAST DEPLOYED: Fri Jun 25 22:47:22 2021
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
To open a shell session in the container running Telegraf run the following:

- kubectl exec -i -t --namespace default $(kubectl get pods --namespace default -l app.kubernetes.io/name=telegraf-ds -o jsonpath='{.items[0].metadata.name}') /bin/sh

To tail the logs for a Telegraf pod in the Daemonset run the following:

- kubectl logs -f --namespace default $(kubectl get pods --namespace default -l app.kubernetes.io/name=telegraf-ds -o jsonpath='{ .items[0].metadata.name }')

To list the running Telegraf instances run the following:

- kubectl get pods --namespace default -l app.kubernetes.io/name=telegraf-ds -w
This output shows that the Helm chart deployed successfully. Keep in mind that the chart deploys Telegraf as a DaemonSet, so Kubernetes automatically runs one Telegraf pod on each node in the cluster.
To check that everything is running properly use the following command:
$ kubectl get pods
We see that our pod is alive and kicking.
NAME                    READY   STATUS    RESTARTS   AGE
telegraf-ds-k3s-w8qhc   1/1     Running   0          2m29s
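On a multi-node cluster, it can help to compare the number of running Telegraf pods with the number of nodes. The snippet below is a rough sketch under a few assumptions: count_running is a helper name I made up, the label selector is the one printed in the chart notes, and the release lives in the default namespace. Nodes with taints that the DaemonSet does not tolerate may not get a pod, so the two numbers can legitimately differ.

```shell
#!/usr/bin/env bash
# Sketch: check how many Telegraf pods are Running compared to the node count.

# Read `kubectl get pods --no-headers` output from stdin and count the rows
# whose STATUS column (the third one) is "Running".
count_running() {
  awk '$3 == "Running" { n++ } END { print n + 0 }'
}

# Usage against a live cluster:
# nodes="$(kubectl get nodes --no-headers | wc -l)"
# running="$(kubectl get pods --namespace default -l app.kubernetes.io/name=telegraf-ds --no-headers | count_running)"
# echo "Telegraf pods running: ${running} (nodes: ${nodes})"
```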
If you want to check the logs to make sure that Telegraf is working as expected, then run:
$ kubectl logs -f telegraf-ds-k3s-w8qhc
The terminal should output something like this:
2021-06-26T02:55:22Z I! Starting Telegraf 1.18.3
2021-06-26T02:55:22Z I! Using config file: /etc/telegraf/telegraf.conf
2021-06-26T02:55:22Z I! Loaded inputs: cpu disk diskio kernel kubernetes mem net processes swap system
2021-06-26T02:55:22Z I! Loaded aggregators:
2021-06-26T02:55:22Z I! Loaded processors:
2021-06-26T02:55:22Z I! Loaded outputs: influxdb_v2
2021-06-26T02:55:22Z I! Tags enabled: host=k3s-master
2021-06-26T02:55:22Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"k3s-master", Flush Interval:10s
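Telegraf marks each log line with a level: I! for info, W! for warnings, and E! for errors. When the log gets long, a small filter makes problems easier to spot. This is just a sketch, and telegraf_problems is a name I picked for illustration.

```shell
#!/usr/bin/env bash
# Sketch: show only warning (W!) and error (E!) lines from Telegraf logs.

# Read log lines from stdin and keep those with a W! or E! level marker.
# `|| true` keeps the exit status at 0 when nothing matches.
telegraf_problems() {
  grep -E ' (W|E)! ' || true
}

# Usage (replace the pod name with your own):
# kubectl logs telegraf-ds-k3s-w8qhc | telegraf_problems
```

No output means Telegraf has not logged any warnings or errors so far, for example failed writes to InfluxDB Cloud.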
Everything seems fine, but now comes the moment of truth. We go to our InfluxDB Cloud account, navigate to the Explore section, and we should see some measurements and, of course, some data when selecting the bucket.
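If you prefer the terminal over the web UI, the same check can be sketched with the InfluxDB v2 query endpoint (/api/v2/query) and a short Flux query. This is an assumption-laden example, not the method used in this article: the INFLUX_* variables and the function names are placeholders, and the token needs read access to the bucket.

```shell
#!/usr/bin/env bash
# Sketch: fetch a few recent rows from the bucket through the v2 query API.
# All INFLUX_* values below are placeholders (assumptions), not real credentials.

INFLUX_URL="${INFLUX_URL:-https://us-west-2-1.aws.cloud2.influxdata.com}"
INFLUX_ORG="${INFLUX_ORG:-you@example.com}"
INFLUX_TOKEN="${INFLUX_TOKEN:-replace-with-your-token}"

# Build a Flux query for the last 5 minutes of data in the given bucket,
# limited to 5 rows per series.
flux_recent() {
  printf 'from(bucket: "%s") |> range(start: -5m) |> limit(n: 5)\n' "$1"
}

# Send the Flux query and print the CSV response.
influx_recent_rows() {
  curl --silent --request POST "${INFLUX_URL}/api/v2/query?org=${INFLUX_ORG}" \
    --header "Authorization: Token ${INFLUX_TOKEN}" \
    --header "Content-Type: application/vnd.flux" \
    --header "Accept: application/csv" \
    --data "$(flux_recent "$1")"
}

# Usage:
# influx_recent_rows kubernetes
```

An empty response shortly after installation may simply mean that Telegraf has not flushed its first batch yet.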
As you can see, this process isn’t as complicated as it might seem. The Helm chart simplifies our lives and from now on we can see what is happening with our cluster using an external system.