Network monitoring #2: Logging CPU temps with InfluxDB and Grafana

My previous method of logging the CPU temperatures of machines on the network involved a Bash script sending out a message via MQTT. This was intercepted and logged by a Python script running on a server, which wrote the data to text files in a place where the intranet server could see them.

Phew! It’s a miracle it worked. But while it certainly did work, all I got was a figure on the intranet web page for the most recent reading. There was no historical data. Nothing that could show me fluctuations over time.

Having procrastinated for ages, I finally decided I’d get to grips with InfluxDB and Grafana as a way of capturing and displaying this data in a more thorough and meaningful way. I thought it would be complicated.

It wasn’t.

Time series

For those not familiar with it, InfluxDB is a ‘time series’ database. It’s specifically designed for storing regular data points and so is ideal for IoT and network monitoring applications. For one thing, if the data you supply does not include a field called ‘time’, InfluxDB will automatically append a timestamp field to each record. And writing to the database couldn’t be easier, with no mucking about with SQL queries and whatnot.

Grafana is a web server that can take data from various sources, including an InfluxDB server, and present it as various charts, graphs and widgets.

This is not going to be a tutorial on InfluxDB or Grafana. You can find lots of those online. And, to be honest, there wasn’t a lot to it. I spun up the InfluxDB server, plus Telegraf and Chronograf, on one of the Raspberry Pi machines. I did this, as is my wont these days, using Docker. If you want, I’ll make another post with my docker-compose.yaml file.

The Grafana server is running, also in Docker, on the same machine. I learned Grafana by prodding it with the mouse cursor to see what happened. Within minutes, I had lovely graphs.

Feeding the machine

What we’re going to concentrate on here is how I fed data about CPU temperatures to the InfluxDB server.

The main thing you need to know about InfluxDB is that, provided you can authenticate with it (ie, you’ve set up user accounts), and provided what you send is in the right format, the database will just log whatever you throw at it. You don’t need to mess around building tables and columns and schemas first.

One note: I’m using InfluxDB version 1.8 on the Raspberry Pi. I’d prefer to use 2.x, but it doesn’t seem there’s an armhf version of this, at least not through Docker, that will run on the 32-bit Raspberry Pi OS. To run 2.x on the Raspberry Pi, you need a 64-bit operating system. I’m considering upgrading the machine I use for this to Ubuntu Server 64-bit. But that’s for another day.

On the client side

My weapon of choice for sending the data is Python. And so I needed the influxdb Python library on each machine, which was simple enough:

pip3 install influxdb
# Or, if you prefer...
python3 -m pip install influxdb

But before we start building the Python script, let’s discuss how we take the temperature of each machine.

The Linux cat

Getting the CPU temperature on most Linux boxes is as simple as displaying the contents of the ‘file’ /sys/class/thermal/thermal_zone0/temp, using cat.

cat /sys/class/thermal/thermal_zone0/temp

To do this in Python, we just need to execute this command using Python’s subprocess.run() function.

cmd = 'cat /sys/class/thermal/thermal_zone0/temp'
output = subprocess.run(cmd, shell=True, capture_output=True)

If the output from this command has a return code of 0, then it all went swimmingly. We can read the temperature (as a string) using output.stdout.decode('utf-8'). For our purposes, we’ll convert this to a float.

stdout = float(output.stdout.decode('utf-8'))

The number we get is a thousand times too big. For a result in degrees Celsius, we divide by 1,000.
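As an aside, you could skip the subprocess call entirely and have Python read the sysfs ‘file’ directly. A minimal sketch (the path is the one used above; the zone number may differ on some boards):

```python
# Read the CPU temperature directly from sysfs, without shelling out to cat.
# The path is the standard Linux thermal zone; zone numbering can vary by board.

def read_cpu_temp(path='/sys/class/thermal/thermal_zone0/temp'):
    """Return the CPU temperature in degrees Celsius, rounded to one decimal."""
    with open(path) as f:
        millidegrees = int(f.read().strip())  # value is in thousandths of a degree
    return round(millidegrees / 1000, 1)
```

Reading the file directly avoids spawning a shell, though the subprocess approach has the advantage of sharing its structure with the macOS version we’ll meet later.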

Storing the data

So now we need to send this number to the InfluxDB server. On my network, the server lives at 10.0.30.50, and 8086 is the standard port for InfluxDB. I’m using a database (called a ‘bucket’ in V2.x) I’ve called ‘iot’. And I’ve set up a user on the InfluxDB server called ‘youruser’ with a password ‘yourpassword’.

We instantiate an InfluxDB client thus:

InfluxServer = '10.0.30.50'
InfluxPort = 8086
dbBucket = 'iot'
dbUser = 'youruser'
dbPass = 'yourpassword'
influxClient = InfluxDBClient(InfluxServer, InfluxPort, dbUser, dbPass, dbBucket)

Having got our temperature reading, we build a dictionary with the data we want to send. In InfluxDB, a ‘measurement’ is (roughly speaking) what you’d think of as a table in a SQL database. In our case, we’re calling this ‘cpu_temp’. Note that I haven’t set this up beforehand. We can just make up something and send it.

Next we have what are known as ‘tags’. This is metadata you can use to group or select data in the database. For example, in this case we will use the hostname of the computer to identify a group of data plots to plot on a graph. That way, we can create a line on the graph for each server. You can have as many tags as you want.

Finally, we have ‘fields’ that contain the actual data. Again, you can have as many as you want. We only have one here – the CPU temperature.

The dictionary we create is then put in a list (‘json_payload’) to send to the server:

json_payload = []
data = {
    "measurement": "cpu_temp", 
    "tags": {
        "server": serverName
    },
    "fields": {
        "temp": temp
    }
}
json_payload.append(data)

And we send this with:

result = influxClient.write_points(json_payload) # returns True or False

Putting it all together

Here’s the complete Python code. I use a cron job to run this every five minutes.

#!/usr/bin/env python3

# CPU temperature logger
# Reports temp to InfluxDB server
# Linux version

import os
import socket
import subprocess
from influxdb import InfluxDBClient

serverName = socket.gethostname().split('.')[0]

InfluxServer = '10.0.30.50'
InfluxPort = 8086
dbBucket = 'iot'
dbUser = 'youruser'
dbPass = 'yourpassword'
influxClient = InfluxDBClient(InfluxServer, InfluxPort, dbUser, dbPass, dbBucket)

cmd = 'cat /sys/class/thermal/thermal_zone0/temp'
output = subprocess.run(cmd, shell=True, capture_output=True)
stderr = output.stderr.decode('utf-8')
if stderr:
    pass # you might want to handle this
if output.returncode == 0:
    stdout = float(output.stdout.decode('utf-8'))
    temp = round(stdout/1000, 1)

    # send to InfluxDB
    json_payload = []
    data = {
        "measurement": "cpu_temp",
        "tags": {
            "server": serverName
        },
        "fields": {
            "temp": temp
        }
    }
    json_payload.append(data)
    try:
        result = influxClient.write_points(json_payload) # returns True or False
    except Exception as e:
        pass # handle this exception your way

This code is pretty basic. I haven’t specified how to handle errors and exceptions. And we currently don’t do anything with the result (true or false) of writing to the InfluxDB database. But it’s enough to be getting on with.
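If you did want to do something with that result, one approach (a sketch of my own, not part of the script above) would be to log failures rather than silently swallow them:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger('cpu_temp')

def write_with_logging(client, payload):
    """Write points to InfluxDB, logging failures instead of ignoring them."""
    try:
        ok = client.write_points(payload)  # returns True on success
    except Exception as exc:
        log.error('InfluxDB write failed: %s', exc)
        return False
    if not ok:
        log.warning('InfluxDB reported an unsuccessful write')
    return ok
```

Since the script runs from cron, logged errors would at least leave a trace of any readings that went missing.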

And now for the Mac

The code above works fine on Linux boxes, but not so well on the Mac, where things were a little more complex.

You can’t just cat a dev file on macOS to get the CPU temperature. You need to invoke powermetrics with the command:

sudo powermetrics --samplers smc -i1 -n1 | grep "CPU die"

Which is all very well, but note the presence of ‘sudo’ there. We have two options here – run the program as root, or somehow allow the normal user to invoke sudo with the powermetrics command without having to enter a password. I chose the latter.

To do this, I needed to edit the sudoers file using the utility visudo – which, not surprisingly, needs root privileges.

sudo visudo

And I added this line:

steve ALL = (root) NOPASSWD: /usr/bin/powermetrics

Now, as user ‘steve’, I can issue the ‘sudo powermetrics’ command without having to enter a password. In doing this, I am demonstrating a great deal of trust in myself. I hope I can live up to it.

With this done, the macOS version requires a change to line 20 of the script – the command to be issued becomes:

cmd = 'sudo powermetrics --samplers smc -i1 -n1 | grep "CPU die"'

What we get back isn’t just a number – it’s a line of output looking something like:

CPU die temperature: 42.81 C

We need to extract just the part we need. And yeah, I know we could use awk in the original command, but I decided to grab the relevant bit later, in Python. So lines 25-27 become:

if output.returncode == 0:
    tempstr = output.stdout.decode('utf-8').split(' ')[-2]
    temp = float(tempstr)

I guess I should create a ‘universal’ version of the code, simply testing to see what OS we’re on in order to determine the command to be executed.
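That universal version might look something like this – a sketch using platform.system() to pick the command and parse the output. The two commands are the ones shown above; everything else here is my own illustration:

```python
import platform

# Commands as used earlier in the post, selected by operating system.
LINUX_CMD = 'cat /sys/class/thermal/thermal_zone0/temp'
MACOS_CMD = 'sudo powermetrics --samplers smc -i1 -n1 | grep "CPU die"'

def temp_command(system=None):
    """Return the temperature-reading command for this OS."""
    system = system or platform.system()
    return MACOS_CMD if system == 'Darwin' else LINUX_CMD

def parse_temp(stdout_text, system=None):
    """Convert the command's raw output to degrees Celsius."""
    system = system or platform.system()
    if system == 'Darwin':
        # e.g. 'CPU die temperature: 42.81 C'
        return float(stdout_text.split(' ')[-2])
    # Linux sysfs value is in thousandths of a degree
    return round(float(stdout_text) / 1000, 1)
```

The rest of the script – running the command with subprocess.run() and sending the result to InfluxDB – stays the same on both platforms.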

Graphing the result

Turning the logged data into something meaningful is easy with Grafana.

Doesn’t this look pretty?

Like I said, this isn’t a Grafana tutorial, but basically…

I selected the InfluxDB server as a data source. I added a panel. And then I specified a query for each server. Here’s what one of the queries looks like:

You may recognise ‘cpu_temp’ as the measurement defined in our code, ‘server’ as the tag, with ‘ada’ as the value for that tag. And you may also notice ‘field(temp)’ – we gave the name ‘temp’ to our one-and-only field in the code.

I also created a gauge for each server showing the maximum CPU temperature in the past 24 hours. I defined thresholds to colour code them, with the bar going from green to yellow at 60C, and then to red at 70C.

The results have already been interesting. For one thing, I can see that most of the CPU temperatures remain fairly steady. The exception is the iMac (Zola), which has hit 92C!

It’s also interesting to analyse the four Raspberry Pi 4B machines in the same, fan-cooled rack. These split into two pairs – Ada & Peach, and Polar & Lulu – with the first pair being slightly hotter than the other. Why?

Well, in addition to the fan built into the rack, the cooler machines – Polar and Lulu – also have fans in the heatsinks mounted on their CPUs.

[UPDATE] I just realised that I provided Python programs and the methods for obtaining CPU temps for both Linux and macOS. But I completely forgot about Windows.

Well, fancy that.
