Monitoring disk I/O using Zabbix

http://www.denniskanbier.nl/blog/monitoring/monitoring-disk-io-using-zabbix/

This post shows how to set up disk monitoring on Linux in Zabbix. The standard Linux template in Zabbix monitors how full your disks are, but tells you little about how hard they are working. For example, it doesn’t tell you how many writes per second are being handled by a disk or partition.

However, this kind of information can be vital for the health of your servers. Disks are almost always a bottleneck, so I like to keep an eye on them.

The inspiration for this blog post, and some of the code, comes from Renaldo Maclons’ weblog.

Goal:

Monitor the utilisation of my disk devices in Linux servers. Being lazy as a sysadmin is always a good thing, so I’m going to implement low-level discovery of my disk devices and create a template to go with it.

We’ll be configuring the Zabbix agent which is running on my Linux boxes to support the low-level discovery of my disk devices, and the items we need to monitor on the host.

For the quick-starters:

1. Download the template.
2. Import the template in your Zabbix instance
3. Enable the low-level discovery script on the Zabbix agents
4. Enrich the Zabbix agent with these UserParameters
5. Add the template to a host and test

For the rest of you, please read on to understand how the template is made and what the scripts actually do.

Environment:

  • Zabbix Server/Agent version: 2.2.1
  • Linux version: RHEL5/6
  • Custom Zabbix scripts path: /opt/zabbix/
  • Custom UserParameter files: /etc/zabbix/zabbix_agentd.d/

Enabling auto-discovery of the devices:

I’ve created a small and simple Perl script to do the low-level discovery of the devices in a server. It can be found on GitHub. On the Linux servers where the Zabbix agent is running we need to download the script and tell the agent to use it:

 $ mkdir -p /opt/zabbix/linux
 $ cd /opt/zabbix/linux
 $ wget https://raw2.github.com/dkanbier/zabbix-linux/master/LLD/queryDisks.pl
 $ chmod +x queryDisks.pl
 $ chown -R zabbix:zabbix /opt/zabbix

The script is now in place. You can run it if you like. In short, it looks in /proc/diskstats and filters out the device names. Then it prints them to the screen in JSON format, because Zabbix likes to talk in JSON:

$ ./queryDisks.pl
{
 "data":[
  { "{#DISK}":"ram0" },
  { "{#DISK}":"ram1" },
  { "{#DISK}":"sda" },
  { "{#DISK}":"sda1" }
 ]
}

I shortened the output a bit to keep it readable, but you get the idea. We have a {#DISK} key with the device name as its value. Don’t worry if you see devices you don’t want to monitor. We will handle that later, in the discovery rule.
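If you’d rather not read the Perl source, the idea can be sketched in a few lines of Python. This is not the queryDisks.pl script itself, just an illustration that assumes the usual /proc/diskstats layout, where the device name is the third whitespace-separated field. The function name discover_disks and the SAMPLE text are made up for the example.

```python
import json

def discover_disks(diskstats_text):
    """Build a Zabbix LLD payload from the text of /proc/diskstats."""
    devices = []
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Assumed layout: major number, minor number, device name, counters...
        if len(fields) >= 3:
            devices.append({"{#DISK}": fields[2]})
    return json.dumps({"data": devices}, indent=1)

# Hypothetical sample instead of the real file, so the sketch runs anywhere.
SAMPLE = """\
   1       0 ram0 0 0 0 0 0 0 0 0 0 0 0
   8       0 sda 4520 120 98234 3100 8801 950 201344 9100 0 5200 12200
   8       1 sda1 310 12 4210 150 20 3 184 40 0 170 190
"""

print(discover_disks(SAMPLE))
```

On a real host you would pass in the contents of /proc/diskstats instead of SAMPLE.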

Let’s enable the Zabbix agent to support the auto discovery of disks. Open the zabbix_agentd.conf file:

$ vi /etc/zabbix/zabbix_agentd.conf

And add the following line to it:

UserParameter=custom.vfs.dev.discovery,/opt/zabbix/linux/queryDisks.pl

After editing the configuration we need to restart the agent:

$ sudo service zabbix-agent restart

Done!

The agent is now capable of discovering the disk devices. The first part (custom.vfs.dev.discovery) is the key that Zabbix can use for the discovery, the second part is the implementation of that key on the Linux server. Let’s tell the Zabbix server about this by creating a new template that uses the discovery key.

Configure Zabbix to use the low-level discovery key custom.vfs.dev.discovery:

Start by creating a new template. I’ll be naming mine “Template Linux Disk IO”. Open the Zabbix web interface and go to Configuration -> Templates and click “Create template”.

Fill in a name and add the new template to the group “Templates”. Then click save.

Now you’re back in the Templates overview. In the row of the template we just created, click “Discovery” and then “Create discovery rule”.

Fill in:

Name: Disk device discovery
Key: custom.vfs.dev.discovery
Description: Discovery of disk devices on linux.

Now we still need to limit the types of devices we want to auto-discover. Remember the list that was presented when you ran the queryDisks.pl script? What if you only want to add the “sd” devices for example?

This is where filters and regexp rules come into play. I’ve written a post about how to use them, so I won’t cover it again in detail. I’ll just create a regexp that only matches “sd” devices and configure the filter of the discovery rule to use it.

Auto-discovery of “sd” devices is now operational.
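To get a feel for what such a filter does to the discovered list, here is a small Python sketch. The pattern ^sd is my assumption of what “only matches sd devices” looks like (the post doesn’t spell out the expression), and Zabbix uses its own regexp engine, although a simple anchored prefix like this should behave the same way.

```python
import re

# Assumed pattern: device names that start with "sd".
SD_DEVICE = re.compile(r"^sd")

def filter_devices(names, pattern=SD_DEVICE):
    """Keep only the device names the pattern matches."""
    return [name for name in names if pattern.search(name)]

discovered = ["ram0", "ram1", "sda", "sda1"]
print(filter_devices(discovered))  # ['sda', 'sda1']
```

Note that this keeps partitions such as sda1 as well as whole disks. If you only want whole disks you would need a stricter expression.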

Please be aware that you have to enable the custom.vfs.dev.discovery key on every host you want to use it on, by placing the queryDisks.pl script there and modifying the Zabbix agent configuration.

From discovery to items in Zabbix:

Now, what good are discovery rules if they don’t create any items? Sure, we can tell Zabbix we have an sda, sda1 and sdb device, but what is the point if we don’t monitor anything on them?

The file /proc/diskstats contains valuable information regarding your disk devices. The official documentation can be found here.

I’ve placed a UserParameter file containing the logic for extracting this information on GitHub. Place this file in the “Include” directory of the Zabbix agent on every host whose disks you want to monitor, and the information becomes available through custom keys.

On the client Linux boxes:

$ wget https://raw2.github.com/dkanbier/zabbix-linux/master/UserParameters/userparameter_linux_disks.conf
$ sudo mv userparameter_linux_disks.conf /etc/zabbix/zabbix_agentd.d/
$ sudo service zabbix-agent restart

Now the agent is capable of extracting data from /proc/diskstats and reporting it to Zabbix using the keys defined in userparameter_linux_disks.conf.
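To make concrete what such a key has to do, here is a rough Python sketch. It is not the content of userparameter_linux_disks.conf, only an illustration. It assumes the kernel’s documented /proc/diskstats layout, where the name is followed by reads completed, reads merged, sectors read, time reading, writes completed, writes merged and sectors written. The COLUMNS table, the read_counter function and the sample line are names and data I made up for the example.

```python
# Column index (0-based, after splitting on whitespace) for each counter.
# Assumed layout: major, minor, name, then the I/O counters.
COLUMNS = {
    "read.ops": 3,       # reads completed
    "read.sectors": 5,   # sectors read
    "write.ops": 7,      # writes completed
    "write.sectors": 9,  # sectors written
}

def read_counter(diskstats_text, device, counter):
    """Return one cumulative counter for one device."""
    column = COLUMNS[counter]
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Exact match on the name, so "sda" does not pick up "sda1".
        if len(fields) > column and fields[2] == device:
            return int(fields[column])
    raise KeyError(device)

# Hypothetical sample lines; on a real host, read /proc/diskstats instead.
SAMPLE = """\
   8       0 sda 4520 120 98234 3100 8801 950 201344 9100 0 5200 12200
   8       1 sda1 310 12 4210 150 20 3 184 40 0 170 190
"""

print(read_counter(SAMPLE, "sda", "write.sectors"))  # 201344
```

The device name plays the role of the parameter passed in the square brackets of the key.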

Let’s create item prototypes so the items for this data are created automatically when low-level discovery runs. We need an item prototype for every custom key we’ve added through the userparameter_linux_disks.conf file (8 in total). I’ll show the process of creating one. The rest are available by downloading the pre-configured custom template.

In the Zabbix interface go to Configuration -> Templates. In the row of our new template “Template Linux Disk IO” click Discovery -> Item prototypes -> Create item prototype.

Fill in:

Name: Disk:{#DISK}:reads completed    <-- {#DISK} will be filled with a value by LLD
Key: custom.vfs.dev.read.ops[{#DISK}] <-- custom.vfs.dev.read.ops comes from userparameter_linux_disks.conf
Units: Reads
New Application: I/O Stats

Click Save.
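What low-level discovery does with a prototype like this can be pictured as plain string substitution. The Python below is only a sketch of that idea, not how Zabbix implements it internally, and expand_prototype is a made-up name.

```python
def expand_prototype(prototype, devices):
    """Create one item per device by filling in the {#DISK} macro."""
    items = []
    for device in devices:
        item = {field: text.replace("{#DISK}", device)
                for field, text in prototype.items()}
        items.append(item)
    return items

prototype = {
    "name": "Disk:{#DISK}:reads completed",
    "key": "custom.vfs.dev.read.ops[{#DISK}]",
}

for item in expand_prototype(prototype, ["sda", "sda1"]):
    print(item["name"], "->", item["key"])
```

Each discovered device that passes the filter ends up with its own item and its own key.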

If you deploy this template now on your Linux hosts it will auto-discover your disk devices, add only the “sd” devices as host items and monitor the total completed reads on them.

Making sense of it all:

If we take a closer look at the stats we’re putting into Zabbix, there is a way to make them even more valuable and easier to interpret.

I’d like to take a closer look at the following items:

UserParameter=custom.vfs.dev.read.ops[*] # reads completed successfully
UserParameter=custom.vfs.dev.read.sectors[*] # sectors read
UserParameter=custom.vfs.dev.write.ops[*] # writes completed
UserParameter=custom.vfs.dev.write.sectors[*] # sectors written

The data for these items comes from /proc/diskstats, and the counters in there are cumulative. For example, every time a sector is written the write.sectors value goes up by 1. So if we put these values in Zabbix as they are, we just get an ever-growing graph.

Sure, this isn’t totally useless: the slope of the graph still shows how fast (or slow) sectors are being written over time. But for me, the number of sectors written per second is far more interesting.

Luckily Zabbix comes with a built-in feature to calculate just that. While configuring an item, you can set its Store value to “Delta (speed per second)”.

This will evaluate the data as (value - prev_value) / (time - prev_time), where
value – current value
prev_value – previously received value
time – current timestamp
prev_time – timestamp of previous value

Using this option will give me the number of sectors written per second, instead of an accumulated value of all sectors ever written.
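As a quick worked example of that formula, here it is as a small Python function. The function name and the sample numbers are made up, and Zabbix does this calculation for you on the server, so this is only meant to show the arithmetic.

```python
def speed_per_second(prev_value, prev_time, value, time):
    """(value - prev_value) / (time - prev_time), timestamps in seconds."""
    elapsed = time - prev_time
    if elapsed <= 0:
        raise ValueError("the current timestamp must be later than the previous one")
    return (value - prev_value) / elapsed

# Hypothetical readings of write.sectors, taken 30 seconds apart:
# 201344 sectors at t=1000 and 203144 sectors at t=1030.
print(speed_per_second(201344, 1000, 203144, 1030))  # 60.0
```

So 1800 sectors written in 30 seconds shows up in the graph as 60 sectors per second.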

Completed template and setup:

If you’re interested, I’ve completed and exported the template I built in this post so you can set it up in your own environment. To do so:

1. Download the template.
2. Import the template in your Zabbix instance
3. Be sure to enable your Zabbix agents to do the auto discovery and custom keys (see post above)
4. Add the template to a host and test

As always, if you need any assistance please let me know ;-).

Happy monitoring!