Integration of pmacct with ElasticSearch and Kibana

https://blog.pierky.com/integration-of-pmacct-with-elasticsearch-and-kibana/

In this post I want to show a solution based on a script (pmacct-to-elasticsearch) that I made to gather data from pmacct and visualize them using Kibana/ElasticSearch. It’s far from being the state of the art of IP accounting solutions, but it may be used as a starting point for further customizations and developments.

I plan to write another post with some ideas to integrate pmacct with the canonical ELK stack (ElasticSearch/Logstash/Kibana). As usual, add my RSS feed to your reader or follow me on Twitter to stay updated!

The big picture

This is the big picture of the proposed solution:

[Figure: pmacct-to-elasticsearch - the big picture]

There are 4 main actors: the pmacct daemons (we already saw how to install and configure them), which collect accounting data; pmacct-to-elasticsearch, which reads pmacct's output, processes it and sends it to ElasticSearch, where data are stored and organized into indices; and, finally, Kibana, which is used to chart them on a web frontend.


The starting point of this tutorial is the scenario previously viewed in the Installing pmacct on a fresh Ubuntu setup post.

UPDATE: I made some changes to the original text (which was about Kibana 4 Beta 2) since Kibana 4 has been officially released.

In the first part of this post I’ll cover a simple setup of both ElasticSearch 1.4.4 and Kibana 4.

In the second part I’ll show how to integrate pmacct-to-elasticsearch with the other components.

Setup of ElasticSearch and Kibana

This is a quick guide to set up the aforementioned programs in order to have a working scenario for my goals: please strongly consider security and scalability issues before using it in a real production environment. You can find everything you need on the ElasticSearch web site.

Dependencies

Install Java (at the time of writing, Java 8 update 20 or later, or Java 7 update 55 or later, is recommended for ElasticSearch 1.4.4):

# apt-get install openjdk-7-jre
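
If in doubt, you can verify the installed version afterwards:

# java -version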

ElasticSearch

Install ElasticSearch from its APT repository:

# wget -qO - https://packages.elasticsearch.org/GPG-KEY-elasticsearch | sudo apt-key add -
# add-apt-repository "deb http://packages.elasticsearch.org/elasticsearch/1.4/debian stable main"
# apt-get update && apt-get install elasticsearch

… and (optionally) configure it to automatically start on boot:

# update-rc.d elasticsearch defaults 95 10

Since this post covers only a simple setup, tuning and advanced configuration are out of its scope, but consider the official configuration guide for any production-ready setup.
Here, just change one network parameter to make sure that ES does not listen on any public socket; edit the /etc/elasticsearch/elasticsearch.yml file and set

network.host: 127.0.0.1

Finally, start it:

# service elasticsearch start

Wait a few seconds, then, if everything is OK, you can check its status with an HTTP query:

# curl http://localhost:9200/?pretty
{
  "status" : 200,
  "name" : "Wild Thing",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.4.4",
    "build_hash" : "c88f77ffc81301dfa9dfd81ca2232f09588bd512",
    "build_timestamp" : "2015-02-19T13:05:36Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.3"
  },
  "tagline" : "You Know, for Search"
}
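
As a quick sanity check (assuming net-tools is installed), you can also verify that ES is listening on the loopback address only:

# netstat -tlnp | grep 9200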

Kibana 4

Download and install the right version of Kibana 4, depending on your architecture (here I used the x64 one):

# cd /opt
# curl -O https://download.elasticsearch.org/kibana/kibana/kibana-4.0.0-linux-x64.tar.gz
# tar -zxvf kibana-4.0.0-linux-x64.tar.gz

By default, Kibana listens on 0.0.0.0:5601 for the web front-end: again, for this simple setup it’s OK, but be sure to protect your server using a firewall and/or a reverse proxy like Nginx.
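
For example, if you use iptables on this host, a couple of rules along these lines can limit access to the Kibana port (192.0.2.0/24 is just a placeholder for your trusted network):

# iptables -A INPUT -p tcp --dport 5601 -s 192.0.2.0/24 -j ACCEPT
# iptables -A INPUT -p tcp --dport 5601 -j DROP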

Run it (here I put it in background and redirect its output to /var/log/kibana4.log):

# /opt/kibana-4.0.0-linux-x64/bin/kibana > /var/log/kibana4.log &

Wait a few seconds until it starts, then point your browser at http://YOUR_IP_ADDRESS:5601 to check that everything is fine.

pmacct-to-elasticsearch configuration

Now that all the programs we need are up and running we can focus on pmacct-to-elasticsearch setup.

pmacct-to-elasticsearch is designed to read JSON output from pmacct daemons, to process it and to store it into ElasticSearch. It works with both memory and print plugins and, optionally, it can perform manipulations on data (such as adding fields based on other values).
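
For reference, a single record of pmacct's JSON output looks like the following (the exact field set depends on the aggregate directive; this example matches the fields we will see stored into ElasticSearch later):

{"packets": 1, "bytes": 256, "etype": "800", "ip_proto": "udp", "ip_src": "8.8.8.8", "port_src": 53, "ip_dst": "172.16.1.15", "port_dst": 56529}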

[Figure: pmacct-to-elasticsearch data flow]

Install git, then download the repository from GitHub and install pmacct-to-elasticsearch:

# apt-get install git
# cd /usr/local/src/
# git clone https://github.com/pierky/pmacct-to-elasticsearch.git
# cd pmacct-to-elasticsearch/
# ./install

Now it’s time to configure pmacct-to-elasticsearch to send some records to ElasticSearch. Configuration details can be found in the CONFIGURATION.md file.

In the last post an instance of pmacctd was configured with a memory plugin named plugin1 that performed aggregation on a socket basis (src host:port / dst host:port / protocol):

plugins: memory[plugin1]

imt_path[plugin1]: /var/spool/pmacct/plugin1.pipe
aggregate[plugin1]: etype, proto, src_host, src_port, dst_host, dst_port

In order to have pmacct-to-elasticsearch process plugin1's output, we need to create a pmacct-to-elasticsearch configuration file with the same name, /etc/p2es/plugin1.conf; default values already point pmacct-to-elasticsearch to the local instance of ElasticSearch (URL = http://localhost:9200), so we just need to set the destination index name and type:

{
    "ES_IndexName": "example-%Y-%m-%d",
    "ES_Type": "socket"
}
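
Before scheduling anything, you can preview the data that pmacct-to-elasticsearch will receive by running the pmacct client by hand; omitting the -e flag (used in the crontab task below) leaves the in-memory table untouched:

# pmacct -l -p /var/spool/pmacct/plugin1.pipe -s -O json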

Since this is a memory plugin, we also need to schedule a crontab task to consume data from the in-memory table and pass them to pmacct-to-elasticsearch, so edit the /etc/cron.d/pmacct-to-elasticsearch file and add the line:

*/5 *  * * *     root  pmacct -l -p /var/spool/pmacct/plugin1.pipe -s -O json -e | pmacct-to-elasticsearch plugin1

Everything is now ready to have the first records inserted into ElasticSearch: if you don't want to wait for the crontab task to run, execute the above command from the command line, then query ElasticSearch to show the records:

# curl http://localhost:9200/example-`date +%F`/socket/_search?pretty
{
  ...
  "hits" : {
    "total" : 6171,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "example-2014-12-15",
      "_type" : "socket",
      "_id" : "AUo910oSOUAYMzMu9bxU",
      "_score" : 1.0,
      "_source": { "packets": 1, "ip_dst": "172.16.1.15",
                   "@timestamp": "2014-12-15T19:32:02Z", "bytes": 256,
                   "port_dst": 56529, "etype": "800", "port_src": 53, 
                   "ip_proto": "udp", "ip_src": "8.8.8.8" }
      }
      ...
    ]
  }
}

(the `date +%F` is used here to obtain the current date in the format used by the index name, that is YYYY-MM-DD)
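
For example:

# date +%F
2014-12-15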

Just to try the configuration for a print plugin, edit the /etc/pmacct/pmacctd.conf configuration file, change the plugins line and add the rest:

plugins: memory[plugin1], print[plugin2]

print_output_file[plugin2]: /var/lib/pmacct/plugin2.json
print_output[plugin2]: json
print_trigger_exec[plugin2]: /etc/p2es/triggers/plugin2
print_refresh_time[plugin2]: 60
aggregate[plugin2]: proto, src_port
aggregate_filter[plugin2]: src portrange 0-1023

Then, prepare the pmacct-to-elasticsearch configuration file for this plugin (/etc/p2es/plugin2.conf):

{
    "ES_IndexName": "example-%Y-%m-%d",
    "ES_Type": "source_port",
    "InputFile": "/var/lib/pmacct/plugin2.json"
}

Here, pmacct-to-elasticsearch is instructed to read from /var/lib/pmacct/plugin2.json, the file that the pmacctd daemon writes to.

As you can see from the pmacctd plugin2 configuration above, a trigger is needed in order to run pmacct-to-elasticsearch: /etc/p2es/triggers/plugin2. Just add a link to the default_trigger script and it's done:

# cd /etc/p2es/triggers/
# ln -s default_trigger plugin2
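
For reference, the trigger is conceptually equivalent to a minimal shell script like the following sketch, which derives the plugin name from its own filename (that's why a symlink per plugin is enough) and detaches pmacct-to-elasticsearch:

#!/bin/sh
# minimal trigger sketch: run pmacct-to-elasticsearch for this plugin,
# detached, so that the calling daemon does not block on it
nohup pmacct-to-elasticsearch "`basename $0`" >/dev/null 2>&1 &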

Now you can restart pmacct daemons in order to load the new configuration for plugin2:

# service pmacct restart

or, if you preferred not to install my pmacct System V init script:

# killall -INT pmacctd -w ; pmacctd -f /etc/pmacct/pmacctd.conf -D

After the daemon has finished writing the output file (/var/lib/pmacct/plugin2.json), it runs the trigger, which in turn executes pmacct-to-elasticsearch with the right argument (plugin2) and detaches it.
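
You can check that the print plugin is writing as expected by peeking at the output file after a refresh interval (pmacct's JSON output is one record per line):

# head -1 /var/lib/pmacct/plugin2.json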

Wait a minute, then query ElasticSearch from command line:

# curl http://localhost:9200/example-`date +%F`/source_port/_search?pretty

From now on it’s just a matter of customizations and visualization in Kibana. The official Kibana 4 Quick Start guide can help you to create visualizations and graphs. Remember, the name of the index used in these examples follows the [example-]YYYY-MM-DD daily pattern.

Housekeeping

Time series indices tend to grow and fill up disk space, so a retention policy may be useful to delete data older than a specific date.

The Curator tool and its delete command can help with this:

# apt-get install python-pip
# pip install elasticsearch-curator

Once installed, test it using the right arguments…

# curator --dry-run delete indices --prefix example- --timestring %Y-%m-%d --older-than 1 --time-unit days
2014-12-15 19:04:13,026 INFO      Job starting...
2014-12-15 19:04:13,027 INFO      DRY RUN MODE.  No changes will be made.
2014-12-15 19:04:13,031 INFO      DRY RUN: Deleting indices...
2014-12-15 19:04:13,035 INFO      example-2014-12-15 is within the threshold period (1 days).
2014-12-15 19:04:13,035 INFO      DRY RUN: Speficied indices deleted.
2014-12-15 19:04:13,036 INFO      Done in 0:00:00.020131.

… and, eventually, schedule it in the pmacct-to-elasticsearch crontab file (/etc/cron.d/pmacct-to-elasticsearch), setting the desired retention period:

# m h dom mon dow user  command
...
0 1  * * *     root  curator delete indices --prefix example- --timestring \%Y-\%m-\%d --older-than 30 --time-unit days
#EOF

Of course, you can use Curator for many other management and optimization tasks too, but they are out of the scope of this post.