Realtime Web Server Log Metrics

http://jasonwilder.com/blog/2013/07/22/realtime-web-server-log-metrics/

This is a sample config that uses nxlog to tail web access logs in Combined Log Format, pull out the status code and bytes sent and send them to statsd so they can be graphed using Graphite.

It’s a simple way to see if your web server is returning errors over time or how much data it’s sending. The same concept could be used for other log files.

Logster lets you do similar things but custom parsing is accomplished by writing Python plugins which can be a little more complicated than using configuration files.

nxlog works by defining inputs, processors, outputs and routes that tie the inputs to processers and finally to outputs.

For this example, I use the imfile module which will tail one or more files send each line along for processing. nxlog will remember the last line read by default so restarts or connectivity issues are handled gracefully. After each line is read, the line is passed to a _Exec statement that parses each field into variables.

The parsing could be done in the processor but because of how the routes are setup, I chose to do it during input so it only happens once.

# Tail access log in Combined Log Format and parse out the fields.
<Input in_nginx>
    Module  im_file
    File	"/var/log/nginx/access.log"
    Exec    if $raw_event =~ /^(\S+) (\S+) (\S+) \[([^\]]+)\] \"(\S+) (.+) HTTP.\d\.\d\" (\d+) (\d+) \"([^\"]+)\" \"([^\"]+)\"/\
                { \
                  $Hostname = $1; \
                  if $3 != '-' $AccountName = $3; \
                  $EventTime = parsedate($4); \
                  $HTTPMethod = $5; \
                  $HTTPURL = $6; \
                  $HTTPResponseStatus = $7; \
                  $HTTPBytesSent = $8; \
                  $HTTPReferer = $9; \
                  $HTTPUserAgent = $10; \
                }
</Input>

There are two processors configured http_status and http_bytes. Each one checks to make sure the required data has been parsed and then rewrites the event into a statsd protocol metric. Both processors create a counter metric that statsd will sum up and roll-over at every interval.

# Rewrite the log message to a statsd counter event using the HTTP status code.
<Processor http_status>
    Module      pm_null
    Exec        if defined($HTTPResponseStatus) { \
		          $raw_event = "http.status." + $HTTPResponseStatus + ":1|c"; \
                } else { \
                  drop(); \
                }
</Processor>

# Rewrite the log message to a statsd counter event using the bytes sent.
<Processor http_bytes>
    Module      pm_null
    Exec        if defined($HTTPBytesSent) { \
		          $raw_event = "http.bytes." + $HTTPBytes + ":" + $HTTPBytesSent + "|c"; \
                } else { \
                  drop(); \
                }
</Processor>

The output is a om_udp that forwards the re-written log event to a statsd instance.

# Statsd uses UDP
<Output out_statsd>
    Module	om_udp
    Host	127.0.0.1
    Port	8125
</Output>

Finally, two routes are created that originate from the same input file but are sent to separate processors and back to the same output. Since processors are normally chained together and I’m re-writing the event, chaining them together doesn’t work. Instead I need a new copy of the log message for each processor.

# Route nginx access log through status processor and out to statsd
<Route web_status_statsd>
    Path        in_nginx => http_status => out_statsd
</Route>

# Since we re-wrote the log event, define a new route for bytes sent
<Route web_bytes_statsd>
    Path        in_nginx => http_bytes => out_statsd
</Route>

And the whole thing put together:

# Tail access log in Combined Log Format and parse out the fields.
<Input in_nginx>
    Module  im_file
    File	"/var/log/nginx/access.log"
    Exec    if $raw_event =~ /^(\S+) (\S+) (\S+) \[([^\]]+)\] \"(\S+) (.+) HTTP.\d\.\d\" (\d+) (\d+) \"([^\"]+)\" \"([^\"]+)\"/\
                { \
                  $Hostname = $1; \
                  if $3 != '-' $AccountName = $3; \
                  $EventTime = parsedate($4); \
                  $HTTPMethod = $5; \
                  $HTTPURL = $6; \
                  $HTTPResponseStatus = $7; \
                  $HTTPBytesSent = $8; \
                  $HTTPReferer = $9; \
                  $HTTPUserAgent = $10; \
                }
</Input>

# Rewrite the log message to a statsd counter event using the HTTP status code.
<Processor http_status>
    Module      pm_null
    Exec        if defined($HTTPResponseStatus) { \
		          $raw_event = "http.status." + $HTTPResponseStatus + ":1|c"; \
                } else { \
                  drop(); \
                }
</Processor>

# Rewrite the log message to a statsd counter event using the bytes sent.
<Processor http_bytes>
    Module      pm_null
    Exec        if defined($HTTPBytesSent) { \
		          $raw_event = "http.bytes." + $HTTPBytes + ":" + $HTTPBytesSent + "|c"; \
                } else { \
                  drop(); \
                }
</Processor>

# Statsd uses UDP
<Output out_statsd>
    Module	om_udp
    Host	127.0.0.1
    Port	8125
</Output>

# Route nginx access log through status processor and out to statsd
<Route web_status_statsd>
    Path        in_nginx => http_status => out_statsd
</Route>

# Since we re-wrote the log event, define a new route for bytes sent
<Route web_bytes_statsd>
    Path        in_nginx => http_bytes => out_statsd
</Route>