imuxsock

Recipe: Apache Logs + rsyslog (parsing) + Elasticsearch

By rgheorghe | Posted on October 13, 2015 | Posted in More complex scenarios | Tagged apache, elasticsearch, Guides for rsyslog, howto, imfile, imklog, imuxsock, liblognorm, mmnormalize, omelasticsearch, parsing, queues, recipe, rsyslog, syslog, templates, unstructured, v8

Original post: Recipe: Apache Logs + rsyslog (parsing) + Elasticsearch by @Sematext

This recipe is about tailing Apache HTTPD logs with rsyslog, parsing them into structured JSON documents, and forwarding them to Elasticsearch (or a log analytics SaaS, like Logsene, which exposes the Elasticsearch API). Having them indexed in a structured way will allow you to do better analytics with tools like Kibana.

We’ll also cover pushing logs coming from the syslog socket and kernel, and how to buffer all of them properly. So this is quite a complete recipe for your centralized logging needs.

Getting the ingredients

Even though most distros already ship rsyslog, it’s highly recommended to get the latest stable version from the rsyslog repositories. The packages you’ll need are:

rsyslog: the base package, including the file-tailing module (imfile)
rsyslog-mmnormalize: provides mmnormalize, a module that parses common Apache logs into JSON
rsyslog-elasticsearch: provides the Elasticsearch output (omelasticsearch)

With the ingredients in place, let’s start cooking a configuration. The configuration needs to do the following:

load the required modules
configure inputs: tailing Apache logs and system logs
configure the main queue to buffer your messages. This is also the place to define the number of worker threads and batch sizes (which will also be Elasticsearch bulk sizes)
parse common Apache logs into JSON
define templates that specify what the JSON messages should look like, and use them to send logs to Logsene/Elasticsearch via the Elasticsearch output

Loading modules

Here, we’ll need imfile to tail files, mmnormalize to parse them, and omelasticsearch to send them. If you want to tail the system logs, you’d also need to include imuxsock and imklog (for kernel logs).

# system logs
module(load="imuxsock")
module(load="imklog")
# file
module(load="imfile")
# parser
module(load="mmnormalize")
# sender
module(load="omelasticsearch")

Configure inputs

For system logs, you typically don’t need any special configuration (unless you want to listen on a non-default Unix socket). For Apache logs, you’d point to the file(s) you want to monitor; wildcards in file names work as well. You also need to specify a syslog tag for each input, which you can later use for filtering.

input(type="imfile"
      File="/var/log/apache*.log"
      Tag="apache:"
)

NOTE: By default, rsyslog does not poll for file changes every N seconds. Instead, it relies on the kernel (via inotify) to notify it when files change. This makes processing nearly real-time and scales well, especially if you have many files that change rarely. Inotify is also less prone to bugs around file rotation and other events that could otherwise happen between two polls. You can still use the legacy polling mode by setting mode="polling" in imfile’s module parameters.

Queue and workers

By default, all incoming messages go into the main queue. You can also separate flows (e.g. files and system logs) by using different rulesets, but let’s keep it simple for now.

For tailing files, this kind of queue would work well:

main_queue(
  queue.workerThreads="4"
  queue.dequeueBatchSize="1000"
  queue.size="10000"
)

This is a small in-memory queue of 10K messages. It works well even if Elasticsearch goes down: the data is still in the file, so rsyslog can stop tailing when the queue fills up and resume once there is room again. Four worker threads pick batches of up to 1000 messages from the queue, parse them (see below), and send the resulting JSON documents to Elasticsearch.
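To make the backpressure argument concrete, here is a toy, single-threaded Python model (a sketch, not rsyslog’s implementation): the tailer only reads as many lines as fit into a queue of queue.size slots, and one batch of up to queue.dequeueBatchSize messages is shipped per tick while Elasticsearch is up. The four worker threads are collapsed into a single step, and the function name and tick structure are made up for illustration.

```python
from collections import deque

QUEUE_SIZE = 10_000  # mirrors queue.size in the main_queue() example
BATCH_SIZE = 1_000   # mirrors queue.dequeueBatchSize


def simulate_tailing(total_lines, es_up_per_tick,
                     queue_size=QUEUE_SIZE, batch_size=BATCH_SIZE):
    """Toy, single-threaded model of imfile feeding a bounded in-memory queue.

    es_up_per_tick holds one boolean per tick: whether Elasticsearch accepts
    a batch during that tick. Returns (offset, queued, sent): how far into the
    file the tailer has read, how many messages wait in the queue, and how
    many have been delivered.
    """
    q = deque()
    offset = 0  # index of the next unread line in the file
    sent = 0
    for es_up in es_up_per_tick:
        # Tailer: read only as many new lines as the queue has room for.
        take = min(queue_size - len(q), total_lines - offset)
        q.extend(range(offset, offset + take))
        offset += take
        # Workers (collapsed into one step): ship one batch if Elasticsearch is up.
        if es_up:
            for _ in range(min(batch_size, len(q))):
                q.popleft()
                sent += 1
    return offset, len(q), sent


# Elasticsearch down for 5 ticks, then back up for 3.
print(simulate_tailing(50_000, [False] * 5 + [True] * 3))  # → (12000, 9000, 3000)
```

Whatever happens to Elasticsearch, offset always equals queued plus sent: lines that did not fit into the queue were never read, so they are still waiting in the file.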

If you need a larger queue (e.g. if you have lots of system logs and want to make sure they’re not lost), I recommend a disk-assisted memory queue, which spills to disk whenever it uses too much memory:

main_queue(
  queue.workerThreads="4"
  queue.dequeueBatchSize="1000"
  queue.highWatermark="500000"    # max no. of events to hold in memory
  queue.lowWatermark="200000"     # use memory queue again, when it's back to this level
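  queue.spoolDirectory="/var/lib/rsyslog/queues"  # where to write on disk; must exist and should survive reboots (/var/run is usually tmpfs)
  queue.fileName="main_queue"    # queue file name prefix; setting it is what makes the memory queue disk-assisted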
  queue.maxDiskSpace="5g"        # it will stop at this much disk space
  queue.size="5000000"           # or this many messages
  queue.saveOnShutdown="on"      # save memory queue contents to disk when rsyslog is exiting
)
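The two watermarks form a hysteresis band. The sketch below is a hypothetical simplification of that switch (rsyslog’s actual disk-assisted queue logic has more moving parts): spilling starts when the in-memory backlog reaches highWatermark and stops only once it has drained down to lowWatermark.

```python
HIGH_WATERMARK = 500_000  # mirrors queue.highWatermark
LOW_WATERMARK = 200_000   # mirrors queue.lowWatermark


class DiskAssistSwitch:
    """Toy model of the watermark hysteresis in a disk-assisted queue."""

    def __init__(self, high=HIGH_WATERMARK, low=LOW_WATERMARK):
        if not low < high:
            raise ValueError("lowWatermark must be below highWatermark")
        self.high = high
        self.low = low
        self.spilling = False  # True while new messages are written to disk

    def update(self, queued):
        """Feed the current queue depth; return whether the queue is spilling to disk."""
        if not self.spilling and queued >= self.high:
            self.spilling = True
        elif self.spilling and queued <= self.low:
            self.spilling = False
        return self.spilling


switch = DiskAssistSwitch()
print([switch.update(n) for n in (100_000, 500_000, 300_000, 200_000, 300_000)])
# → [False, True, True, False, False]
```

Note how a depth of 300,000 yields True on the way down and False on the way up; the gap between the two thresholds keeps the queue from flapping between memory and disk around a single value.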

Parsing with mmnormalize

The message normalization module uses liblognorm to do the parsing. So in the configuration you’d simply point rsyslog to the liblognorm rulebase:

action(type="mmnormalize"
  rulebase="/opt/rsyslog/apache.rb"
)

where apache.rb contains the rules for parsing Apache logs, which can look like this:

version=2

rule=:%clientip:word% %ident:word% %auth:word% [%timestamp:char-to:]%] "%verb:word% %request:word% HTTP/%httpversion:float%" %response:number% %bytes:number% "%referrer:char-to:"%" "%agent:char-to:"%"%blob:rest%

Here, version=2 tells liblognorm to use its v2 engine (support for which was introduced in rsyslog 8.13), followed by the actual rule for parsing the logs. You can find more details about writing such rules in the liblognorm documentation.

For logs other than Apache’s, creating new rules typically requires a lot of trial and error. To check your rules without involving rsyslog, you can use the lognormalizer binary like this:

head -1 /path/to/log.file | /usr/lib/lognorm/lognormalizer -r /path/to/rulebase.rb -e json
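If you just want to see which fields the rule above should extract, a rough Python approximation can help. This is a regex stand-in built on our own assumptions about the field types (word as a run of non-space characters, number as digits, char-to as everything up to the given character, rest as the remainder of the line); it is not how liblognorm parses internally, so always confirm the real behavior with lognormalizer.

```python
import re

# Regex stand-in for the apache.rb rule (NOT how liblognorm works internally):
#   word -> \S+   number -> \d+   float -> digits with an optional fraction
#   char-to:X -> everything up to X   rest -> remainder of the line
APACHE_RULE = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]*)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>\d+(?:\.\d+)?)" '
    r'(?P<response>\d+) (?P<bytes>\d+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"(?P<blob>.*)'
)


def normalize(line):
    """Return the extracted fields as a dict, or None if the line doesn't match."""
    m = APACHE_RULE.fullmatch(line)
    return m.groupdict() if m else None


sample = ('127.0.0.1 - - [13/Oct/2015:10:00:00 +0200] '
          '"GET /index.html HTTP/1.1" 200 2326 "http://example.com/" "Mozilla/5.0"')
print(normalize(sample)["request"])                   # → /index.html
print(normalize(sample.replace(" 2326 ", " - ")))     # → None
```

In this sketch, a response whose byte count Apache logged as "-" does not match, which is exactly the kind of edge case worth feeding to lognormalizer before deploying a rulebase.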

NOTE: If you’re used to Logstash’s grok, these parsing rules will look very familiar. However, things are quite different under the hood. Grok is a nice abstraction over regular expressions, while liblognorm builds parse trees out of specialized parsers. This makes liblognorm much faster, especially as you add more rules. In fact, it scales so well that, for all practical purposes, performance depends on the length of the log lines and not on the number of rules. This post explains the theory behind this assumption, and various tests confirm it in practice. The downside is that you lose some of the flexibility offered by regular expressions. You can still use regular expressions with liblognorm (you’d need to set allow_regex to on when loading mmnormalize), but then you’d lose a lot of the benefits of the parse tree approach.

Template for parsed logs

Since we want to push logs to Elasticsearch as JSON, we need templates to format them. For Apache logs, once parsing is done, all the relevant fields are already in the $!all-json variable, so the template simply outputs that property:

template(name="all-json" type="list"){
  property(name="$!all-json")
}

Template for time-based indices

For the logging use-case, you’d probably want to use time-based indices (e.g. if you keep your logs for 7 days, you can have one index per day). Such a design will give your cluster a lot more capacity due to the way Elasticsearch merges data in the background (you can learn the details in our presentations at GeeCON and Berlin Buzzwords).

To make rsyslog use daily or other time-based indices, you need to define a template that builds an index name off the timestamp of each log. This is one that names them logstash-YYYY.MM.DD, like Logstash does by default:

template(name="logstash-index"
  type="list") {
    constant(value="logstash-")
    property(name="timereported" dateFormat="rfc3339" position.from="1" position.to="4")
    constant(value=".")
    property(name="timereported" dateFormat="rfc3339" position.from="6" position.to="7")
    constant(value=".")
    property(name="timereported" dateFormat="rfc3339" position.from="9" position.to="10")
}

And then you’d use this template in the Elasticsearch output:

action(type="omelasticsearch"
  template="all-json"
  dynSearchIndex="on"
  searchIndex="logstash-index"
  searchType="apache"
  server="MY-ELASTICSEARCH-SERVER"
  bulkmode="on"
  action.resumeretrycount="-1"
)
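In the logstash-index template, the position.from/position.to pairs are 1-based and inclusive, so on an RFC 3339 timestamp such as 2015-10-13T14:05:09.123456+02:00 they pick out the year, month and day. A minimal Python sketch of the same slicing (the helper names are hypothetical):

```python
def substr(value, position_from, position_to):
    """rsyslog-style substring: positions are 1-based and inclusive."""
    return value[position_from - 1:position_to]


def logstash_index(timereported_rfc3339):
    """Rebuild the name produced by the logstash-index template."""
    ts = timereported_rfc3339
    return ("logstash-" + substr(ts, 1, 4) + "." + substr(ts, 6, 7)
            + "." + substr(ts, 9, 10))


print(logstash_index("2015-10-13T14:05:09.123456+02:00"))  # → logstash-2015.10.13
```

Because the date is cut from the formatted timestamp as-is, it follows whatever UTC offset that timestamp carries: a message reported at 2015-10-14T01:30:00+02:00 lands in logstash-2015.10.14 even though it is still October 13 in UTC.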

Putting both Apache and system logs together

If you use the same rsyslog to parse system logs, mmnormalize won’t parse them (because they don’t match Apache’s common log format). In this case, you’ll need to pick the rsyslog properties you want and build an additional JSON template:

template(name="plain-syslog"
  type="list") {
    constant(value="{")
      constant(value="\"timestamp\":\"")     property(name="timereported" dateFormat="rfc3339")
      constant(value="\",\"host\":\"")        property(name="hostname")
      constant(value="\",\"severity\":\"")    property(name="syslogseverity-text")
      constant(value="\",\"facility\":\"")    property(name="syslogfacility-text")
      constant(value="\",\"tag\":\"")   property(name="syslogtag" format="json")
      constant(value="\",\"message\":\"")    property(name="msg" format="json")
    constant(value="\"}")
}
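The format="json" option on tag and msg matters because free-form text often contains quotes or backslashes. A small Python sketch (plain_syslog and naive_message are hypothetical helpers) contrasts properly escaped output with naive string pasting:

```python
import json


def plain_syslog(timestamp, host, severity, facility, tag, msg):
    """Build the same fields as the plain-syslog template.

    json.dumps escapes quotes, backslashes and control characters in every
    value; the template only does this (via format="json") for tag and msg.
    """
    return json.dumps({
        "timestamp": timestamp,
        "host": host,
        "severity": severity,
        "facility": facility,
        "tag": tag,
        "message": msg,
    })


def naive_message(msg):
    """What the message part would look like WITHOUT format="json"."""
    return '{"message":"' + msg + '"}'


msg = 'sshd said "Accepted publickey" for C:\\Users\\bob'
doc = plain_syslog("2015-10-13T14:05:09+02:00", "web1", "info", "auth", "sshd[42]:", msg)
print(json.loads(doc)["message"] == msg)  # → True
try:
    json.loads(naive_message(msg))
except json.JSONDecodeError as err:
    print("naive template output is not valid JSON:", err.msg)
```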

Then you can make rsyslog decide: if a log was parsed successfully, use the all-json template. If not, use the plain-syslog one:

if $parsesuccess == "OK" then {
 action(type="omelasticsearch"
  template="all-json"
  ...
 )
} else {
 action(type="omelasticsearch"
  template="plain-syslog"
  ...
 )
}
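The decision logic can be sketched in Python as follows. The regex is a hypothetical, much-simplified stand-in for the Apache rulebase, and to_document is an illustrative name; the point is only the fallback: parsed fields when parsing succeeds, a fixed set of syslog properties otherwise.

```python
import json
import re

# Hypothetical, heavily simplified stand-in for the Apache rulebase: just
# enough to tell access-log lines apart from other syslog messages.
APACHE_LIKE = re.compile(
    r'(?P<clientip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]*)\] '
    r'"(?P<request>[^"]*)" (?P<response>\d+)'
)


def to_document(msg, hostname):
    """Return (template_name, json_document) for one message."""
    m = APACHE_LIKE.match(msg)
    if m:
        # Parsing succeeded ($parsesuccess == "OK"): ship the parsed fields.
        return "all-json", json.dumps(m.groupdict())
    # Parsing failed: fall back to a fixed set of syslog properties.
    return "plain-syslog", json.dumps({"host": hostname, "message": msg})


for line in ('10.0.0.1 - - [13/Oct/2015:10:00:00 +0200] "GET / HTTP/1.1" 200 512',
             'kernel: eth0: link up'):
    print(to_document(line, "web1")[0])
# → all-json
# → plain-syslog
```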

And that’s it! Now you can restart rsyslog and get both your system and Apache logs parsed, buffered, and indexed into Elasticsearch. If you’re a Logsene user, the recipe is a bit simpler: follow the same steps, but skip the logstash-index template (Logsene takes care of time-based indices for you), and your Elasticsearch actions will look like this:

action(type="omelasticsearch"
  template="all-json"    # or "plain-syslog" for messages that were not parsed
  searchIndex="LOGSENE-APP-TOKEN-GOES-HERE"
  searchType="apache"
  server="logsene-receiver.sematext.com"
  serverport="80"
  bulkmode="on"
  action.resumeretrycount="-1"
)

rsyslog 8.9.0 (v8-stable) released

By Adiscon Support | Posted on April 7, 2015 | Posted in News, Release Announcement | Tagged 0mq, 8.9.0, bugfix, imtcp, imuxsock, omprog, release, rsyslog, stable, v8

We have released rsyslog 8.9.0.

This is primarily a bug-fixing release with a couple of improvements in the omprog, imuxsock and ZeroMQ (0mq) plugins.

ChangeLog:

http://www.rsyslog.com/changelog-for-8-9-0-v8-stable/

Download:

http://www.rsyslog.com/downloads/download-v8-stable/

As always, feedback is appreciated.

Best regards,
Florian Riedl

Changelog for 8.9.0 (v8-stable)

By Adiscon Support | Posted on April 7, 2015 | Posted in Changelog | Tagged 0mq, 8.9.0, bugfix, imtcp, imuxsock, omprog, rsyslog, stable, v8

Version 8.9.0 [v8-stable] 2015-04-07

- omprog: add option "hup.forward" to forward HUP to external plugins
  This was suggested by David Lang so that external plugins (and other
  programs) can also do HUP-specific processing. The default is not
  to forward HUP, so there is no change of behavior by default.
- imuxsock: added capability to use the regular parser chain
  Previously, a fixed format was assumed, as that is what is known to be
  spoken on the system log socket. This also adds new parameters:
  - sysSock.useSpecialParser module parameter
  - sysSock.parseHostname module parameter
  - useSpecialParser input parameter
  - parseHostname input parameter
- 0mq: improvements in input and output modules
  See the module READMEs; part of this is to be considered experimental.
  Thanks to Brian Knox for the contribution.
- imtcp: add support for IP-based binding via the new parameter "address"
  Thanks to github user crackytsi for the patch.
- bugfix: MsgDeserialize out of sync with MsgSerialize for StrucData
  This led to failure of disk queue processing when structured data was
  present. Thanks to github user adrush for the fix.
- bugfix: imfile: partial data loss, especially in readMode != 0
  closes https://github.com/rsyslog/rsyslog/issues/144
- bugfix: potential large memory consumption with failed actions
  see also https://github.com/rsyslog/rsyslog/issues/253
- bugfix: omudpspoof: invalid default send template in RainerScript format
  The file format template was used, which obviously does not work for
  forwarding. Thanks to Christopher Racky for alerting us.
  closes https://github.com/rsyslog/rsyslog/issues/268
- bugfix: size-based legacy config statements did not work properly
  On some platforms, they were incorrectly handled, resulting in all
  sorts of "interesting" effects (up to a segfault on startup).
- build system: added option --without-valgrind-testbench
  … which provides the capability to either enforce or turn off
  valgrind use inside the testbench. Thanks to whissi for the patch.
- rsyslogd: fix misleading typos in error messages
  Thanks to Ansgar Püster for the fixes.

rsyslog 7.4.7 (v7-stable) released

By Adiscon Support | Posted on December 10, 2013 | Posted in News, Release Announcement | Tagged 7.4.7, bugfix, disk queue, imtcp, imuxsock, release, rsyslog, segfault, stable, v7

We have just released 7.4.7 of the v7-stable branch. This is a bug-fixing release. Most importantly, it fixes a bug that can lead to …

Changelog for 7.4.7 (v7-stable)

By Adiscon Support | Posted on December 10, 2013 | Posted in Changelog | Tagged 7.4.7, bugfix, Changelog, disk queue, imtcp, imuxsock, rsyslog, stable, v7

Version 7.4.7 [v7.4-stable] 2013-12-10

- bugfix: limiting queue disk space did not work properly
  - queue.maxdiskspace actually initialized queue.maxfilesize
  - the total size of queue files was not checked against queue.maxdiskspace for disk-assisted queues
  Thanks to Karol Jurak for the patch.
- bugfix: the Linux kernel-like ratelimiter did not work properly with all inputs (for example, it did not work with imudp). The reason was that the PRI value was used, but that required parsing the message, which was done too late.
- bugfix: disk queues created files in the wrong working directory: if $WorkDirectory was changed multiple times, all queues used only the last value set.
- bugfix: legacy directive $ActionQueueWorkerThreads was not honored
- bugfix: segfault on startup when certain script constructs are used,
  e.g. "if not $msg …"
- bugfix: imuxsock: UseSysTimeStamp config parameter did not work correctly
  Thanks to Tomas Heinrich for alerting us and providing a solution suggestion.
- bugfix: $SystemLogUseSysTimeStamp/$SystemLogUsePIDFromSystem did not work
  Thanks to Tomas Heinrich for the patch.
- improved checking of queue config parameters on startup
- bugfix: call to ruleset with async queue did not use the queue
  closes: http://bugzilla.adiscon.com/show_bug.cgi?id=443
- bugfix: if imtcp is loaded and no listeners are configured (which is uncommon), rsyslog crashes during shutdown.

Changelog for 7.5.4 (v7-devel)

By Adiscon Support | Posted on October 7, 2013 | Posted in Changelog | Tagged 7.5.4, bugfix, Changelog, devel, documentation, imtcp, imuxsock, mmpstrucdata, mmutf8fix, omfile, omfwd, rsyslog, v7

Version 7.5.4 [devel] 2013-10-07

- mmpstrucdata: new module to parse RFC5424 structured data into JSON message properties
- change main/ruleset queue defaults to be more enterprise-like
  New defaults are queue.size 100,000, max workers 2, worker activation after 40,000 msgs are queued, and batch size 256. These settings are much more useful for enterprises and will not hurt low-end systems that much. This is part of our re-focus on enterprise needs.
- omfwd: new action parameter "maxErrorMessages" added
- omfile: new module parameters to set action defaults added
  - dirCreateMode
  - fileCreateMode
- mmutf8fix: new module to fix invalid UTF-8 sequences
- imuxsock: handle unlimited number of additional listen sockets
- doc: improve usability by linking to relevant web resources
  The idea is to enable users to quickly find additional information, samples, HOWTOs and the like on the main site. At the same time, (very) slightly reduce the memory footprint when few listeners are monitored.
- bugfix: omfwd parameter streamdrivermode was not properly handled
  It was always overwritten by whatever value was set via the legacy directive $ActionSendStreamDriverMode
- imtcp: add streamdriver.name module parameter
  permits overriding the system default stream driver (gtls, ptcp)
- bugfix: build system: libgcrypt.h needed even if libgcrypt was disabled
  Thanks to Jonny Törnbom for reporting this problem
- imported bugfixes from 7.4.4

Changelog for 7.3.11 (v7-devel)

By Adiscon Support | Posted on April 23, 2013 | Posted in Changelog | Tagged 7.3.11, bugfix, Changelog, devel, imuxsock, log file encryption, omhiredis, rsyslog, v7

Version 7.3.11 [devel] 2013-04-23

- added support for encrypting log files
- omhiredis: added support for Redis pipelining
  Thanks to Brian Knox for the patch.
- bugfix: $PreserveFQDN did not work properly
  Thanks to Louis Bouchard for the patch
  closes: http://bugzilla.adiscon.com/show_bug.cgi?id=426
- bugfix: imuxsock aborted due to problem in ratelimiting code
  Thanks to Tomas Heinrich for the patch.
- bugfix: imuxsock aborted under some conditions
  regression from ratelimiting enhancements; this was a different one from the one Tomas Heinrich patched.
- bugfix: timestamp problems in imkmsg

rsyslog 7.3.11 (v7-devel)

By Adiscon Support | Posted on April 23, 2013 | Posted in devel, Download | Tagged 7.3.11, bugfix, devel, Download, imuxsock, log file encryption, rsyslog, v7

Download file name: rsyslog 7.3.11 (devel)

sha256 hash: 2a41dfb1cd756880693e297877ffbab5d324ecf108e3533401a919efb49ea344

Author: Rainer Gerhards (rgerhards@adiscon.com)
Version: 7.3.11
File size: 2.887 MB


rsyslog 7.3.7 (v7-devel) released

By Adiscon Support | Posted on March 12, 2013 | Posted in News, Release Announcement | Tagged anonymizing IPv4, bugfix, devel, features, imuxsock, Journal, mmjsonparse, omjournal, release, rsyslog, v7

We have just released version 7.3.7 of the rsyslog development branch. This release offers some important new features, most importantly a plugin to anonymize IPv4 addresses and a plugin to write to the systemd journal. Also, the field() RainerScript function has been upgraded to support multi-character field delimiters. There are also a number of bug fixes.

ChangeLog:

http://www.rsyslog.com/changelog-for-7-3-7-v7-devel/

Download:

http://www.rsyslog.com/rsyslog-7-3-7-v7-devel/

As always, feedback is appreciated.

Best regards,
Florian Riedl

Changelog for 7.3.7 (v7-devel)

By Adiscon Support | Posted on March 12, 2013 | Posted in Changelog | Tagged anonymizing IPv4, bugfix, devel, imuxsock, Journal, mmjsonparse, omjournal, rsyslog, v7

Version 7.3.7 [devel] 2013-03-12

- add support for anonymizing IPv4 addresses
- add support for writing to the Linux Journal (omjournal)
- imuxsock: add capability to ignore messages from ourselves
  This helps prevent message routing loops, and is vital to have if omjournal is used together with traditional syslog.
- field() function now supports a string as field delimiter
- added ability to configure debug system via rsyslog.conf
- bugfix: imuxsock segfault when system log socket was used
- bugfix: mmjsonparse segfault if new-style config was used
- bugfix: script == comparison did not work properly on JSON objects
- bugfix: field() function never returned "***FIELD NOT FOUND***";
  instead, it returned "***ERROR in field() FUNCTION***" in that case