rainerscript

rsyslog 8.7.0 (v8-stable) released

We have released rsyslog 8.7.0.

Version 8.7.0 contains various improvements and additions to a wide array of modules, like imfile, imptcp, improvements to RainerScript and mmnormalize (thanks to Singh Janmejay) and a couple of other improvements. But, the biggest addition is the new omkafka module that now allows direct writing to Apache Kafka.

This release also contains important bug fixes.

This is a recommended upgrade for all users.

 

ChangeLog:

http://www.rsyslog.com/changelog-for-8-7-0-v8-stable/

Download:

http://www.rsyslog.com/downloads/download-v8-stable/

As always, feedback is appreciated.

Best regards,
Florian Riedl

Performance Tuning&Tests for the Elasticsearch Output

Original post: Rsyslog 8.1 Elasticsearch Output Performance by @Sematext

Version 8 brings major changes in rsyslog’s core – see Rainer’s presentation about it for more details. Those changes should give outputs better performance, and the Elasticsearch one should benefit a lot. Since we’re using rsyslog and Elasticsearch in Sematext‘s own log analytics product, Logsene, we had to take the new version for a spin.

The Weapon and the Target

For testing, we used a good-old i3 laptop, with 8GB of RAM. We generated 20 million logs, sent them to rsyslog via TCP and from there to Elasticsearch in the Logstash format, so they can get explored with Kibana. The objective was to stuff as many events per second into Elasticsearch as possible.

Rsyslog Architecture Overview

In order to tweak rsyslog effectively, one needs to understand its architecture, which is not that obvious (although there’s an ongoing effort to improve the documentation). The gist of it its architecture represented in the figure below.

  • you have input modules taking messages (from files, TCP/UDP, journal, etc.) and pushing them to a main queue
  • one or more main queue threads take those events and parse them. By default, they parse syslog formats (RFC-3164, RFC-5424 and various derivatives), but you can configure rsyslog to use message modifier modules to do additional parsing (e.g. CEE-formatted JSON messages). Either way, this parsing generates structured events, made out of properties
  • after parsing, the main queue threads push events to the action queue. Or queues, if there are multiple actions and you want to fan-out
  • for each defined action, one or more action queue threads takes properties from events according to templates, and makes messages that would be sent to the destination. In Elasticsearch’s case, a template should make Elasticsearch JSON documents, and the destination would be the REST API endpoint

rsyslog message flow

There are two more things to say about rsyslog’s architecture before we move on to the actual test:

  • you can have multiple independent flows (like the one in the figure above) in the same rsyslog process by using rulesets. Think of rulesets as swim-lanes. They’re useful for example when you want to process local logs and remote logs in a completely separate manner
  • queues can be in-memory, on disk, or a combination called disk-assisted. Here, we’ll use in-memory because they’re the fastest. For more information about how queues work, take a look here

Configuration

To generate messages, we used tcpflood, a small and light tool that’s part of rsyslog’s testbench. It generates messages and sends them over to the local syslog via TCP.

Rsyslog took received those messages with the imtcp input module, queued them and forwarded them to Elasticsearch 0.90.7, which was also installed locally. We also tried with Elasticsearch 1.0 and got the same results (see below).

The flow of messages in this test is represented in the following figure:

Flow of messages in this test

The actual rsyslog config is listed below (in the new configuration format). It can be tuned further (for example by using the multithreaded imptcp input module), but we didn’t get significant improvements in this particular scenario.

module(load="imtcp")   # TCP input module
module(load="omelasticsearch") # Elasticsearch output module

input(type="imtcp" port="13514")  # where to listen for TCP messages

main_queue(
  queue.size="1000000"   # capacity of the main queue
  queue.dequeuebatchsize="1000"  # process messages in batches of 1000 and move them to the action queues
  queue.workerthreads="2"  # 2 threads for the main queue
)

# template to generate JSON documents for Elasticsearch in Logstash format
template(name="plain-syslog"
         type="list") {
           constant(value="{")
             constant(value="\"@timestamp\":\"")      property(name="timereported" dateFormat="rfc3339")
             constant(value="\",\"host\":\"")        property(name="hostname")
             constant(value="\",\"severity\":\"")    property(name="syslogseverity-text")
             constant(value="\",\"facility\":\"")    property(name="syslogfacility-text")
             constant(value="\",\"syslogtag\":\"")   property(name="syslogtag" format="json")
             constant(value="\",\"message\":\"")    property(name="msg" format="json")
             constant(value="\"}")
         }

action(type="omelasticsearch"
       template="plain-syslog"  # use the template defined earlier
       searchIndex="test-index"
       bulkmode="on"                   # use the Bulk API
       queue.dequeuebatchsize="5000"   # ES bulk size
       queue.size="100000"   # capacity of the action queue
       queue.workerthreads="5"   # 5 workers for the action
       action.resumeretrycount="-1"  # retry indefinitely if ES is unreachable
)

You can see from the configuration that:

  • both main and action queues have a defined size in number of messages
  • both have number of threads that deliver messages to the next step. The action needs more because it has to wait for Elasticsearch to reply
  • moving of messages from the queues happens in batches. For the Elasticsearch output, the batch of messages is sent through the Bulk API, which makes queue.dequeuebatchsize effectively the bulk size

Results

We started with default Elasticsearch settings. Then we tuned them to leave rsyslog with a more significant slice of the CPU. We monitored the indexing rate with SPM for Elasticsearch. Here are the average results over 20 million indexed events:

  • with default Elasticsearch settings, we got 8,000 events per second
  • after setting Elasticsearch up more production-like (5 second refresh interval, increased index buffer size, translog thresholds, etc), and the throughput went up to average of 20,000 events per second
  • in the end, we went berserk and used in-memory indices, updated the mapping to disable any storing or indexing for any field, to have Elasticsearch do as little work as possible and make room for rsyslog. Got an average of 30,000 events per second. In this scenario, rsyslog was using between 1 and 1.5 of the 4 virtual CPU cores, with tcpflood using 0.5 and Elasticsearch using from 2 to 2.5

Conclusion

20K EPS on a low-end machine with production-like configuration means Elasticsearch is quick at indexing. This is very good for logs, where you typically have lots of messages being generated, compared to how often you search.

If you need some tool to ship your logs to Elasticsearch with minimum overhead, rsyslog version 8 may well be your best bet.

Related posts:

Changelog for 7.4.2 (v7-stable)

Version 7.4.2 [v7.4-stable] 2013-07-04

  • bugfix: in RFC5425 TLS, multiple wildcards in auth could cause segfault
  • bugfix: RainerScript object required parameters were not properly checked – this clould result to segfaults on startup if parameters were missing.
  • bugfix: double-free in omelasticsearch closes: http://bugzilla.adiscon.com/show_bug.cgi?id=461 a security advisory for this bug is available at: http://www.lsexperts.de/advisories/lse-2013-07-03.txt PLEASE NOTE: This issue only existed if omelasticsearch was used in a non-default configuration, where the “errorfile” parameter was specified. Without that parameter set, the bug could not be triggered. Thanks to Markus Vervier and Marius Ionescu for providing a detailled bug report. Special thanks to Markus for coordinating his security advisory with us.
  • bugfix: omrelp potential segfault at startup on invalid config parameters
  • bugfix: small memory leak when $uptime property was used
  • bugfix: potential segfault on rsyslog termination in imudp closes: http://bugzilla.adiscon.com/show_bug.cgi?id=456
  • bugfix: lmsig_gt abort on invalid configuration parameters closes: http://bugzilla.adiscon.com/show_bug.cgi?id=448 Thanks to Risto Laanoja for the patch.
  • imtcp: fix typo in “listner” parameter, which is “listener” Currently, both names are accepted.
  • solved build problems on FreeBSD closes: http://bugzilla.adiscon.com/show_bug.cgi?id=457 closes: http://bugzilla.adiscon.com/show_bug.cgi?id=458 Thanks to Christiano for reproting and suggesting patches
  • solved build problems on CENTOS5

rsyslog 7.4.2 (v7-stable) released

This is a maintenance release, consisting primarily of bug fixes. It also provides a fix for a potential security issue in omelasticsearch. Please note that the security issue only exists in non-default configuration if the “errorfile” parameter was specified.

ChangeLog:

http://www.rsyslog.com/changelog-for-7-4-2-v7-stable/

Download:

http://www.rsyslog.com/rsyslog-7-4-2-v7-stable/

As always, feedback is appreciated.

Best regards,
Florian Riedl

Changelog for 6.5.1 (v6-beta)

Version 6.5.1 [BETA] 2012-10-11

  • added tool “logctl” to handle lumberjack logs in MongoDB
  • imfile ported to new v6 config interface
  • imfile now supports config parameter for maximum number of submits
    which is a fine-tuning parameter in regard to input baching
  • added pure JSON output plugin parameter passing mode
  • ommongodb now supports templates
  • bugfix: imtcp could abort on exit due to invalid free()
  • bugfix: remove invalid socket option call from imuxsock
    Thanks to Cristian Ionescu-Idbohrn and Jonny Törnbom
  • bugfix: missing support for escape sequences in RainerScript
    only \’ was supported. Now the usual set is supported. Note that v5
    used \x as escape where x was any character (e.g. “\n” meant “n” and NOT
    LF). This also means there is some incompatibility to v5 for well-know
    sequences. Better break it now than later.
  • bugfix: small memory leaks in template() statements
    these were one-time memory leaks during startup, so they did NOT grow
    during runtime
  • bugfix: config validation run did not always return correct return state
  • bugfix: config errors did not always cause statement to fail
    This could lead to startup with invalid parameters.

rsyslog 6.5.1 (v6-beta) released

This is the new v6-beta, which includes the full v6-subset of the new config language as well as somewhat improved support for lumberjack/CEE. This version concludes development efforts for v6.

Note that it is recommended to use v7 if you do not have any special need for v6.

ChangeLog:

http://www.rsyslog.com/changelog-for-6-5-1-v6-beta/

Download:

http://www.rsyslog.com/rsyslog-6-5-1-beta/

As always, feedback is appreciated.

Best regards,
Tim Eifler

Changelog for 7.1.5 (v7-devel)

Version 7.1.5 [devel] 2012-09-26

  • implemented RainerScript prifield() function
  • implemented RainerScript field() function
  • added new module imkmsg to process structured kernel log
    Thanks to Milan Bartos for contributing this module
  • implemented basic RainerScript optimizer, which will speed up script
    operations
  • bugfix: invalid free if function re_match() was incorrectly used
    if the config file parser detected that param 2 was not constant, some
    data fields were not initialized. The destructor did not care about that.
    This bug happened only if rsyslog startup was unclean.

rsyslog 7.1.5 (v7-devel) released

The script engine has been enhanced with new functions and even better performance. There is also a new plugin (imkmsg), which offers support for structured kernel logs. Special thanks to Milan Bartos, who contributed this module. The 7.1.5 version completes the core of enhancements to support lumberjack structured logging (but of course there will be more enhancements in the future).
RPM files for 7.1.5 can be found at http://www.rsyslog.com/rhelcentos-rpms/

ChangeLog:

http://www.rsyslog.com/changelog-for-7-1-5-v7-devel/

Download:

http://www.rsyslog.com/rsyslog-7-1-5-v7-devel/

As always, feedback is appreciated.

Best regards,
Tim Eifler

rsyslog 6.3.3 (devel) released

This is a very important milestone release. It features the new config parser and thus provides the basis for a more intuitive config format. With 6.3.3 there are already some enhancements to the format. However, more changes will come up with the next minor releases. For details, please check this link:

http://www.rsyslog.com/rsyslog-6-3-3-config-format-improvements/

It is worth noting that the performance of script-based filters (“if … then”) has notable been improved. Preliminary benchmarks show an improvement of at least a factor of three (more detailed benchmarks will be done after the new scoped object statements have been introduced).

We would appreciate early adoption of this release. One goal in releasing it is to see if the new parser actually is able to handle all legacy configurations found in practice (note that the parser was written from scratch).

ChangeLog:

http://www.rsyslog.com/changelog-for-6-3-3-v6-devel/

Download:

http://www.rsyslog.com/rsyslog-6-3-3-devel/

As always, feedback is appreciated.

Best regards,
Tom Bergfeld

Scroll to top