rsyslog

Why does the rsyslog testbench sometimes fail?

Rsyslog contains a set of automatted tests, the so-called “testbench”. It is invoked via the standard method of “make check” and “make distcheck”. Since its introduction in version 3, the testbench has been continously enhanced and extended. It now contains around 150 individual tests, which sum up to around 80 tests from the autoconf point of view (some autoconf tests run a couple of subtests, thus the difference in number). The testbench has been proven to be very useful and caught numerous problems before new code was released.

But the testbench is not perfect, and it may sometimes fail without any actual problem. There are two reasons for this. One is that the test require a very specific environment. For example, some parser based tests assume that the system the test is run on is configured to be named “localhost.localdomain” (the default for many test deployments). This needs to be the case because there currently is no way in rsyslog to overwrite the local hostname. Some parser tests use malformed messages, in which case (as of the RFC), the local system name must be used. As such, we need to have a specific system name set in order to prove the results. In the long term, we’ll add the capability to overwrite system name inside rsyslog, but it does not make sense to create a dirty trick just for testbench use. So this needs to wait until we get to it as part of regular development. Note that a similar issues may exist at other places. An obvious one is the database tests, where we need pre-created users, databases, tables etc in order to run the tests.

The other issue is a bit more subtle. The syslog protocol is simply, without App-Layer acknowledgments. This makes it hard to know when rsyslog has received a while bunch of test data. That in turn makes it hard to definitely say when all test data has arrived and an instance can be shut down. So the whole process is a bit racy. To “solve” this, we use some wait periods in tests affected by this problem. However, longer wait periods mean longer test bench runtime and this reduces our development productivity. So we use wait time that usually does the job, but may fail under some circumstances (most notably when –enable-debug is set). This can affect a couple of TCP-based tests (like imtcp_conndrop.sh and similar ones). We have not yet a good idea what a clean solution to this problem is, where “clean” means that it a) always works and b) does no introduce unnecessary code complexity under non-testbench runs.

Given these problems, some care must be taken interpreting testbench results. Most importantly, a fail does not necessarily mean that things are actually broken. It merely means that one needs to look at the actual test and check a) why it fails and b) if it fails repeatedly. Especially the “racy” test tend to occasionally fail without any real problem. We’ve also seen them to fail consistenly on some platforms, simply because our timing assumptions are not valid there (Solaris was one example where we needed to adjust our overall wait periods).

So testbench results need to be taken with a grain of salt, and require interpretation. We know this is inconvenient for occasional users, but it is the best compromise we currently can offer.

Using the Text File Input Module

Log files should be processed by rsyslog. Here is some information on how the file monitor works. This will only describe setting up the Text File Input Module. Further configuration like processing rules or output methods will not be described.

Things to think about

The configuration given here should be placed on top of the rsyslog.conf file.

Config Statements

module(load="imfile" PollingInterval="10")
# needs to be done just once. PollingInterval is a module directive and is only set once when loading the module
# File 1
input(type="imfile" File="/path/to/file1" 
Tag="tag1" 
StateFile="/var/spool/rsyslog/statefile1" 
Severity="error" 
Facility="local7")
# File 2
input(type="imfile" File="/path/to/file2" 
Tag="tag2" 
StateFile="/var/spool/rsyslog/statefile2")
# ... and so on ...
#

How it works

The configuration for using the Text File Input Module is very extensive. At the beginning of your rsyslog configuration file, you always load the modules. There you need to load the module for Text File Input as well. Like all other modules, this has to be made just once. Please note that the directive PollingInterval is a module directive which needs to be set when loading the module.

module(load="imfile" PollingInterval="10")

Next up comes the input and its parameters. We configure a input of a certain type and then set the parameters to be used by this input. This is basically the same principle for all inputs:

# File 1
input(type="imfile" File="/path/to/file1" 
Tag="tag1" 
StateFile="/var/spool/rsyslog/statefile1" 
Severity="error" 
Facility="local7")

File specifies, the path and name of the text file that should be monitored. The file name must be absolute.

Tag will set a tag in front of each message pulled from the file. If you want a colon after the tag you must set it as well, it will not be added automatically.

StateFile will create a file where rsyslog keeps track of the position it currently is in a file. You only need to set the filename. This file always is created in the rsyslog working directory (configurable via $WorkDirectory). This file is important so rsyslog will not pull messages from the beginning of the file when being restarted.

Severity will give all log messages of a file the same severity. This is optional. By default all mesages will be set to “notice”.

Facility gives alle log messages of a file the same facility. Again, this is optional. By default all messages will be set to “local0”.

These statements are needed for monitoring a file. There are other statements described in the doc, which you might want to use. If you want to monitor another file the statements must be repeated.

Since the files cannot be monitored in genuine real time (which generates too much processing effort) you need to set a polling interval:

PollingInterval 10

This is a module setting and it defines the interval in which the log files will be polled. By default this value is set to 10 seconds. If you want this to get more near realtime, you can decrease the value, though this is not suggested due to increasing processing load. Setting this to 0 is supported, but not suggested. Rsyslog will continue reading the file as long as there are unprocessed messages in it. The interval only takes effect once rsyslog reaches the end of the file.

Important

The StateFile needs to be unique for every file that is monitored. If not, strange things could happen.

Changelog for 5.7.9 (v5-beta)

Version 5.7.9  [V5-BETA] (rgerhards), 2011-03-16

  • improved testbench
    among others, life tests for ommysql (against a test database) have been added, valgrind-based testing enhanced, …
  • enhance: fallback *at runtime* to epoll_create if epoll_create1 is not available. Thanks to Michael Biebl for analysis and patch!
  • bugfix: failover did not work correctly if repeated msg reduction was on
    closes: http://bugzilla.adiscon.com/show_bug.cgi?id=236
    affected directive was: $ActionExecOnlyWhenPreviousIsSuspended on
  • bugfix: minor memory leak in omlibdbi (< 1k per instance and run)
  • bugfix: (regression) omhdfs did no longer compile
  • bugfix: omlibdbi did not use password from rsyslog.conf
    closes: http://bugzilla.adiscon.com/show_bug.cgi?id=203

Changelog for 5.7.7 (v5-beta)

Version 5.7.7  [V5-BETA] (rgerhards), 2011-03-02

  • bugfix: potential abort condition when $RepeatedMsgReduction set to on as well as potentially in a number of other places where MsgDup() was used. This only happened when the imudp input module was used and it depended on name resolution not yet had taken place. In other words, this was a strange problem that could lead to hard to diagnose instability. So if you experience instability, chances are good that this fix will help.

Changelog for 5.7.6 (v5-beta)

Version 5.7.6  [V5-BETA] (rgerhards), 2011-02-25

  • bugfix: fixed a memory leak and potential abort condition this could happen if multiple rulesets were used and some output batches contained messages belonging to more than one ruleset.
    fixes: http://bugzilla.adiscon.com/show_bug.cgi?id=226
    fixes: http://bugzilla.adiscon.com/show_bug.cgi?id=218
  • bugfix: memory leak when $RepeatedMsgReduction on was used
    bug tracker: http://bugzilla.adiscon.com/show_bug.cgi?id=225
Scroll to top