Encasing of control structures
When using control structures for case-based filtering, there are a few things to consider regarding formatting.
In general, a control structure can be as simple as this:
if $msg contains "word" then action() else action()
While this is correct and will work as expected, it is generally better style to encase expressions and actions in parentheses and braces. That way, they are more clearly separated from the control structure itself, which improves readability, and they can be expanded more easily without introducing errors. When using multiple actions or expressions, encasing is required anyway, so we consider it good practice to always use it.
Example:
if ($msg contains "word") then { action() } else { action() }
And a more complex example:
if ($msg contains "word" and $source == "123.123.123.123") then {
    action()
    action()
} else {
    action()
    action()
}
All three sample snippets above are correct in their own way. Added complexity makes encasing necessary, though. So you may as well use it for all your filters to get a consistent configuration that covers all cases.
action.execOnlyWhenPreviousIsSuspended preciseness
The “action.execOnlyWhenPreviousIsSuspended” parameter allows an action to be executed only if the previous one is suspended. When an action actually gets suspended depends on the output module implementing it, but the general rule is that an action suspends itself if there is a problem doing its job (e.g. a target server is down). This setting is generally used to implement failover destinations. So let’s call actions which have this attribute set “failover actions”.
Note that many actions implement transactions to improve performance. If so, the success or error state of the action is usually only available after the transaction has been committed. At that point, the messages have already been submitted to the action, and the action could not set the suspension state, simply because at that point everything looked fine. So when the failover action was about to be executed, the suspended flag was not set and the action was not executed. Depending on system configuration, this can lead to the loss of some messages during the failover process. If this is unacceptable for you, you need to make sure that a commit is done after each message; this permits the output to obtain the actual success state (note that with some technologies even this is not fully possible, a prime example being the plain TCP syslog protocol).
To ensure a commit after each message, the queue’s maximum batch size needs to be reduced to one. This can be done via the “queue.dequeuebatchsize” parameter. Note that reducing the batch size can have a severe performance impact. Unfortunately, it is impossible to have both high performance and precise failover actions. The simple reason is that for precise failover actions we need to know the result of the commit operation, which makes immediate commits necessary.
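To illustrate, here is a minimal sketch of such a failover pair in current config syntax. The target host, port and file path are placeholders and not part of the original description; depending on your setup, the batch size may instead need to be reduced on whatever queue feeds the actions (for example the main queue).

# primary destination: dedicated queue with batch size 1, so the
# outcome of each commit is known before the next message is processed
action(type="omfwd" target="primary.example.com" port="514" protocol="tcp"
       queue.type="LinkedList" queue.dequeuebatchsize="1")
# failover destination: only executed while the primary is suspended
action(type="omfile" file="/var/log/failover.log"
       action.execOnlyWhenPreviousIsSuspended="on")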
Lower Bound for Queue Sizes
The queue.size parameter permits specifying the maximum queue size in number of messages. While not technically enforced, there is a lower limit on this parameter. Setting it to very low values (roughly below 100 messages) is not supported and can lead to unpredictable results. Also, future versions may automatically adjust it to a safe lower bound and/or decide to fail queue startup in those cases. So if you use very low values, do so at your own risk.
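As a rough example, an action queue sized comfortably above that informal lower bound could be configured like this (the file path and value are placeholders):

# a queue size of 10000 stays well clear of the unsupported range below ~100 messages
action(type="omfile" file="/var/log/all-messages.log"
       queue.type="LinkedList" queue.size="10000")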
Why does the rsyslog testbench sometimes fail?
Rsyslog contains a set of automated tests, the so-called “testbench”. It is invoked via the standard methods of “make check” and “make distcheck”. Since its introduction in version 3, the testbench has been continuously enhanced and extended. It now contains around 150 individual tests, which sum up to around 80 tests from the autoconf point of view (some autoconf tests run a couple of subtests, hence the difference in numbers). The testbench has proven to be very useful and has caught numerous problems before new code was released.
But the testbench is not perfect, and it may sometimes fail without any actual problem. There are two reasons for this. One is that the tests require a very specific environment. For example, some parser-based tests assume that the system the test is run on is configured to be named “localhost.localdomain” (the default for many test deployments). This needs to be the case because there currently is no way in rsyslog to override the local hostname. Some parser tests use malformed messages, in which case (as per the RFC) the local system name must be used. As such, we need to have a specific system name set in order to verify the results. In the long term, we’ll add the capability to override the system name inside rsyslog, but it does not make sense to create a dirty trick just for testbench use. So this needs to wait until we get to it as part of regular development. Note that similar issues may exist in other places. An obvious one is the database tests, where we need pre-created users, databases, tables, etc. in order to run the tests.
The other issue is a bit more subtle. The syslog protocol is simple and has no app-layer acknowledgments. This makes it hard to know when rsyslog has received a whole bunch of test data. That in turn makes it hard to say definitely when all test data has arrived and an instance can be shut down. So the whole process is a bit racy. To “solve” this, we use some wait periods in tests affected by this problem. However, longer wait periods mean longer testbench runtime, and this reduces our development productivity. So we use wait times that usually do the job, but may fail under some circumstances (most notably when --enable-debug is set). This can affect a couple of TCP-based tests (like imtcp_conndrop.sh and similar ones). We do not yet have a good idea what a clean solution to this problem is, where “clean” means that it a) always works and b) does not introduce unnecessary code complexity in non-testbench runs.
Given these problems, some care must be taken when interpreting testbench results. Most importantly, a failure does not necessarily mean that things are actually broken. It merely means that one needs to look at the actual test and check a) why it fails and b) whether it fails repeatedly. Especially the “racy” tests tend to occasionally fail without any real problem. We have also seen them fail consistently on some platforms, simply because our timing assumptions are not valid there (Solaris was one example where we needed to adjust our overall wait periods).
So testbench results need to be taken with a grain of salt, and require interpretation. We know this is inconvenient for occasional users, but it is the best compromise we currently can offer.
How to use rate limiting in rsyslog?
This article is tested with rsyslog 5.7.1 on Fedora 13. It will not work with versions of rsyslog prior to 5.7.1.
In rsyslog 5.7.1 we introduced rate limiting. This is an option for the Unix socket input module, imuxsock. In short, this option limits the number of messages written into logfiles by a process, if that process tries to write huge amounts of messages in a short period of time.
If you look a bit more into the details, this new feature makes a lot of sense. For regular logging, it makes no difference. But if a process goes wild (and I bet everyone has experienced this at least once) and starts to create large bulks of log messages, this affects the whole system in general. It not only blocks your I/O, but also creates a large amount of logs that is usually simply not necessary and does not really hold useful information. Therefore such messages are now discarded automatically. We will describe later how you can customize these settings.
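For orientation, the relevant imuxsock directives look roughly like this (a sketch only; the values are examples, the defaults of your rsyslog version may differ, and the details follow on the next pages):

$ModLoad imuxsock
# permit at most 200 messages from one process within a 5 second interval;
# further messages within that interval are discarded
$SystemLogRateLimitInterval 5
$SystemLogRateLimitBurst 200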
This article is aimed at those who want to learn a bit more about how rate limiting works or who are new to this topic. Therefore we will not only describe how to set this up or change the settings, but also show you some ways of testing how imuxsock works and where some caveats lie.
Please go on to the following pages.
Second try to test rate limiting
Go back to How to build test-tools?
After building the test tool syslog_caller, we can give the testing another try. The syslog_caller tool will be a single process that creates the designated number of messages. It can generate a very large amount of output, and we need it to produce a lot of messages to see the effects properly. We leave the rsyslog config as it is for now.
Before we start, delete the logfile and restart rsyslog.
rm /var/log/everything.log
/etc/rc.d/init.d/rsyslog restart
Now we start the syslog caller tool with the following command:
/path/to/rsyslog/tests/syslog_caller -m 1000000
This will run for a little while and then return you to the command prompt. With “-m 1000000” we determine how many messages will be generated. In this case we try 1 million messages, just to be on the safe side.
The entries in the logfile should look like this:
Oct 5 18:22:51 client kernel: imklog 5.7.1, log source = /proc/kmsg started.
Oct 5 18:22:51 client rsyslogd: [origin software="rsyslogd" swVersion="5.7.1" x-pid="7193" x-info="http://www.rsyslog.com"] start
Oct 5 18:23:52 client syslog_caller[7200]: test message nbr 0, severity=6
Oct 5 18:23:52 client syslog_caller[7200]: test message nbr 1, severity=6
Oct 5 18:23:52 client syslog_caller[7200]: test message nbr 2, severity=6
Oct 5 18:23:52 client syslog_caller[7200]: test message nbr 3, severity=6
.
.
.
Oct 5 18:23:52 client syslog_caller[7200]: test message nbr 47, severity=6
Oct 5 18:23:52 client syslog_caller[7200]: test message nbr 48, severity=6
Oct 5 18:23:52 client syslog_caller[7200]: test message nbr 49, severity=6
Oct 5 18:23:52 client rsyslogd-2177: imuxsock begins to drop messages from pid 7200 due to rate-limiting
Oct 5 18:23:57 client rsyslogd-2177: imuxsock lost 111673 messages from pid 7200 due to rate-limiting
Oct 5 18:23:57 client syslog_caller[7200]: test message nbr 111723, severity=6
Oct 5 18:23:57 client syslog_caller[7200]: test message nbr 111724, severity=6
Oct 5 18:23:57 client syslog_caller[7200]: test message nbr 111725, severity=6
Oct 5 18:23:57 client syslog_caller[7200]: test message nbr 111726, severity=6
Oct 5 18:23:57 client syslog_caller[7200]: test message nbr 111727, severity=6
The first two messages in the logfile are the startup messages. Right after that, the messages from syslog_caller appear. The number in brackets after the process name is the PID. If you compare the PID across the log messages, you will find that it is always the same, so syslog_caller is definitely suitable for the test. In the log shown above I skipped some messages, but the essence is visible. Take a look at message number 49 (which is actually the 50th message, since counting starts at 0). Right after it, you find a message logged by rsyslog directly. It states that imuxsock starts dropping messages from PID 7200, in this case syslog_caller. The next message is again from rsyslog and states the number of messages that have been lost due to rate limiting. After this, messages are logged again, at least until rate limiting intervenes again.
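For reference, behavior like the one shown above (drops starting after 50 messages, with the loss report arriving about 5 seconds later) would result from settings along these lines; the values are inferred from the log output and may not match the configuration used in this setup:

# drop further messages from a process once it has sent 50 messages
# within a 5 second interval
$SystemLogRateLimitInterval 5
$SystemLogRateLimitBurst 50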
This is how we can test the rate limiting properly.
How to build the testing tools?
This article has been tested with rsyslog 5.7.1 on Fedora 13. It is part of the article “How to use rate limiting?”
Go back to First try to test rate limiting (fail)
When building a configuration for rsyslog, you will sometimes stumble upon the question of whether your setup really works. Verifying this in your environment is very complicated in most cases. For this we have made some test tools, which help you test your configuration. The main reason for us to create these tools was to test new features and try out scenarios, but in the end we deliver them with every release. We will describe how to create and use the testing tools with the example of the tool syslog_caller. This tool creates a bunch of messages for the local loghost.
First you need to navigate to your rsyslog folder. From there you need to go to the subfolder tests. The path could look like this:
/home/username/rsyslog/tests/
We need to “make” the tests first before we can use them.
# make syslog_caller
The make process gives you some output. It looks like this:
CC       syslog_caller.o
CCLD     syslog_caller
After that, we are ready to use the test-tool. Use the following command:
./syslog_caller
This will generate 500 messages by default. You can change the default values with some switches. The switches are usually documented in the .c file of the tool. In this case, using the following command:
./syslog_caller -m 10000
will generate 10000 messages. With the test tools, you can do some stress testing of your rsyslog installation. This is a good way to verify that your configuration is working correctly.
Periodic statistics on rsyslog counters
Since rsyslog 5.7.0 (V5-Devel) there is a new module called impstats.
This module provides periodic output of rsyslog’s internal counters. Note that the whole statistics system is currently under development. So the availability and format of counters may change and are not yet stable (be prepared to adjust your trending scripts when you upgrade to a newer rsyslog version).
The set of available counters is output as a set of syslog messages. This output is periodic, with the interval being configurable (default is 5 minutes). Be sure that your configuration records the counter messages (by default they are emitted as syslog.info).
Note that loading this module has impact on rsyslog performance. Depending on settings, this impact may be severe (for high-load environments).
Please find below some simple steps on how to use that module.
Since the module is not built by default, you have to enable it when running configure. The configure invocation could look like this (note that your parameters may differ from this example; the important thing is --enable-impstats; this example is for Fedora 13):
./configure --enable-impstats --sbindir=/sbin --libdir=/lib
The next steps are make and make install
make
make install
Furthermore we have to add that module in the rsyslog config. Below you will find a short excerpt of a sample config.
#### Modules ####
$ModLoad imuxsock
$ModLoad imklog
#$ModLoad immark
$ModLoad impstats
$PStatsInterval 600
$PStatsSeverity 7

syslog.debug /var/log/rsyslog-stats
$ModLoad impstats – tells rsyslog to load the module impstats
$PStatsInterval 600 – is a configuration directive of impstats
$PStatsSeverity 7 – is also a configuration directive of impstats
Here you will find all information about the impstats module and its configuration directives.
If we have a look at the mentioned output file rsyslog-stats, we can see the results of the statistics module.
The content of that file should look like this:
Sep 17 11:43:49 localhost rsyslogd-pstats: umxsock: submitted=16
Sep 17 11:43:49 localhost rsyslogd-pstats: main Q: size=1 enqueued=2403 full=0 maxqsize=2
Currently, all objects are shown in the results. Every object has its own counters, for example:
umxsock: (= the object) submitted=16 (= the counter; 16 log messages were received by the object umxsock). The main queue of rsyslog is also shown as “main Q”, with the counters size (messages currently in the queue), enqueued (all messages received so far), full (how often the queue was full) and maxqsize (the maximum number of messages that have been in the queue).
Please note that the features and options of this module will be developed further in the future.
You will find all information about the periodic statistics module in the documentation. There is also a list of rsyslog impstats counters available on the web site.
Preserving syslog sender over NAT
Question:
I have a number of syslog clients behind a NAT device. The receiver receives syslog messages that travelled over the NAT device. This leads the receiver to believe that all messages originated from the same IP address. With stock syslogd, I can not differentiate between the senders. Is there any way to record the correct sender of the message with rsyslog?
Answer:
OK, I’ve now had some real lab time. The good news in short: if you use rsyslog both on the senders as well as on the receiver, you do NOT have any problems with NAT.
To double-check (and out of curiosity), I also tried with stock syslogd. I used the ones that came with RedHat and FreeBSD. Neither of them reports the sending machine correctly; they all report the NAT address. Obviously, this is what made this thread appear, but it is a good verification of the correctness of my lab. Next, I tried rsyslogd on the sender and stock syslogd on the receiver (just RedHat this time). The machine was still incorrectly displayed as the NAT address. However, now the real machine name immediately followed the NAT address, so you could differentiate the different machines – but in an inconsistent way.
Finally, I tried to run the stock syslogds against rsyslogd. Again, the host was not properly displayed. Actually, this time the host was not displayed at all (with the default rsyslogd template). Instead, the tag showed up in the host field. So this configuration is basically unusable.
The root cause of the NAT issue with stock syslogd obviously is that it does NOT include the HOST header that should be sent as per RFC 3164. This requires the receiver to take the host from the socket, which – in a NATed environment – can only hold the mangled NAT address. Rsyslog instead includes the HOST header, so the actual host name can be taken from it (this is the way rsyslog works with the default templates).
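To illustrate, a plain forwarding rule on each sender behind the NAT is sufficient, since rsyslog’s default forwarding format already includes the hostname; the receiver name below is just a placeholder:

# forward everything to the central receiver via TCP
*.*   @@receiver.example.com:514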
I barely remember seeing this in code when I initially forked rsyslog from sysklogd. I have not verified it once again. I have also not tested with syslog-ng, simply because that is not my prime focus and a lab would have required too much time.
To make a long story short: If you use rsyslog on both the senders and receivers, NAT is no issue for you.
Does rsyslog support RFC3195/COOKED?
Rsyslog supports RFC3195 RAW and COOKED. The COOKED profile does not support PATH elements, but there is also currently no known sender that uses them.