Log normalization and the leading space

Log normalization is simple, but it has its quirks. A common pitfall is the syslog message format as defined by RFC3164. Let’s look at a common case: a log message has been sent to rsyslog, and the message itself contains no irregular characters. Yet the message that mmnormalize is asked to parse now has a leading space character. The message we expect to parse looks like this:

This is a test

One would usually think that a simple rule is enough here. That is mostly correct, but there is a small caveat. The rulebase entry we currently have looks something like this:

rule=:%word1:word% %word2:word% %word3:word% %word4%

But strangely, rsyslog responds with the following:

mmnormalize generated: {"originalmsg": " This is a test", "unparsed-data": " This is a test"}

Why can rsyslog not parse the message? Why is there a leading space character in front of the message? The answer is that messages are processed according to RFC3164. Following this RFC, everything after the ":" of the syslog header is considered to be the message. Thus, the message now has a leading space.
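The effect can be reproduced outside rsyslog. The following is a minimal Python sketch, not rsyslog's actual parser: it assumes a classic RFC3164-style line, and the regular expression, the function name `split_rfc3164`, and the sample line (priority 13, host `myhost`, tag `test`) are made up for illustration. It cuts the line directly after the ":" that ends the tag, so the space following the colon stays in the message.

```python
import re

# Simplified RFC3164-style layout: <PRI>TIMESTAMP HOST TAG[PID]:MSG
# Assumption: this regex is an illustration only; rsyslog's real parser
# handles many more variations.
RFC3164_LINE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<tag>[^:\s\[]+(?:\[\d+\])?):"
    r"(?P<msg>.*)$"
)


def split_rfc3164(line):
    """Return the header fields and the message part of a syslog line."""
    match = RFC3164_LINE.match(line)
    if match is None:
        raise ValueError("not an RFC3164-style line: %r" % line)
    return match.groupdict()


# Hypothetical sample line; the sender wrote "test: This is a test".
sample = "<13>Jan  1 12:00:00 myhost test: This is a test"
fields = split_rfc3164(sample)
print(repr(fields["msg"]))  # prints ' This is a test' (note the leading space)
```

The sender's separator between the tag and the text ends up as the first character of the message, which is exactly what the rule has to account for.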

How can this be solved? Simply add the space to the rules in your rulebase. This leads to a rule like this:

rule=: %word1:word% %word2:word% %word3:word% %word4%

Please note that only the space character has been added. Also, this is really only an example. The rule matches every message that is exactly four words long, so it is not well suited to be adopted into your configuration.
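To see why the extra space matters and why the rule is so broad, here is a rough Python emulation. It is only an approximation under stated assumptions: each field of the rule is treated as one run of non-space characters, literal spaces must match exactly, and the whole message must be consumed. The helpers `rule_to_regex` and `normalize` and the sample messages are hypothetical; liblognorm itself works differently internally.

```python
import re

# Matches %name% or %name:type...% and captures the field name.
FIELD = re.compile(r"%([^:%]+)(?::[^%]*)?%")


def rule_to_regex(rule):
    """Turn a very simple 'rule=:...' line into a compiled regex.

    Assumption: every field is emulated as one run of non-space
    characters; everything else is literal text.
    """
    body = rule.split("rule=:", 1)[1]
    pattern = ""
    pos = 0
    for field in FIELD.finditer(body):
        pattern += re.escape(body[pos:field.start()])  # literal text
        pattern += "(?P<%s>\\S+)" % field.group(1)      # one "word"
        pos = field.end()
    pattern += re.escape(body[pos:])
    return re.compile(pattern)


def normalize(rule, msg):
    """Return the parsed fields, or None if the rule does not fit."""
    match = rule_to_regex(rule).fullmatch(msg)
    return match.groupdict() if match else None


without_space = "rule=:%word1:word% %word2:word% %word3:word% %word4%"
with_space = "rule=: %word1:word% %word2:word% %word3:word% %word4%"

print(normalize(without_space, " This is a test"))     # no match
print(normalize(with_space, " This is a test"))        # four fields
print(normalize(with_space, " any other four words"))  # also matches
```

The last call illustrates the caveat: any message consisting of a leading space and four words fits this rule, so a real rulebase needs more specific literals or field types.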

One thought on “Log normalization and the leading space”

  1. Hi,

    I have a central rsyslog server, and rsyslog clients that ship their logs to the central rsyslog. The rsyslog clients on the servers are v5; the central rsyslog is v7. The central rsyslog sends the incoming logs of the clients to elasticsearch and also ships the local logs of the central server. On the clients, I’m using the imfile module to read apache logs, and I also use imfile on the central rsyslog server to ship its own apache logs to elasticsearch. The problem is that apache logs coming from the clients have a space in the msg part, so the normalize rule for those logs is:
    rule=: %client_ip:word% %rlogname:word% %ruser:word% [%apache_date:word% %tz:char-to:]%] "%method:word% %url:word% %pver:char-to:"%" %status:word% %bytesend:word% "%referrer:char-to:"%" "%useragent:char-to:"%"
    And normalize rule for local apache logs is:
    rule=:%client_ip:word% %rlogname:word% %ruser:word% [%apache_date:word% %tz:char-to:]%] "%method:word% %url:word% %pver:char-to:"%" %status:word% %bytesend:word% "%referrer:char-to:"%" "%useragent:char-to:"%"

    The only difference between the rules is that the one that normalizes incoming apache logs from the clients starts with one space, and the one that normalizes local apache logs of the central rsyslog server has no space.

    Here is the template for incoming apache logs and the template for local apache logs. I had to use position.from=2 because of the double space in the msg of incoming logs. If I use the same template for local apache logs, the first space is cut off along with the second character, which is the first digit of the client’s IP address:
    template(name="httpd-access_remote" type="list") {
    property(name="msg" position.from="2")

    template(name="httpd-access_local" type="list") {
