The rocket-fast system for log processing

Log normalization for different formats

In this article we want to show you a very functional use-case. We want to use rsyslog in conjunction with log normalization to bring login events from several different log sources together into a readable format and get rid of all the useless information which we don’t need. The log sources will be windows, linux and snare. All of them create log messages in a very different and often hard to read format. After bringing the information into a format that suits us well, we will finally write the essence of the log messages into a file. For this example we will also show two different output formats.

What do we need?

We need several things to be installed. In brackets we will show the version we used for this guide.

These are the current versions at the date of this guide and need to be considered the minimum to be installed.

What do the logs look like?

The log messages come in different formats as already explained. We differentiate between 3 different formats.

1. Windows

The Adiscon products for Windows are able to send syslog messages directly in @cee/lumberjack format. That means, that the message consists of the default syslog header, which is followed by all the message properties being filled into the @cee representation format.

2. Linux

The log messages from Linux are in no specific format. Therefore we need to make some effort to get the data we want.

3. Snare

Snare logs are somewhat special again. They are somewhat like csv, but it uses tabs as a delimiter.


You should install libee, libestr and liblognorm before installing rsyslog. This is simply, because you won’t be able to install rsyslog the other way round. For rsyslog you need to know, that we will need several additional modules. These are:

  • imtcp
  • mmjsonparse
  • mmnormalize

You configure should then look simliar to this:

./configure --prefix=/usr --enable-imtcp --enable-mmjsonparse --enable-mmnormalize

This will be sufficient, since we do not want to use any more extra or special features.

Please note: There are many ways to distribute rsyslog. But, make sure that the platform you build rsyslog on is the same platform as where it should be used. You cannot build rsyslog on CentOS and use it on Ubuntu. The differences between the platforms are just to big. The same applies to different versions of the same platform. When building on a older platform and using it on a newer version, this may work, but with restrictions. Whereas building on a newer version and using it on a older version will probably not work at all. So, if you build rsyslog from Source and want to use it on another machine, make sure the platform is the same.

The Configuration

The configuration is quite extensive. Thus we will show it in several parts. You can download the complete configuration here.

1. Rulebase

Before we begin with the rsyslog configuration, we will set up our rulebase for later. The rulebase is needed for mmnormalize and is a separate file. Our rulebase consists of two parts. The first part will handle the linux login messages and the second part is for the login messages sent by Snare. For more information about how to create a rulebase, visit this link.

# SSH and Sudo logins
prefix=%rcvdat:date-rfc3164% %rcvdfrom:word%
rule=: sudo: pam_unix(sudo:session): session %type:word% for user root by %user:char-to:(%(uid=%-:number%)
rule=: sudo: pam_unix(sudo:session): session %type:word% for user %user:word%
rule=: sshd[%-:number%]: pam_unix(sshd:session): session %type:word% for user %user:word% by (uid=%-:number%)
rule=: sshd[%-:number%]: pam_unix(sshd:session): session %type:word% for user %user:word%

As you can see here, we have four rules. They are all lead by what is defined by the prefix. They represent the different log messages, but all variable parts that are needed for our final log will be put into properties. And in the case that some values might be different for each message, but not needed later, they will be parsed into a null property.

# Snare logins

The Snare format is way different than before. It is basically a tab delimited message format and rsyslog will by default replace all character codes into their ASCII values. Thus the message will look different and we have alle those character code replacements which are represented here as well. Most values from the Snare message are not needed and will be again filled into a null property. But the values we need, will be put into a real property, which we can use later.

2. Modules

In this first configuration step for the rsyslog configuration, we configure the modules that we will use.


For this example we have basically four modules. Imuxsock will keep local logging activity, though we don’t really need it now. Imtcp will take care of syslog reception, so we can get the log messages via syslog from the various machines. Mmjsonparse and mmnormalize are message modification modules. The first will serve to parse messages in json format, the latter will parse messages according to a rulebase.

3. Inputs

Since we have three log sources, we will use three receivers.

input(type="imtcp" port="13514" Ruleset="windows")
input(type="imtcp" port="13515" Ruleset="linux")
input(type="imtcp" port="13516" Ruleset="snare")

As you can see, all three receivers are working on a different port and are directly bound to a ruleset. That way, there is no need to sort the messages per sender later with filters. The rulesets will be different for each receiver, since there are different needs of processing.

4. Templates

Now come the two format templates for the output.

template(name="csv" type="list") {
        property(name="$!usr!rcvdat" format="csv")
        property(name="$!usr!rcvdfrom" format="csv")
        property(name="$!usr!user" format="csv")
        property(name="$!usr!type" format="csv")
template(name="cee" type="subtree" subtree="$!usr")

The first template is called csv. It will write our needed values in a comma-separated format. The second template is called cee. We can simply choose a subtree of values here which will automatically be put in @cee format. As you can see here, there is a huge difference concerning the effort needed for setting up the templates. In the end, the log messages look somewhat similar and will have the same values, but the format is different.

5. Rulesets

Now we get to the most interesting part. The rulesets. We will have three different rulesets which all serve their special purpose. As you will see later, we have 4 rulesets, indeed. But the forth ruleset holds the output and will be explained in step 6.

# Rulesets
ruleset(name="windows") {
       if $parsesuccess == "OK" then {
                if $!id == "4634" then
                        set $!usr!type = "logoff";
                else if $!id == "4624" then
                        set $!usr!type = "logon";
                set $!usr!rcvdfrom = $!source;
                set $!usr!rcvdat = $timereported;
                set $!usr!user = $!TargetUserName;
                call output

The first ruleset will handle the log messages sent from Windows. Since they will be already sent in @cee-format, there is not much need for processing. First a action is called for all messages that run into this ruleset. Basically, the message will be parsed for their properties and values. If the message is successfully parsed (which only happens if the format is correct) then we go on and fill some user-defined variables in a special subtree of values. As you can see, it is checked whether the field "id" has a certain number and filles the variable $!usr!type with logoff or logon, which makes the final file more readable. After that we set a variable for the source system, timestamp of the log message and the user name that either logged on or off. Finally, we call our fourth ruleset for writing to disk.

ruleset(name="linux") {
        if $parsesuccess == "FAIL" then
        #/* only try normalization if non-lumberjack */
        action(type="mmnormalize" rulebase="/etc/rulebase.rb" userawmsg="on")
        if $!user != "" then {
                if $!type == "opened" then
                        set $!usr!type = "logon";
                else if $!type == "closed" then
                        set $!usr!type = "logoff";
                set $!usr!rcvdfrom = $!rcvdfrom;
                set $!usr!rcvdat = $!rcvdat;
                set $!usr!user = $!user;
                call output

The second ruleset is for the linux log messages. Again we first check the log messages with mmjsonparse, but simply to verify that they are NOT in lumberjack format. If parsing the messages failed, which we want here, the log messages will be run through mmnormalize. This module uses the rulebase we created in the beginning and parses the messages according to this rulebase. If parsed successfully, processing continues. We check if the variable $!user isn’t empty. If that is the case, we set again our variables in the subtree and again we begin with the type of message (logon/logoff). Basically opened or closed would be sufficient enough for an experienced reader of logfiles, but we want to have all the log messages in the same format, thus this value will also be changed into logon and logoff. After that we alse set a variable for the source system, timestamp of the log message and the user name again. Finally, we call our fourth ruleset for writing to disk.

ruleset(name="snare") {
        if $parsesuccess == "FAIL" then
        #/* only try normalization if non-lumberjack */
        action(type="mmnormalize" rulebase="/etc/rulebase.rb" userawmsg="on")
        if $!user != "" then {
                if $!id == "4634" then
                        set $!usr!type = "logoff";
                else if $!id == "4624" then
                        set $!usr!type = "logon";
                set $!usr!rcvdfrom = $!rcvdfrom;
                set $!usr!rcvdat = $!rcvdat;
                set $!usr!user = $!user;
                call output

The third ruleset is for the log messages from Snare. As with the linux messages, we check the messages if they are non-lumberjack by using mmjsonparse. And again we will use mmnormalize and the given rulebase to parse the messages. The rest is like for the linux messages. The only difference is, that we have basically Windows messages and need to use the Event ID to determine if the message represents a logon or a logoff. So you can consider this part as a combination from both rulesets. And of course finally, we call the ruleset for the output.

6. Output

As mentioned before, there is a fourth ruleset in the configuration, which just handles the output.

ruleset(name="outwriter") {
        action(type="omfile" file="/var/log/logfile.csv" template="csv")
        action(type="omfile" file="/var/log/logfile.cee" template="cee")

This final ruleset provides us with our two outputs. They are both writing to disk. The only difference is, that they are using different templates as per the templates we defined earlier. So one file will hold the data in csv format whilst the other file holds the data in @cee format.

Final Thoughts

We have now our rsyslog configuration and our rulebase. The log messages will be parsed for the specific data we want from the different login log messages according to their specific format. And here is what the result might look like.

First in csv-format:

"Feb  5 14:19:00","win7fr","fr-win","logon"
"Feb  5 14:19:00","win7fr","fr-win","logoff"
"Jan 16 09:28:33","linuxvm","fr-lin","logon"
"Jan 16 09:28:33","linuxvm","fr-lin","logoff"

And in @cee-format:

@cee: { "type": "logon", "rcvdfrom": "win7fr", "rcvdat": "Feb  5 14:19:00", "user": "fr-win" }
@cee: { "type": "logoff", "rcvdfrom": "win7fr", "rcvdat": "Feb  5 14:19:00", "user": "fr-win" }
@cee: { "type": "logon", "rcvdfrom": "linuxvm", "rcvdat": "Jan 16 09:28:33", "user": "fr-lin" }
@cee: { "type": "logoff", "rcvdfrom": "linuxvm", "rcvdat": "Jan 16 09:28:33", "user": "fr-lin" }

So we now have two different formats, holding basically the same information. As you can see, we have now only the really necessary information of those log messages, namely the type of log message, the source system, the timestamp of this occurence and the user associated to this event.