
Adding the BOM to a message

In environments that use character sets other than the common Western ones, messages are sometimes not encoded or decoded in the right format. One particular case is Japanese characters not being decoded correctly in a hybrid environment. The case is the following:

We have a Linux machine running rsyslog that sends messages via syslog to a Windows machine running WinSyslog. Although the messages were encoded as UTF-8, WinSyslog decoded them using a different encoding. As a result, the messages were unreadable.

The solution is a bit tricky, at least for beginners. We need to add the BOM (Byte Order Mark) to the messages that are being sent. The BOM tells the receiving software that the message is encoded as UTF-8, so the receiver decodes the message correctly and it is readable again.
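To illustrate what the receiver gains from the BOM, here is a minimal Python sketch. It is not how WinSyslog is implemented: decode_payload is a hypothetical helper, and the Shift_JIS fallback is only an assumed stand-in for whatever legacy encoding a receiver might guess when no BOM is present. The UTF-8 BOM itself is the byte sequence EF BB BF.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def decode_payload(data: bytes, fallback: str = "shift_jis") -> str:
    """Hypothetical receiver-side helper: use the BOM as a UTF-8 marker."""
    # Split at the first BOM: syslog header before it, message text after it.
    header, found, body = data.partition(BOM)
    if found:
        # BOM present: the message text is known to be UTF-8.
        return header.decode("utf-8", errors="replace") + body.decode(
            "utf-8", errors="replace"
        )
    # No BOM: the receiver has to guess (assumed fallback, for illustration).
    return data.decode(fallback, errors="replace")


with_bom = b"<13>host tag: " + BOM + "日本語".encode("utf-8")
without_bom = b"<13>host tag: " + "日本語".encode("utf-8")

print(decode_payload(with_bom))     # decoded as UTF-8
print(decode_payload(without_bom))  # decoded with the guessed fallback
```

With the BOM, the Japanese text survives; without it, the same bytes are run through the guessed encoding and come out garbled.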

To achieve this, we need the following for our example from above:

  • The appropriate language set on the Linux operating system (Japanese in our example)
  • rsyslog (v5.7.10 beta or later) – this is the first version in which the $BOM system property can be used
  • WinSyslog (10.2a or later) – in this version, decoding in conjunction with the BOM was introduced
  • as an alternative to WinSyslog, MonitorWare Agent (7.2a or later) can be used

Part 1: Configuring rsyslog

We need to configure rsyslog to insert the BOM into a message. In our example, we will keep this very simple, since we only want to forward messages to a different syslog server. The configuration should look like this:

$ModLoad immark.so
$ModLoad imuxsock.so
$ModLoad imklog.so

$template mytemplate,"<%PRI%>%TIMESTAMP:::date-rfc3339% %HOSTNAME% %SYSLOGTAG:1:32%%msg:::sp-if-no-1st-sp%%$BOM%%msg%"
$ActionForwardDefaultTemplate mytemplate
*.* @x.x.x.x:514

The $ModLoad directives load the modules, which is why they are at the top. The modules loaded here are the basic modules needed for local logging. Of course, you can load different modules, too.

With $template we define the format of the messages that we will be sending. Here, “mytemplate” is the name of the template. The rest after the comma is the default format for syslog forwarding. The only difference is the %$BOM% that is placed right before the message text. It tells the receiver which encoding is used, and that is the most important part. Please note that the template shown here is a single line. A line break is only shown due to website limits.

Since we do nothing but forward messages here, we use $ActionForwardDefaultTemplate to make our template the default for every forwarding action we might use. The directive has to be followed by the template name.

Finally, we have our action. It tells rsyslog to forward all messages via UDP to our central syslog server. Replace x.x.x.x with the IP address of your server. The port is 514.

You might start from a different configuration and need to adapt things. Instead of making the template the default for all forwarding rules, you can add a semicolon after the port in the action, followed by the template name, for example *.* @x.x.x.x:514;mytemplate. Then only this specific action will use the template.
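To check the receiver independently of rsyslog, it can help to send a single hand-built test message. The following Python sketch assembles a frame with roughly the same layout as the template above, with the BOM placed directly before the message text, and sends it via UDP. The names build_frame and send_udp are hypothetical helpers, and the hostname, tag and priority values are made-up examples.

```python
import codecs
import socket
from datetime import datetime, timezone
from typing import Optional


def build_frame(pri: int, hostname: str, tag: str, msg: str,
                timestamp: Optional[str] = None) -> bytes:
    """Build '<PRI>TIMESTAMP HOSTNAME TAG ' + BOM + message as bytes."""
    if timestamp is None:
        # RFC 3339 style timestamp in UTC
        timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    header = f"<{pri}>{timestamp} {hostname} {tag} "
    # The BOM goes right before the message text, as in the template.
    return header.encode("utf-8") + codecs.BOM_UTF8 + msg.encode("utf-8")


def send_udp(frame: bytes, host: str, port: int = 514) -> None:
    """Send one frame as a single UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(frame, (host, port))


frame = build_frame(13, "linuxbox", "test:", "日本語のテスト",
                    timestamp="2024-01-01T00:00:00+00:00")
print(frame)
```

To actually send the frame, call send_udp(frame, "x.x.x.x") with the IP address of your receiver in place of x.x.x.x.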

Part 2: Important configuration part in WinSyslog

Basically, you can use any WinSyslog configuration. The only thing you should change in any case is the output encoding format in the actions you use. In all output actions, you can define the Output Encoding Format. You must use “Unicode (UTF8)” here.

We will show some examples of the most commonly used output actions:

using_bom_01

Img 1: Here we see the Write to File action. This action will simply write the log messages into a file.

using_bom_02

Img 2: This shows the Forward via Syslog action.

using_bom_03

Img 3: Here, the Write to Database action is shown.

As shown in the screenshots, the output encoding can be set in every output action. Setting it is mandatory in each output action that has to handle these messages.

Conclusion:

We need to keep in mind that with certain character sets, problems can occur when encoding and decoding UTF-8. Usually the problems occur when decoding the message, because the receiver cannot reliably identify the message as UTF-8. In our case, encoding detection via the Windows API would have returned SHIFT_JIS, which is wrong. The result was unreadable messages. That is why we need BOM support in both sender and receiver.
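The effect of such a wrong guess can be reproduced in a few lines of Python. This is only an illustration of the failure mode, not of the Windows API detection itself: the same UTF-8 bytes are decoded once as UTF-8 and once as Shift_JIS.

```python
text = "日本語"
raw = text.encode("utf-8")  # bytes as sent by the Linux machine

as_utf8 = raw.decode("utf-8")
# Wrong guess: undecodable bytes are replaced instead of raising an error.
as_sjis = raw.decode("shift_jis", errors="replace")

print(repr(as_utf8))  # the original text
print(repr(as_sjis))  # garbled characters
```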
