How to use set variable and exec_template
As of version 7.5.8, 8.1.4 and the soon to be released 7.6.0, there is a new function available. You can now fill variables with dynamic data through a template. The new function is called exec_template.
Basically, you will set a variable with either a static value or from a given property as described before. But there are cases, where this is not appropriate. For example, if you want to filter for some specific data from $msg, but cannot use the filter on the whole property because of possible multiple occurences. Please note, that the following is just a basic example and exec_template can be used for a large variety of other cases as well.
First, we need to define a template. In our case, we want to extract a specific field from the $msg property.
template(name="extract" type="string" string="%msg:F:5%")
If this template was used to write to a file, the output would resemble the fifth field from a TAB delimited log message. This could also be more specific to extract only specific characters from a field, delimited by other characters than TAB. Or you could even use complex regex functions in the template to extract the data you want.
Now we need to fill a custom variable. This is where exec_template will be used.
set $!xyz = exec_template("extract");
Instead of a already known value or property name, we call the function and give it the template that should be executed. This will result in the variable being filled with the content of the fifth field from %msg property. Now you can use this variable for filters or even other templates. Here are some examples:
if $!xyz contains 'abc' then {action()}
template(name="DynaFile" type="string" string="/var/log/%$!xyz%-data/%timereported%-%$!xyz%.log")
As you can see there are many useful possibilities and using exec_template is rather easy.
If you have additional ideas for use-cases that should be described, please let us know.
Parsing JSON (CEE) Logs and Sending them to Elasticsearch
Original post: Structured Logging with rsyslog and Elasticsearch via @sematext
When your applications generate a lot of logs, you’d probably want to make some sense of them through searches and statistics. Here’s when structured logging comes in handy, and I would like to share some thoughts and configuration examples of how you could use a popular syslog daemon like rsyslog to handle both structured and unstructured logs. Then I’ll show you how to:
- take a JSON from a syslog message and index it in Elasticsearch (which eats JSON documents)
- append other syslog properties (like the date) to the existing JSON to make a bigger JSON document that would be indexed in Elasticsearch. This is how we set up rsyslog to handle CEE-formatted messages in our log analytics tool, Logsene
On structured logging
If we take an unstructured log message, like:
Joe bought 2 apples
And compare it with a similar one in JSON, like:
{“name”: “Joe”, “action”: “bought”, “item”: “apples”, “quantity”: 2}
We can immediately spot a good and a bad point of structured logging: if we index these logs, it will be faster and more precise to search for “apples” in the “item” field, rather than in the whole document. At the same time, the structured log will take up more space than the unstructured one.
But in most use-cases there will be more applications that would log the same subset of fields. So if you want to search for the same user across those applications, it’s nice to be able to pinpoint the “name” field everywhere. And when you add statistics, like who’s the user buying most of our apples, that’s when structured logging really becomes useful.
Finally, it helps to have a structure when it comes to maintenance. If a new version of the application adds a new field, and your log becomes:
Joe bought 2 red apples
it might break some log-parsing, while structured logs rarely suffer from the same problem.
Enter CEE and Lumberjack: structured logging with syslog
With syslog, as defined by RFC3164, there is already a structure in the sense that there’s a priority value (severity*8 + facility), a header (timestamp and hostname) and a message. But this usually isn’t the structure we’re looking for.
CEE and Lumberjack are efforts to introduce structured logging to syslog in a backwards-compatible way. The process is quite simple: in the message part of the log, one would start with a cookie string “@cee:”, followed by an optional space and then a JSON or XML. From this point on I will talk about JSON, since it’s the format that both rsyslog and Elasticsearch prefer. Here’s a sample CEE-enhanced syslog message:
@cee: {“foo”: “bar”}
This makes it quite easy to use CEE-enhanced syslog with existing syslog libraries, although there are specific libraries like liblumberlog, which make it even easier. They’ve also defined a list of standard fields, and applications should use those fields where they’re applicable – so that you get the same field names for all applications. But the schema is free, so you can add custom fields at will.
CEE-enhanced syslog with rsyslog
rsyslog has a module named mmjsonparse for handling CEE-enhanced syslog messages. It checks for the “CEE cookie” at the beginning of the message, and then tries to parse the following JSON. If all is well, the fields from that JSON are loaded and you can then use them in templates to extract whatever information seems important. Fields from your JSON can be accessed like this: $!field-name. An example of how they can be used is shown here.
To get started, you need to have at least rsyslog version 6.6.0, and I’d recommend using version 7 or higher. If you don’t already have that, check out the repositories for RHEL/CentOS and Ubuntu.
Also, mmjsonparse is not enabled by default. If you use the repositories, install the rsyslog-mmjsonparse package. If you compile rsyslog from sources, specify –enable-mmjsonparse when you run the configure script. In order for that to work you’d probably have to install libjson and liblognorm first, depending on your operating system.
For a proof of concept, we can take this config:
#load needed modules module(load="imuxsock") # provides support for local system logging module(load="imklog") # provides kernel logging support module(load="mmjsonparse") #for parsing CEE-enhanced syslog messages #try to parse structured logs *.* :mmjsonparse: #define a template to print field "foo" template(name="justFoo" type="list") { property(name="$!foo") constant(value="\n") #we'll separate logs with a newline } #and now let's write the contents of field "foo" in a file *.* action(type="omfile" template="justFoo" file="/var/log/foo")
To see things, better, you can start rsyslog in foreground and in debug mode:
rsyslogd -dn
And in another terminal, you can send a structured log, then see the value in your file:
# logger ‘@cee: {“foo”:”bar”}’
# cat /var/log/foo
bar
If we send an unstructured log, or an invalid JSON, nothing will be added
# logger ‘test’
# logger ‘@cee: test2’
# cat /var/log/foo
bar
But you can see in the debug output of rsyslog why:
mmjsonparse: no JSON cookie: ‘test’
[…]
mmjsonparse: toParse: ‘ test2’
mmjsonparse: Error parsing JSON ‘ test2’: boolean expected
Indexing logs in Elasticsearch
To index our logs in Elasticsearch, we will use an output module of rsyslog called omelasticsearch. Like mmjsonparse, it’s not compiled by default, so you will have to add the –enable-elasticsearch parameter to the configure script to get it built when you run make. If you use the repositories, you can simply install the rsyslog-elasticsearch package.
omelasticsearch expects a valid JSON from your template, to send it via HTTP to Elasticsearch. You can select individual fields, like we did in the previous scenario, but you can also select the JSON part of the message via the $!all-json property. That would produce the message part of the log, without the “CEE cookie”.
The configuration below should be good for inserting the syslog message to an Elasticsearch instance running on localhost:9200, under the index “system” and type “events“.
#load needed modules module(load="imuxsock") # provides support for local system logging module(load="imklog") # provides kernel logging support module(load="mmjsonparse") #for parsing CEE-enhanced syslog messages module(load="omelasticsearch") #for indexing to Elasticsearch #try to parse a structured log *.* :mmjsonparse: #define a template to print all fields of the message template(name="messageToES" type="list") { property(name="$!all-json") } #write the JSON message to the local ES node *.* action(type="omelasticsearch" template="messageToES")
After restarting rsyslog, you can see your JSON will be indexed:
# logger ‘@cee: {“foo”: “bar”, “foo2”: “bar2″}’
# curl -XPOST localhost:9200/system/events/_search?q=foo2:bar2 2>/dev/null | sed s/.*_source//
” : { “foo”: “bar”, “foo2”: “bar2” }}]}}
As for unstructured logs, $!all-json will produce a JSON with a field named “msg”, having the message as a value:
# logger test
# curl -XPOST localhost:9200/system/events/_search?q=test 2>/dev/null | sed s/.*_source//
” : { “msg”: “test” }}]}}
It’s “msg” because that’s rsyslog’s property name for the syslog message.
Including other properties
But the message isn’t the only interesting property. I would assume most would want to index other information, like the timestamp, severity, or host which generated that message.
To do that, one needs to play with templates and properties. In the future it might be made easier, but at the time of this writing (rsyslog 7.2.3), you need to manually craft a valid JSON to pass it to omelasticsearch. For example, if we want to add the timestamp and the syslogtag, a working template might look like this:
template(name="customTemplate" type="list") { #- open the curly brackets, #- add the timestamp field surrounded with quotes #- add the colon which separates field from value #- open the quotes for the timestamp itself constant(value="{\"timestamp\":\"") #- add the timestamp from the log, # format it in RFC-3339, so that ES detects it by default property(name="timereported" dateFormat="rfc3339") #- close the quotes for timestamp, #- add a comma, then the syslogtag field in the same manner constant(value="\",\"syslogtag\":\"") #- now the syslogtag field itself # and format="json" will ensure special characters # are escaped so they won't break our JSON property(name="syslogtag" format="json") #- close the quotes for syslogtag #- add a comma #- then add our JSON-formatted syslog message, # but start from the 2nd position to omit the left # curly bracket constant(value="\",") property(name="$!all-json" position.from="2") }
Summary
If you’re interested in searching or analyzing lots of logs, structured logging might help. And you can do it with the existing syslog libraries, via CEE-enhanced syslog. If you use a newer version of rsyslog, you can parse these logs with mmjsonparse and index them in Elasticsearch with omelasticsearch. If you are interested in indexing/searching logs in general, check out other Sematext logging posts or follow @sematext.
rsyslog 8.1.4 (v8-devel) released
We have just released 8.1.4 of the v8-devel branch.
This release adds the capability to set a variable based on a template. It also adds KEEPALIVE support to the RELP modules (which now require librelp 1.2.2+ for this reason). It also adds some usability enhancements and bug fixes.
Note that this release can be considered a “normal” devel version, with moderate risk associated to it. Experience in the past weeks suggest so. Special thanks to Pavel Levshin for all his work and tests. Still note that there is a considerably higher risk running the devel version than the stable one.
ChangeLog:
http://www.rsyslog.com/changelog-for-8-1-4-v8-devel/
Download:
http://www.rsyslog.com/rsyslog-8-1-4-v8-devel/
Feedback is *very much* appreciated.
Best regards,
Florian Riedl
Changelog for 8.1.4 (v8-devel)
Version 8.1.4 [devel] 2014-01-10
- add exec_template() RainerScript function
- imrelp: support for TCP KEEPALIVE added
- bumped librelp dependency to 1.2.2 to support new KEEPALIVE feature
- Add directives for numerically specifying GIDs/UIDs
The already present directives (FileOwner, FileGroup, DirOwner, DirGroup) translate names to numerical IDs, which depends on the user information being available during rsyslog’s startup. This can fail if the information is obtained over a network or from a service such as SSSD. The new directives provide a way to specify the numerical IDs directly and bypass the lookup.
Thanks to Tomas Heinrich for the patch. - bugfix: action commitTransaction() processing did not properly handle suspended actions
- bugfix: omelasticsearch fail.es stats counter was improperly maitained
rsyslog 7.5.8 (v7-devel) released
Users are suggested to try out this versions and report errors. If we don’t learn of serious problems, we will most probably release 7.6.0 next week based on this code here.
More detailed information is available in the changelog.
ChangeLog:
http://www.rsyslog.com/changelog-for-7-5-8-v7-devel/
Download:
http://www.rsyslog.com/rsyslog-7-5-8-v7-devel/
As always, feedback is appreciated.
Best regards,
Florian Riedl
rsyslog 7.5.8 (v7-devel)
Download file name: rsyslog 7.5.8 (devel)
rsyslog 7.5.8 (devel)
sha256 hash: 38a8b5be53feaae9225e46f2c93d7c
Author: Rainer Gerhards (rgerhards@adiscon.com)
Version: 7.5.8 File
size: 2.595 MB
Changelog for 7.5.8 (v7-devel)
Version 7.5.8 [v7-release candidate] 2014-01-09
- add exec_template() RainerScript function
- add debug.onShutdown and debug.logFile global paramters
These enebale the new “debug on shutdown” mode, which can be used to track hard to find problems that occur during system shutdown. - Add directives for numerically specifying GIDs/UIDs
The already present directives (FileOwner, FileGroup, DirOwner, DirGroup) translate names to numerical IDs, which depends on the user information being available during rsyslog’s startup. This can fail if the information is obtained over a network or from a service such as SSSD. The new directives provide a way to specify the numerical IDs directly and bypass the lookup.
Thanks to Tomas Heinrich for the patch. - actions now report if they suspend and resume themselves
this is by default on and controllable by the action.reportSuspension global parameter - bugfix: omelasticsearch fail.es stats counter was improperly maintained
- bugfix: mmrfc5424addhmac: “key” parameter was not properly processed
- add new impstats action counters:
- * suspended
- * suspended.duration
- * resumed
rsyslog 7.4.8 (v7-stable) released
We have just released 7.4.8 of the v7-stable branch. This is a pure bug-fixing release.
More detailed information is available in the ChangeLog.
rsyslog 7.4.8 (v7-stable)
Download file name: rsyslog 7.4.8 (stable)
rsyslog 7.4.8 (stable)
sha256 hash: 0d140baad95fa5e4e7d76a9e955751
Author: Rainer Gerhards (rgerhards@adiscon.com)
Version: 7.4.8 File size: 2.92 MB
Changelog for 7.4.8 (v7-stable)
Version 7.4.8 [v7.4-stable] 2014-01-08
- rsgtutil provides better error messages on unfinished signature blocks
- bugfix: guard against control characters in internal (error) messages
Thanks to Ahto Truu for alerting us. - bugfix: immark did emit messages under kern.=info instead of syslog.=info
Note that his can potentially break exisiting configurations that rely on immark sending as kern.=info. Unfortunately, we cannot leave this unfixed as we never should emit messages under the kern facilit