Connecting with Logstash via Apache Kafka
Original post: Recipe: rsyslog + Kafka + Logstash by @Sematext
This recipe is similar to the previous rsyslog + Redis + Logstash one, except that we’ll use Kafka as a central buffer and connecting point instead of Redis. You’ll have more of the same advantages:
- rsyslog is light and crazy-fast, including when you want it to tail files and parse unstructured data (see the Apache logs + rsyslog + Elasticsearch recipe)
- Kafka is awesome at buffering things
- Logstash can transform your logs and connect them to N destinations with unmatched ease
There are a couple of differences to the Redis recipe, though:
- rsyslog already has Kafka output packages, so it’s easier to set up
- Kafka has a different set of features than Redis (trying to avoid flame wars here) when it comes to queues and scaling
As with the other recipes, I’ll show you how to install and configure the needed components. The end result would be that local syslog (and tailed files, if you want to tail them) will end up in Elasticsearch, or a logging SaaS like Logsene (which exposes the Elasticsearch API for both indexing and searching). Of course you can choose to change your rsyslog configuration to parse logs as well (as we’ve shown before), and change Logstash to do other things (like adding GeoIP info).
Getting the ingredients
First of all, you’ll probably need to update rsyslog. Most distros come with ancient versions and don’t have the plugins you need. From the official packages you can install:
- rsyslog. This will update the base package, including the file-tailing module
- rsyslog-kafka. This will get you the Kafka output module
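For example, on a Debian or Ubuntu system with the official rsyslog repository already set up, installation would look something like this (a hedged sketch; repository setup and exact package names vary by distro):
sudo apt-get update
sudo apt-get install rsyslog rsyslog-kafka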
If you don’t have Kafka already, you can set it up by downloading the binary tar. And then you can follow the quickstart guide. Basically you’ll have to start Zookeeper first (assuming you don’t have one already that you’d want to re-use):
bin/zookeeper-server-start.sh config/zookeeper.properties
And then start Kafka itself and create a simple 1-partition topic that we’ll use for pushing logs from rsyslog to Logstash. Let’s call it rsyslog_logstash:
bin/kafka-server-start.sh config/server.properties
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic rsyslog_logstash
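Before wiring up rsyslog, you can sanity-check the topic with the console tools that ship with Kafka (a minimal sketch for the 0.8.x-era quickstart tools; type a line into the producer and it should appear in the consumer):
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic rsyslog_logstash
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic rsyslog_logstash --from-beginning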
Finally, you'll need Logstash. At the time of writing this, we have a beta of 2.0, which comes with lots of improvements (including huge performance gains for the GeoIP filter I touched on earlier). After downloading and unpacking, you can start it via:
bin/logstash -f logstash.conf
Packages are also available, in which case you'd put the configuration file in /etc/logstash/conf.d/ and start Logstash with the init script.
Configuring rsyslog
With rsyslog, you’d need to load the needed modules first:
module(load="imuxsock") # will listen to your local syslog module(load="imfile") # if you want to tail files module(load="omkafka") # lets you send to Kafka
If you want to tail files, you’d have to add definitions for each group of files like this:
input(type="imfile" File="/opt/logs/example*.log" Tag="examplelogs" )
Then you'd need a template that will build JSON documents out of your logs. You would publish these JSONs to Kafka and consume them with Logstash. Here's one that works well for plain syslog and tailed files that aren't parsed via mmnormalize:
template(name="json_lines" type="list" option.json="on") {
constant(value="{")
constant(value="\"timestamp\":\"")
property(name="timereported" dateFormat="rfc3339")
constant(value="\",\"message\":\"")
property(name="msg")
constant(value="\",\"host\":\"")
property(name="hostname")
constant(value="\",\"severity\":\"")
property(name="syslogseverity-text")
constant(value="\",\"facility\":\"")
property(name="syslogfacility-text")
constant(value="\",\"syslog-tag\":\"")
property(name="syslogtag")
constant(value="\"}")
}
By default, rsyslog has a memory queue of 10K messages and has a single thread that works with batches of up to 16 messages (you can find all queue parameters here). You may want to change:
- the batch size, which also controls the maximum number of messages to be sent to Kafka at once
- the number of threads, which would parallelize sending to Kafka as well
- the size of the queue and its nature: in-memory (default), disk or disk-assisted
In an rsyslog -> Kafka -> Logstash setup I assume you want to keep rsyslog light, so these numbers would be small, like:
main_queue(
  queue.workerthreads="1"      # threads to work on the queue
  queue.dequeueBatchSize="100" # max number of messages to process at once
  queue.size="10000"           # max queue size
)
Finally, to publish to Kafka you’d mainly specify the brokers to connect to (in this example we have one listening to localhost:9092) and the name of the topic we just created:
action(
  broker=["localhost:9092"]
  type="omkafka"
  topic="rsyslog_logstash"
  template="json_lines"
)
Assuming Kafka is started, rsyslog will keep pushing to it.
Configuring Logstash
This is the part where we pick up the JSON logs (as defined in the earlier template) and forward them to the preferred destinations. First, we have the input, which will read from the Kafka topic we created. To connect, we'll point Logstash to Zookeeper, and it will fetch all the info about Kafka from there:
input {
kafka {
zk_connect => "localhost:2181"
topic_id => "rsyslog_logstash"
}
}
At this point, you may want to use various filters to change your logs before pushing them to Logsene/Elasticsearch.
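For example, a date filter could set the event's @timestamp from the timestamp field produced by the rsyslog template. A minimal sketch (the date filter ships with Logstash; rsyslog's rfc3339 dates parse as ISO8601):
filter {
  date {
    match => [ "timestamp", "ISO8601" ] # use the log's own time instead of ingestion time
  }
}
For this last step, you'd use the Elasticsearch output: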
output {
elasticsearch {
hosts => "localhost" # it used to be "host" pre-2.0
port => 9200
#ssl => "true"
#protocol => "http" # removed in 2.0
}
}
And that’s it! Now you can use Kibana (or, in the case of Logsene, either Kibana or Logsene’s own UI) to search your logs!
Recipe: Apache Logs + rsyslog (parsing) + Elasticsearch
Original post: Recipe: Apache Logs + rsyslog (parsing) + Elasticsearch by @Sematext
This recipe is about tailing Apache HTTPD logs with rsyslog, parsing them into structured JSON documents, and forwarding them to Elasticsearch (or a log analytics SaaS, like Logsene, which exposes the Elasticsearch API). Having them indexed in a structured way will allow you to do better analytics with tools like Kibana.
We’ll also cover pushing logs coming from the syslog socket and kernel, and how to buffer all of them properly. So this is quite a complete recipe for your centralized logging needs.
Getting the ingredients
Even though most distros already have rsyslog installed, it’s highly recommended to get the latest stable from the rsyslog repositories. The packages you’ll need are:
- rsyslog. The base package, including the file-tailing module (imfile)
- rsyslog-mmnormalize. This gives you mmnormalize, a module that will do the parsing of common Apache logs to JSON
- rsyslog-elasticsearch, for the Elasticsearch output
With the ingredients in place, let’s start cooking a configuration. The configuration needs to do the following:
- load the required modules
- configure inputs: tailing Apache logs and system logs
- configure the main queue to buffer your messages. This is also the place to define the number of worker threads and batch sizes (which will also be Elasticsearch bulk sizes)
- parse common Apache logs into JSON
- define a template where you'd specify what the JSON messages should look like. You'd use this template to send logs to Logsene/Elasticsearch via the Elasticsearch output
Loading modules
Here, we’ll need imfile to tail files, mmnormalize to parse them, and omelasticsearch to send them. If you want to tail the system logs, you’d also need to include imuxsock and imklog (for kernel logs).
# system logs
module(load="imuxsock")
module(load="imklog")
# file
module(load="imfile")
# parser
module(load="mmnormalize")
# sender
module(load="omelasticsearch")
Configure inputs
For system logs, you typically don’t need any special configuration (unless you want to listen to a non-default Unix Socket). For Apache logs, you’d point to the file(s) you want to monitor. You can use wildcards for file names as well. You also need to specify a syslog tag for each input. You can use this tag later for filtering.
input(type="imfile"
File="/var/log/apache*.log"
Tag="apache:"
)
NOTE: By default, rsyslog will not poll for file changes every N seconds. Instead, it will rely on the kernel (via inotify) to poke it when files get changed. This makes the process quite realtime and scales well, especially if you have many files changing rarely. Inotify is also less prone to bugs when it comes to file rotation and other events that would otherwise happen between two "polls". You can still use the legacy mode="polling" by specifying it in imfile's module parameters, as sketched below.
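If you do need polling (for example, on filesystems where inotify doesn't work, such as some network mounts), a minimal sketch based on the parameters mentioned above (the interval is illustrative, in seconds):
module(load="imfile" mode="polling" PollingInterval="10")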
Queue and workers
By default, all incoming messages go into a main queue. You can also separate flows (e.g. files and system logs) by using different rulesets but let’s keep it simple for now.
For tailing files, this kind of queue would work well:
main_queue(
  queue.workerThreads="4"
  queue.dequeueBatchSize="1000"
  queue.size="10000"
)
This would be a small in-memory queue of 10K messages, which works well if Elasticsearch goes down, because the data is still in the file and rsyslog can stop tailing when the queue becomes full, and then resume tailing. 4 worker threads will pick batches of up to 1000 messages from the queue, parse them (see below) and send the resulting JSONs to Elasticsearch.
If you need a larger queue (e.g. if you have lots of system logs and want to make sure they're not lost), I would recommend using a disk-assisted memory queue, which will spill to disk whenever it uses too much memory:
main_queue(
  queue.workerThreads="4"
  queue.dequeueBatchSize="1000"
  queue.highWatermark="500000"    # max no. of events to hold in memory
  queue.lowWatermark="200000"     # use memory queue again, when it's back to this level
  queue.spoolDirectory="/var/run/rsyslog/queues" # where to write on disk
  queue.fileName="stats_ruleset"
  queue.maxDiskSpace="5g"         # it will stop at this much disk space
  queue.size="5000000"            # or this many messages
  queue.saveOnShutdown="on"       # save memory queue contents to disk when rsyslog is exiting
)
Parsing with mmnormalize
The message normalization module uses liblognorm to do the parsing. So in the configuration you’d simply point rsyslog to the liblognorm rulebase:
action(type="mmnormalize" rulebase="/opt/rsyslog/apache.rb" )
where apache.rb will contain rules for parsing Apache logs, which can look like this:
version=2
rule=:%clientip:word% %ident:word% %auth:word% [%timestamp:char-to:]%] "%verb:word% %request:word% HTTP/%httpversion:float%" %response:number% %bytes:number% "%referrer:char-to:"%" "%agent:char-to:"%"%blob:rest%
Here, version=2 indicates that rsyslog should use liblognorm's v2 engine (which was introduced in rsyslog 8.13), and then you have the actual rule for parsing logs. You can find more details about configuring those rules in the liblognorm documentation.
If you go beyond parsing common Apache logs, creating new rules typically requires a lot of trial and error. To check your rules without messing with rsyslog, you can use the lognormalizer binary like:
head -1 /path/to/log.file | /usr/lib/lognorm/lognormalizer -r /path/to/rulebase.rb -e json
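For example, feeding it a standard combined-format line (an illustrative sketch; the lognormalizer path depends on your install) should print a JSON document with clientip, verb, response and the other fields defined in the rulebase above:
echo '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08"' | /usr/lib/lognorm/lognormalizer -r /opt/rsyslog/apache.rb -e json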
NOTE: If you're used to Logstash's grok, these parsing rules will look very familiar. However, things are quite different under the hood. Grok is a nice abstraction over regular expressions, while liblognorm builds parse trees out of specialized parsers. This makes liblognorm much faster, especially as you add more rules. In fact, it scales so well that for all practical purposes, performance depends on the length of the log lines and not on the number of rules. This post explains the theory behind this assumption, and this is actually proven by various tests. The downside is that you'll lose some of the flexibility offered by regular expressions. You can still use regular expressions with liblognorm (you'd need to set allow_regex to on when loading mmnormalize, as sketched below), but then you'd lose a lot of the benefits that come with the parse tree approach.
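A minimal sketch of enabling that regex support, per the allow_regex parameter mentioned above:
module(load="mmnormalize" allow_regex="on")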
Template for parsed logs
Since we want to push logs to Elasticsearch as JSON, we'd need to use templates to format them. For Apache logs, by the time parsing has finished, you already have all the relevant fields in the $!all-json variable, which you'll use as a template:
template(name="all-json" type="list"){
property(name="$!all-json")
}
Template for time-based indices
For the logging use-case, you’d probably want to use time-based indices (e.g. if you keep your logs for 7 days, you can have one index per day). Such a design will give your cluster a lot more capacity due to the way Elasticsearch merges data in the background (you can learn the details in our presentations at GeeCON and Berlin Buzzwords).
To make rsyslog use daily or other time-based indices, you need to define a template that builds an index name off the timestamp of each log. This is one that names them logstash-YYYY.MM.DD, like Logstash does by default:
template(name="logstash-index"
type="list") {
constant(value="logstash-")
property(name="timereported" dateFormat="rfc3339" position.from="1" position.to="4")
constant(value=".")
property(name="timereported" dateFormat="rfc3339" position.from="6" position.to="7")
constant(value=".")
property(name="timereported" dateFormat="rfc3339" position.from="9" position.to="10")
}
And then you'd use this template in the Elasticsearch output:
action(type="omelasticsearch" template="all-json" dynSearchIndex="on" searchIndex="logstash-index" searchType="apache" server="MY-ELASTICSEARCH-SERVER" bulkmode="on" action.resumeretrycount="-1" )
Putting both Apache and system logs together
If you use the same rsyslog to parse system logs, mmnormalize won’t parse them (because they don’t match Apache’s common log format). In this case, you’ll need to pick the rsyslog properties you want and build an additional JSON template:
template(name="plain-syslog"
type="list") {
constant(value="{")
constant(value="\"timestamp\":\"") property(name="timereported" dateFormat="rfc3339")
constant(value="\",\"host\":\"") property(name="hostname")
constant(value="\",\"severity\":\"") property(name="syslogseverity-text")
constant(value="\",\"facility\":\"") property(name="syslogfacility-text")
constant(value="\",\"tag\":\"") property(name="syslogtag" format="json")
constant(value="\",\"message\":\"") property(name="msg" format="json")
constant(value="\"}")
}
Then you can make rsyslog decide: if a log was parsed successfully, use the all-json template. If not, use the plain-syslog one:
if $parsesuccess == "OK" then {
action(type="omelasticsearch"
template="all-json"
...
)
} else {
action(type="omelasticsearch"
template="plain-syslog"
...
)
}
And that's it! Now you can restart rsyslog and get both your system and Apache logs parsed, buffered and indexed into Elasticsearch. If you're a Logsene user, the recipe is a bit simpler: you'd follow the same steps, except that you'll skip the logstash-index template (Logsene does that for you) and your Elasticsearch actions will look like this:
action(type="omelasticsearch" template="all-json or plain-syslog" searchIndex="LOGSENE-APP-TOKEN-GOES-HERE" searchType="apache" server="logsene-receiver.sematext.com" serverport="80" bulkmode="on" action.resumeretrycount="-1" )
Coupling with Logstash via Redis
Original post: Recipe: rsyslog + Redis + Logstash by @Sematext
OK, so you want to hook up rsyslog with Logstash. If you don’t remember why you want that, let me give you a few hints:
- Logstash can do lots of things, it’s easy to set up but tends to be too heavy to put on every server
- you have Redis already installed so you can use it as a centralized queue. If you don’t have it yet, it’s worth a try because it’s very light for this kind of workload.
- you have rsyslog on pretty much all your Linux boxes. It’s light and surprisingly capable, so why not make it push to Redis in order to hook it up with Logstash?
In this post, you'll see how to install and configure the needed components so you can send your local syslog (or tail files with rsyslog) to be buffered in Redis, then use Logstash to ship the logs to Elasticsearch or a logging SaaS like Logsene (which exposes the Elasticsearch API for both indexing and searching), so you can search and analyze them with Kibana.
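The full recipe follows the same pattern as the Kafka one; on the rsyslog side you'd use the omhiredis output module. A minimal hedged sketch (parameter names per the omhiredis module; the Redis key name and the json_lines template, like the one from the Kafka recipe above, are illustrative):
module(load="omhiredis")
action(
  type="omhiredis"
  mode="queue"           # LPUSH each rendered message onto a Redis list
  key="rsyslog_logstash" # list name; illustrative
  server="127.0.0.1"
  serverport="6379"
  template="json_lines"
)
On the Logstash side, the matching input would be redis { data_type => "list" key => "rsyslog_logstash" }.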
Changelog for 8.11.0 (v8-stable)
Version 8.11.0 [v8-stable] 2015-06-30
- new signature provider for Keyless Signature Infrastructure (KSI) added
- build system: re-enable use of "make distcheck"
- bugfix imfile: regex multiline mode ignored escapeLF option
  Thanks to Ciprian Hacman for reporting the problem
  closes https://github.com/rsyslog/rsyslog/issues/370
- bugfix omkafka: fixed several concurrency issues, most of them related to dynamic topics.
  Thanks to Janmejay Singh for the patch.
- bugfix: execOnlyWhenPreviousIsSuspended did not work correctly
  This especially caused problems when an action with this attribute was configured with an action queue.
- bugfix core engine: ensured global variable atomicity
  This could lead to problems in RainerScript, as well as probably in other areas where global variables are used inside rsyslog. I wouldn't rule out that it could lead to segfaults.
  Thanks to Janmejay Singh for the patch.
- bugfix imfile: segfault when using startmsg.regex because of empty log line
  closes https://github.com/rsyslog/rsyslog/issues/357
  Thanks to Ciprian Hacman for the patch.
- bugfix: build problem on Solaris
  Thanks to Dagobert Michelsen for reporting this and getting us up to speed on the OpenCSW build farm.
- bugfix build system: strndup was used even if not present; a compatibility function has now been added. This came up on Solaris builds.
  Thanks to Dagobert Michelsen for reporting the problem.
  closes https://github.com/rsyslog/rsyslog/issues/347
rsyslog 8.9.0 (v8-stable) released
We have released rsyslog 8.9.0.
http://www.rsyslog.com/changelog-for-8-9-0-v8-stable/
Download:
http://www.rsyslog.com/downloads/download-v8-stable/
As always, feedback is appreciated.
Best regards,
Florian Riedl
Changelog for 8.9.0 (v8-stable)
Version 8.9.0 [v8-stable] 2015-04-07
- omprog: add option "hup.forward" to forward HUP to external plugins
  This was suggested by David Lang so that external plugins (and other programs) can also do HUP-specific processing. The default is not to forward HUP, so there is no change of behavior by default.
- imuxsock: added capability to use regular parser chain
  Previously, this was a fixed format that was known to be spoken on the system log socket. This also adds new parameters:
  - sysSock.useSpecialParser module parameter
  - sysSock.parseHostname module parameter
  - useSpecialParser input parameter
  - parseHostname input parameter
- 0mq: improvements in input and output modules
  See module READMEs; part is to be considered experimental.
  Thanks to Brian Knox for the contribution.
- imtcp: add support for IP-based bind for imtcp -> param "address"
  Thanks to github user crackytsi for the patch.
- bugfix: MsgDeserialize out of sync with MsgSerialize for StrucData
  This led to failure of disk queue processing when structured data was present. Thanks to github user adrush for the fix.
- bugfix imfile: partial data loss, especially in readMode != 0
  closes https://github.com/rsyslog/rsyslog/issues/144
- bugfix: potential large memory consumption with failed actions
  see also https://github.com/rsyslog/rsyslog/issues/253
- bugfix omudpspoof: invalid default send template in RainerScript format
  The file format template was used, which obviously does not work for forwarding. Thanks to Christopher Racky for alerting us.
  closes https://github.com/rsyslog/rsyslog/issues/268
- bugfix: size-based legacy config statements did not work properly
  On some platforms, they were incorrectly handled, resulting in all sorts of "interesting" effects (up to segfault on startup).
- build system: added option --without-valgrind-testbench
  ... which provides the capability to either enforce or turn off valgrind use inside the testbench. Thanks to whissi for the patch.
- rsyslogd: fix misleading typos in error messages
  Thanks to Ansgar Püster for the fixes.
Using rsyslog and Elasticsearch to Handle Different Types of JSON Logs
Originally posted on the Sematext blog: Using Elasticsearch Mapping Types to Handle Different JSON Logs
By default, Elasticsearch does a good job of figuring out the type of data in each field of your logs. But if you like your logs structured like we do, you probably want more control over how they're indexed: is time_elapsed an integer or a float? Do you want your tags analyzed so you can search for "big" in "big data"? Or do you need them not_analyzed, so you can show top tags via the terms aggregation? Or maybe both?
In this post, we’ll look at how to use index templates to manage multiple types of logs across multiple indices. Also, we’ll explain how to use rsyslog to handle JSON logging and specify types.
Elasticsearch Mapping and Logs
To control settings for how a field is analyzed in Elasticsearch, you’ll need to define a mapping. This works similarly in Logsene, our log analytics SaaS, because it uses Elasticsearch and exposes its API.
With logs you’ll probably use time-based indices, because they scale better (in Logsene, for instance, you get daily indices). To make sure the mapping you define today applies to the index you create tomorrow, you need to define it in an index template.
Managing Multiple Types
Mappings provide a nice abstraction when you have to deal with multiple types of structured data. Let’s say you have two apps generating logs of different structures: both have a timestamp field, but one recording logins has a user field, and another one recording purchases has an amount field.
To deal with this, you can define the timestamp field in the _default_ mapping, which applies to all types. Then, in each type's own mapping we'll define fields unique to that mapping. The following snippet is an example that works with Logsene, provided that aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee is your Logsene app token. If you roll your own Elasticsearch, you can use whichever name you want; just make sure the template's index pattern matches your indices (for example, logs-* will work if your indices are in the logs-YYYY-MM-dd format).
curl -XPUT 'logsene-receiver.sematext.com/_template/aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee_MyTemplate' -d '{
"template" : "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee*",
"order" : 21,
"mappings" : {
"_default_" : {
"properties" : {
"timestamp" : { "type" : "date" }
}
},
"firstapp" : {
"properties" : {
"user" : { "type" : "string" }
}
},
"secondapp" : {
"properties" : {
"amount" : { "type" : "long" }
}
}
}
}'
Sending JSON Logs to Specific Types
When you send a document to Elasticsearch by using the API, you have to provide an index and a type. You can use an Elasticsearch client for your preferred language to log directly to Elasticsearch or Logsene this way. But I wouldn’t recommend this, because then you’d have to manage things like buffering if the destination is unreachable.
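For illustration, direct indexing with curl would look like this (index and type are in the URL path; names and field values are illustrative):
curl -XPOST 'localhost:9200/logs-2015-10-05/firstapp' -d '{
  "timestamp" : "2015-10-05T14:30:00Z",
  "user" : "alice"
}'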
Instead, I’d keep my logging simple and use a specialized logging tool, such as rsyslog, to do the hard work for me. Logging to a file is usually the easiest option. It’s local, and you can have your logging tool tail the file and send contents over the network. I usually prefer sockets (like syslog) because they let me configure rsyslog to:
- write events in a human format to a local file I can tail if I need to (usually in development)
- forward logs without hitting disk if I need to (usually in production)
Whatever you prefer, I think writing to local files or sockets is better than sending logs over the network from your application. Unless you’re willing to do a reliability trade-off and use UDP, which gets rid of most complexities.
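If you do take the UDP route, a minimal sketch using rsyslog's built-in forwarding via omfwd (the target host is illustrative):
action(
  type="omfwd"
  target="logs.example.com" # illustrative destination
  port="514"
  protocol="udp"
)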
Opinions aside, if you want to send JSON over syslog, there’s the JSON-over-syslog (CEE) format that we detailed in a previous post. You can use rsyslog’s JSON parser module to take your structured logs and forward them to Logsene:
module(load="imuxsock") # can listen to local syslog socket
module(load="omelasticsearch") # can forward to Elasticsearch
module(load="mmjsonparse") # can parse JSON
action(type="mmjsonparse") # parse CEE-formatted messages
template(name="syslog-cee" type="list") { # Elasticsearch documents will contain
property(name="$!all-json") # all JSON fields that were parsed
}
action(
type="omelasticsearch"
template="syslog-cee" # use the template defined earlier
server="logsene-receiver.sematext.com"
serverport="80"
searchType="syslogapp"
searchIndex="LOGSENE-APP-TOKEN-GOES-HERE"
bulkmode="on" # send logs in batches
queue.dequeuebatchsize="1000" # of up to 1000
action.resumeretrycount="-1" # retry indefinitely (buffer) if destination is unreachable
)
To send a CEE-formatted syslog message, you can run logger '@cee: {"amount": 50}' for example. Rsyslog would forward this JSON to Elasticsearch or Logsene via HTTP. Note that Logsene also supports CEE-formatted JSON over syslog out of the box if you want to use a syslog protocol instead of the Elasticsearch API.
Filtering by Type
Once your logs are in, you can filter them by type (via the _type field) in Kibana.
However, if you want more refined filtering by source, we suggest using a separate field for storing the application name (see the sketch below). This can be useful when you have different applications using the same logging format. For example, both crond and postfix use plain syslog.
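A minimal sketch of adding such a field to an rsyslog JSON template (programname is a standard rsyslog property; the field name app is illustrative) would be splicing this into a template like the plain-syslog one shown earlier:
constant(value="\",\"app\":\"") property(name="programname" format="json")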
rsyslog 8.6.0 (v8-stable) released
We have released rsyslog 8.6.0.
This is the first stable release under a new release cycle and versioning scheme. This new scheme is important news in itself. For more details, please have a look here:
http://www.rsyslog.com/rsyslogs-new-release-cycle-and-versioning-scheme/
Version 8.6.0 contains important new features like the ability to monitor files via wildcards in imfile. It also contains new, experimental zero message queue modules (special thanks to team member Brian Knox), improvements to RainerScript and mmnormalize (thanks to Singh Janmejay) and a couple of other improvements.
This release also contains important bug fixes.
This is a recommended upgrade for all users.
http://www.rsyslog.com/changelog-for-8-6-0-v8-stable/
Download:
http://www.rsyslog.com/downloads/download-v8-stable/
As always, feedback is appreciated.
Best regards,
Florian Riedl
Changelog for 8.6.0 (v8-stable)
Version 8.6.0 [v8-stable] 2014-12-02
NOTE: This version also incorporates all changes and enhancements made for
v8.5.0, but in a stable release. For details see immediately below.
- configuration-setting rsyslogd command line options deprecated
  For most of them, there are now proper configuration objects. Some few will be completely dropped if nobody insists on them. Additional info at http://blog.gerhards.net/2014/11/phasing-out-legacy-command-line-options.html
- new and enhanced plugins for 0mq. These are currently experimental.
  Thanks to Brian Knox who contributed the modules and is their author.
- empty rulesets have been permitted. They no longer raise a syntax error.
- add parameter -N3 to enable config check of partial config file
  Use for config include files. Disables checking if any action exists at all.
- rsyslogd -e option has finally been removed
  It had been deprecated for many years.
- testbench improvements
  Testbench is now more robust and has additional tests.
- testbench is now by default disabled
  To enable it, use --enable-testbench. This was done as the testbench now does better checking if required modules are present, and this in turn would lead to configure error messages where none previously were if we left --enable-testbench on by default. Thus we have turned it off. This should not be an issue for those few testbench users.
- add new RainerScript functions wrap() and replace()
  Thanks to Singh Janmejay for the patch.
- mmnormalize can now also work on a variable
  Thanks to Singh Janmejay for the patch.
- new property date options for day ordinal and week number
  Thanks to github user arrjay for the patch
- remove --enable-zlib configure option, we always require it
  It's hard to envision a system without zlib, so we turn this off.
  closes https://github.com/rsyslog/rsyslog/issues/76
- slight source-tree restructuring: contributed modules are now in their own ./contrib directory. The idea is to make it clearer to the end user which plugins are supported by the rsyslog project (those in ./plugins).
- bugfix: imudp makes rsyslog hang on shutdown when more than 1 thread used
  closes https://github.com/rsyslog/rsyslog/issues/126
- bugfix: not all files closed on auto-backgrounding startup
  This could happen when not running under systemd. Some low-numbered fds were not closed in that case.
- bugfix: typo in queue configuration parameter made parameter unusable
  Thanks to Bojan Smojver for the patch.
- bugfix: uninitialized buffer off-by-one error in hostname generation
  The DNS cache used uninitialized memory, which could lead to invalid hostname generation.
  Thanks to Jarrod Sayers for alerting us and providing analysis and patch recommendations.
- bugfix imuxsock: possible segfault when SysSock.Use="off"
  Thanks to alexjfisher for reporting this issue.
  closes https://github.com/rsyslog/rsyslog/issues/140
- bugfix RainerScript: invalid ruleset names were accepted during ruleset definition, but could of course not be used when e.g. calling a ruleset.
  IMPORTANT: this may cause existing configurations to error out on start, as the invalid names could also be used e.g. when assigning rulesets.
- bugfix: some module entry points were not called for all modules
  Callbacks like endCnfLoad() were primarily being called for input modules. This has been corrected. Note that this bugfix has some regression potential.
- bugfix omlibdbi: connection was taken down in wrong thread
  This could have consequences depending on the driver being used. In general, it looks more like a cosmetic issue. For example, with MySQL it led to a small memory leak and an annoying message about a thread not properly torn down.
- imttcp was removed because it was an incomplete experimental module
- pmrfc3164sd was removed because it was a custom module nobody used
  We used to keep this as a sample inside the tree, but whoever wants to look at it can check older versions inside git.
- omoracle was removed because it was orphaned and did not build/work for quite some years and nobody was interested in fixing it
remote syslog PRI vulnerability – CVE: CVE-2014-3683
remote syslog PRI vulnerability
===============================
CVE: CVE-2014-3683
Status of this report
---------------------
FINAL
Updated 2014-10-06: effect on sysklogd milder than in initial assessment
Reporter
--------
mancha
Rainer Gerhards
Affected
--------
- rsyslog, most probably all versions (checked v3-stable and above)
- sysklogd (checked most recent versions)
Short Description
-----------------
While preparing a fix for CVE-2014-3634 for sysklogd, mancha discovered and
privately reported that the initial rsyslog fix set was incomplete. It did
not cover cases where PRI values > MAX_INT caused integer overflows
resulting in negative values.
Root Cause
----------
While parsing the syslog message's PRI value, an integer overflow can
happen (for PRI > MAX_INT), which may cause negative PRI values. The
solutions provided in CVE-2014-3634 do not handle that case.
Effect in Practice
------------------
General
~~~~~~~
Almost all distributions do ship rsyslog without remote reception by
default and almost all distros also put firewall rules into place that
prevent reception of syslog messages from remote hosts (even if rsyslog
would be listening). With these defaults, it is impossible to trigger
the vulnerability in v7 and v8. Older versions may still be vulnerable
if a malicious user writes to the local log socket.
Even when configured to receive remote messages (on a central server),
it is good practice to protect such syslog servers so they accept only
messages from trusted peers, e.g. within the relay chain. If set up in
such a way, a trusted peer must be compromised to send malformed
messages. This further limits the magnitude of the vulnerability.
If, however, any system is permitted to send data unconditionally to
the syslogd, a remote attack is possible.
sysklogd
~~~~~~~~
This causes no extra problems beyond CVE-2014-3634 because negative
PRI values are masked off, so the overflow region is still at a
maximum of 104 bytes. For more details, see CVE-2014-3634.
rsyslogd
~~~~~~~~
Rsyslogd experiences the same problem as sysklogd.
As in CVE-2014-3634, rsyslog v7 and v8 have an elevated risk due to some
table lookups used during writing. For details see CVE-2014-3634.
Contrary to CVE-2014-3634, rsyslog v5 is also likely to segfault, if
and only if the patch for CVE-2014-3634 is applied.
Note that a segfault of rsyslog can cause message loss. There are multiple
scenarios for this, but likely ones are:
- reception via UDP, where all messages arriving during downtime are lost
- corruption of disk queue structures, which can lead to loss of all disk queue contents (manual recovery is possible).
This list does not try to be complete. Note that disk queue corruption is likely
to occur in default settings, because the important queue information file (.qi)
is only written on successful shutdown. Without a valid .qi file, queue message
files cannot be processed.
How to Exploit
--------------
A syslog message with a PRI > MAX_INT needs to be sent, and the resulting
overflow must lead to a negative value. It is usually sufficient
to send just the PRI, as in this example:
"<3500000000>"
(On a 32-bit signed integer, 3500000000 wraps around to 3500000000 - 2^32 = -794967296, hence the negative PRI value.)
Severity
--------
Given the fact that multiple changes must be made to default system configurations
and the potential problems involved, we classify the severity of this vulnerability as
MEDIUM
Note that the probability of a successful attack is LOW. However, the risk of
message loss is HIGH in those rare instances where an attack is successful. As
mentioned above, it cannot be totally ruled out that remote code injection is
possible using this vulnerability.
Patches
-------
sysklogd
~~~~~~~~
A patch for version 1.5 is available at:
http://sf.net/projects/mancha/files/sec/sysklogd-1.5_CVE-2014-3634.diff
rsyslog
~~~~~~~
Patches are available for versions known to be in widespread use.
Version 8.4.2 is not vulnerable. Version 7.6.7, while no longer being project
supported, received a patch and is also not vulnerable.
Versions 8.4.1 and 7.6.6 do NOT handle integer overflows and resulting negative
PRI values correctly. So upgrading to them is NOT a sufficient solution. All
older v7 and v8 versions are vulnerable.
The rsyslog project also provides patches for older versions 5 and 3. This is
purely a convenience to those that still run these very outdated versions.
Note that these patches address the segfault issue. They do NOT offer all features
of the v7/v8 series, as this would require considerable code changes. Most
importantly, the "invld" facility is not available in the v3 patch. Also,
the dead-version patches do not try to assign a specific severity to messages
with invalid PRI values, nor do they prevent parsing those messages. In general,
it is suggested to upgrade to the currently supported version 8.4.2.
All patches and downloads can be found on http://www.rsyslog.com
Special thanks to mancha for his suggestions on how to fix the problem. The
core idea went into the rsyslog patches.