imtcp

rsyslog 8.29.0 (v8-stable) released

We have released rsyslog 8.29.0.

This release features a number of changes. For example, imptcp now has an experimental parameter for multiline messages, as well as new statistics counters.

Most notable, though, is the improved error reporting in the rsyslog core and in several modules, such as imtcp, imptcp and omfwd. An article about the improved error reporting is also available:

https://www.linkedin.com/pulse/improving-rsyslog-debug-output-jan-gerhards

If you have questions or feedback in relation to the article and/or debug output, please let us know or leave a comment below the article.

Other than that, the new version provides quite a number of bugfixes.

For a complete list of changes, fixes and enhancements, please visit the ChangeLog.

The packages will follow when they are finished.

ChangeLog:

rsyslog 8.27.0 (v8-stable) released

We have released rsyslog 8.27.0.

This release provides, apart from a lot of fixes, many useful feature enhancements. Most notable is the imkafka module, which allows the use of Kafka as an input. In addition, imptcp and imtcp received quite a number of enhancements, and the overall error reporting was improved considerably.

For a complete list of changes, fixes and enhancements, please visit the ChangeLog.

ChangeLog:

Using rsyslog to Reindex/Migrate Elasticsearch data

Original post: Scalable and Flexible Elasticsearch Reindexing via rsyslog by @Sematext

This recipe is useful in two scenarios:

  • migrating data from one Elasticsearch cluster to another (e.g. when you’re upgrading from Elasticsearch 1.x to 2.x or later)
  • reindexing data from one index to another in a cluster pre 2.3. For clusters on version 2.3 or later, you can use the Reindex API

Back to the recipe, we used an external application to scroll through Elasticsearch documents in the source cluster and push them to rsyslog via TCP. Then we used rsyslog’s Elasticsearch output to push logs to the destination cluster. The overall flow would be:

[Figure: rsyslog to Elasticsearch reindex flow]

This is an easy way to extend rsyslog, using whichever language you’re comfortable with, to support more inputs. Here, we piggyback on the TCP input. You can do a similar job with filters/parsers – you can find GeoIP implementations, for example – by piggybacking on the mmexternal module, which uses stdin and stdout for communication. The same is possible for outputs, normally added via the omprog module: we did this to add a Solr output and one for SPM custom metrics.

The custom script in question doesn’t have to be multi-threaded: you can simply spin up more instances, each scrolling different indices. In this particular case, using two scripts gave us slightly better throughput, saturating the network:

[Figure: rsyslog to Elasticsearch reindex flow with multiple scripts]
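One way to divide the work between several script instances is to give each its own subset of source indices. The sketch below is only an illustration of that idea: the helper name `split_indices` and the index names are hypothetical, and a simple round-robin split is an assumption rather than what was used in the original setup.

```python
def split_indices(indices, workers):
    """Distribute index names round-robin across `workers` buckets."""
    buckets = [[] for _ in range(workers)]
    for position, name in enumerate(indices):
        buckets[position % workers].append(name)
    return buckets

# hypothetical daily indices, split between two script instances
indices = ['logstash_2016.05.01', 'logstash_2016.05.02', 'logstash_2016.05.03']
for worker, bucket in enumerate(split_indices(indices, 2)):
    print('script instance %d scrolls: %s' % (worker, ', '.join(bucket)))
```

Each bucket can then be handed to a separate instance of the script, all of them pushing to the same rsyslog TCP listener.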

Writing the custom script

Before starting to write the script, one needs to know what the messages sent to rsyslog should look like. To be able to index data, rsyslog will need an index name, a type name and optionally an ID. In this particular case, we were dealing with logs, so the ID wasn’t necessary.

With this in mind, I see a number of ways of sending data to rsyslog:

  • one big JSON per line. One can use mmnormalize to parse that JSON, which then allows rsyslog to use values from within it as index name, type name, and so on
  • for each line, begin with the bits of “extra data” (like index and type names) then put the JSON document that you want to reindex. Again, you can use mmnormalize to parse, but this time you can simply trust that the last thing is a JSON and send it to Elasticsearch directly, without the need to parse it
  • if you only need to pass two variables (index and type name, in this case), you can piggyback on the vague spec of RFC3164 syslog and send something like
    destination_index document_type:{"original": "document"}
    

This last option will parse the provided index name in the hostname variable, the type in syslogtag and the original document in msg. A bit hacky, I know, but quite convenient (makes the rsyslog configuration straightforward) and very fast, since we know the RFC3164 parser is very quick and it runs on all messages anyway. No need for mmnormalize, unless you want to change the document in-flight with rsyslog.
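As an illustration of this third format, the sketch below builds one such line from an index name, a type name and a document. The helper name `to_syslog_line` is hypothetical and not part of the original script; it only shows the layout described above.

```python
import json
import sys

def to_syslog_line(index, doc_type, source):
    """Build 'index type:{json}' terminated by a newline."""
    # json.dumps escapes newlines inside strings, so every document stays on one line
    return index + ' ' + doc_type + ':' + json.dumps(source) + '\n'

line = to_syslog_line('destination_index', 'document_type', {'original': 'document'})
sys.stdout.write(line)
# prints: destination_index document_type:{"original": "document"}
```

Because the document is serialized onto a single line, the newline at the end is enough to separate one message from the next on the TCP stream.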

Below you can find the Python code that can scan through existing documents in an index (or index pattern, like logstash_2016.05.*) and push them to rsyslog via TCP. You’ll need the Python Elasticsearch client (pip install elasticsearch) and you’d run it like this:

python elasticsearch_to_rsyslog.py source_index destination_index

The script being:

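from elasticsearch import Elasticsearch
import json, socket, sys

source_cluster = ['server1', 'server2']
rsyslog_address = '127.0.0.1'
rsyslog_port = 5514

# connect to the source cluster, retrying requests that time out
es = Elasticsearch(source_cluster,
      retry_on_timeout=True,
      max_retries=10)
# open the TCP connection to rsyslog
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((rsyslog_address, rsyslog_port))

# start a scan: this first response carries a scroll ID but no documents
result = es.search(index=sys.argv[1], scroll='1m', search_type='scan', size=500)

while True:
  # fetch the next batch, always continuing from the latest scroll ID
  result = es.scroll(scroll_id=result['_scroll_id'], scroll='1m')
  if not result['hits']['hits']:
    break  # an empty batch means there are no more documents
  for hit in result['hits']['hits']:
    # sendall() keeps writing until the whole line has been sent
    s.sendall((sys.argv[2] + ' ' + hit["_type"] + ':' + json.dumps(hit["_source"]) + '\n').encode('utf-8'))

s.close()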

If you need to modify messages, you can parse them in rsyslog via mmjsonparse and then add/remove fields through rsyslog’s scripting language. However, I couldn’t find a nice way to change field names – for example to remove the dots that are forbidden since Elasticsearch 2.0 – so I did that in the Python script:

def de_dot(my_dict):
  # build a new dict instead of changing my_dict while iterating over it
  clean = {}
  for key, value in my_dict.items():
    if isinstance(value, dict):
      # nested objects can contain dotted keys, too
      value = de_dot(value)
    clean[key.replace('.', '_')] = value
  return clean

And then the “send” line becomes:

s.sendall((sys.argv[2] + ' ' + hit["_type"] + ':' + json.dumps(de_dot(hit["_source"])) + '\n').encode('utf-8'))
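Note that de_dot only descends into nested objects, so dotted keys inside arrays of objects would be left alone. If your documents contain such arrays, a variant along these lines might help. This is a sketch under that assumption, and the name `de_dot_deep` and the sample document are made up for illustration.

```python
def de_dot_deep(value):
    """Return a copy of `value` with '.' replaced by '_' in every dict key."""
    if isinstance(value, dict):
        return {key.replace('.', '_'): de_dot_deep(item) for key, item in value.items()}
    if isinstance(value, list):
        # arrays may hold objects that have dotted keys themselves
        return [de_dot_deep(item) for item in value]
    return value  # strings, numbers, booleans and None are returned unchanged

# hypothetical document with a dotted key inside an array of objects
doc = {'user.name': 'me', 'tags': [{'geo.ip': '10.0.0.1'}]}
print(de_dot_deep(doc))
```

As with de_dot, two keys that differ only by a dot versus an underscore would collide after renaming, so it is worth checking your mappings for such pairs first.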

Configuring rsyslog

The first step here is to make sure you have the latest rsyslog, though the config below works with versions all the way back to 7.x (which can be found in most Linux distributions). You just need to make sure the rsyslog-elasticsearch package is installed, because we need the Elasticsearch output module.

# messages bigger than this are truncated
$maxMessageSize 10000000  # ~10MB

# load the TCP input and the ES output modules
module(load="imtcp")
module(load="omelasticsearch")

main_queue(
  # buffer up to 1M messages in memory
  queue.size="1000000"
  # these threads process messages and send them to Elasticsearch
  queue.workerThreads="4"
  # rsyslog processes messages in batches to avoid queue contention
  # this will also be the Elasticsearch bulk size
  queue.dequeueBatchSize="4000"
)

# we use templates to specify what the data sent to Elasticsearch looks like
template(name="document" type="list"){
  # the "msg" variable contains the document
  property(name="msg")
}
template(name="index" type="list"){
  # "hostname" has the index name
  property(name="hostname")
}
template(name="type" type="list"){
  # "syslogtag" has the type name
  property(name="syslogtag")
}

# start the TCP listener on the port we pointed the Python script to
input(type="imtcp" port="5514")

# sending data to Elasticsearch, using the templates defined earlier
action(type="omelasticsearch"
  template="document"
  dynSearchIndex="on" searchIndex="index"
  dynSearchType="on" searchType="type"
  server="localhost"  # destination Elasticsearch host
  serverport="9200"   # and port
  bulkmode="on"  # use the bulk API
  action.resumeretrycount="-1"  # retry indefinitely if Elasticsearch is unreachable
)

This configuration doesn’t have to disturb your local syslog (i.e. by replacing /etc/rsyslog.conf). You can put it someplace else and run a different rsyslog process:

rsyslogd -i /var/run/rsyslog_reindexer.pid -f /home/me/rsyslog_reindexer.conf

And that’s it! With rsyslog started, you can start the Python script(s) and do the reindexing.
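Before starting a full run, it may be worth pushing a single test document through the pipeline to confirm that rsyslog is listening and that the document lands in the destination cluster. The following is a minimal sketch, assuming the listener from the configuration above; the function name `send_test_document` is hypothetical.

```python
import json
import socket

def send_test_document(host, port, index, doc_type, source):
    """Send a single 'index type:{json}' line over TCP and return it."""
    line = index + ' ' + doc_type + ':' + json.dumps(source) + '\n'
    # create_connection raises an error if nothing is listening on host:port
    s = socket.create_connection((host, port), timeout=5)
    try:
        s.sendall(line.encode('utf-8'))
    finally:
        s.close()
    return line
```

For example, calling `send_test_document('127.0.0.1', 5514, 'reindex_smoke_test', 'logs', {'message': 'hello'})` should, with the configuration above, produce one document in a (hypothetical) `reindex_smoke_test` index on the destination cluster.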

rsyslog 8.9.0 (v8-stable) released

We have released rsyslog 8.9.0.

This is primarily a bug-fixing release with a couple of improvements in omprog, imuxsock and the zero message queue plugins.

ChangeLog:

http://www.rsyslog.com/changelog-for-8-9-0-v8-stable/

Download:

http://www.rsyslog.com/downloads/download-v8-stable/

As always, feedback is appreciated.

Best regards,
Florian Riedl

Changelog for 8.9.0 (v8-stable)

Version 8.9.0 [v8-stable] 2015-04-07

  • omprog: add option “hup.forward” to forward HUP to external plugins
    This was suggested by David Lang so that external plugins (and other
    programs) can also do HUP-specific processing. The default is not
    to forward HUP, so no change of behavior by default.
  • imuxsock: added capability to use regular parser chain
    Previously, this was a fixed format, that was known to be spoken on
    the system log socket. This also adds new parameters:

    • sysSock.useSpecialParser module parameter
    • sysSock.parseHostname module parameter
    • useSpecialParser input parameter
    • parseHostname input parameter
  • 0mq: improvements in input and output modules
    See module READMEs, part is to be considered experimental.
    Thanks to Brian Knox for the contribution.
  • imtcp: add support for ip based bind for imtcp -> param “address”
    Thanks to github user crackytsi for the patch.
  • bugfix: MsgDeserialize out of sync with MsgSerialize for StrucData
    This led to failure of disk queue processing when structured data was
    present. Thanks to github user adrush for the fix.
  • bugfix imfile: partial data loss, especially in readMode != 0
    closes https://github.com/rsyslog/rsyslog/issues/144
  • bugfix: potential large memory consumption with failed actions
    see also https://github.com/rsyslog/rsyslog/issues/253
  • bugfix: omudpspoof: invalid default send template in RainerScript format
    The file format template was used, which obviously does not work for
    forwarding. Thanks to Christopher Racky for alerting us.
    closes https://github.com/rsyslog/rsyslog/issues/268
  • bugfix: size-based legacy config statements did not work properly
    on some platforms, they were incorrectly handled, resulting in all
    sorts of “interesting” effects (up to segfault on startup)
  • build system: added option --without-valgrind-testbench
    … which provides the capability to either enforce or turn off
    valgrind use inside the testbench. Thanks to whissi for the patch.
  • rsyslogd: fix misleading typos in error messages
    Thanks to Ansgar Püster for the fixes.

Changelog for 7.4.7 (v7-stable)

Version 7.4.7  [v7.4-stable] 2013-12-10

  • bugfix: limiting queue disk space did not work properly
    •   queue.maxdiskspace actually initializes queue.maxfilesize
    •   total size of queue files was not checked against queue.maxdiskspace for disk assisted queues.

    Thanks to Karol Jurak for the patch.

  • bugfix: linux kernel-like ratelimiter did not work properly with all inputs (for example, it did not work with imudp). The reason was that the PRI value was used, but that needed parsing of the message, which was done too late.
  • bugfix: disk queues created files in wrong working directory if the $WorkDirectory was changed multiple times, all queues only used the last value set.
  • bugfix: legacy directive $ActionQueueWorkerThreads was not honored
  • bugfix: segfault on startup when certain script constructs are used
    e.g. “if not $msg …”
  • bugfix: imuxsock: UseSysTimeStamp config parameter did not work correctly
    Thanks to Tomas Heinrich for alerting us and providing a solution suggestion.
  • bugfix: $SystemLogUseSysTimeStamp/$SystemLogUsePIDFromSystem did not work
    Thanks to Tomas Heinrich for the patch.
  • improved checking of queue config parameters on startup
  • bugfix: call to ruleset with async queue did not use the queue
    closes: http://bugzilla.adiscon.com/show_bug.cgi?id=443
  • bugfix: if imtcp is loaded and no listeners are configured (which is uncommon), rsyslog crashes during shutdown.

rsyslog statistic counter plugin imtcp

Plugin – imtcp

This plugin maintains statistics for each listener. The statistic is named after the given input name (or “imtcp” if none is configured), followed by the listener port in parentheses. For example, the counter for a listener on port 514 with no set name is called “imtcp(514)”.

The following properties are maintained for each listener:

  • submitted – total number of messages submitted for processing since startup
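If you filter statistics output by counter name in a script, the naming rule above can be written down as a small helper. This is only a sketch of the rule as described here; the function name `imtcp_counter_name` and the input name used in the second call are hypothetical.

```python
def imtcp_counter_name(port, input_name=None):
    """Build the statistics counter name for an imtcp listener."""
    # fall back to "imtcp" when no input name is configured
    return '%s(%s)' % (input_name or 'imtcp', port)

print(imtcp_counter_name(514))              # imtcp(514)
print(imtcp_counter_name(10514, 'remote'))  # remote(10514)
```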


rsyslog 7.5.4 (v7-devel) released

This release offers some interesting features. It provides a new module called mmpstrucdata to parse RFC5424 structured data into JSON message properties. Also, the default queue.size values have been altered to more suitable values. Omfwd and omfile received new parameters, and we changed a larger portion of the documentation to improve usability by linking to relevant web resources, so that additional information can be found quickly. Finally, there have been a few other changes and bugfixes.

More detailed information is available in the changelog.

ChangeLog:

http://www.rsyslog.com/changelog-for-7-5-4-v7-devel/

Download:

http://www.rsyslog.com/rsyslog-7-5-4-v7-devel/

As always, feedback is appreciated.

Best regards,
Florian Riedl

Changelog for 7.5.4 (v7-devel)

Version 7.5.4 [devel] 2013-10-07

  • mmpstrucdata: new module to parse RFC5424 structured data into json message properties
  • change main/ruleset queue defaults to be more enterprise-like
    New defaults are queue.size 100,000, max workers 2, worker activation after 40,000 msgs are queued, and batch size 256. These settings are much more useful for enterprises and will not hurt low-end systems that much. This is part of our re-focus on enterprise needs.
  • omfwd: new action parameter “maxErrorMessages” added
  • omfile: new module parameters to set action defaults added
    • dirCreateMode
    • fileCreateMode
  • mmutf8fix: new module to fix invalid UTF-8 sequences
  • imuxsock: handle unlimited number of additional listen sockets
  • doc: improve usability by linking to relevant web resources
    The idea is to enable users to quickly find additional information, samples, HOWTOs and the like on the main site. At the same time, (very) slightly remove memory footprint when few listeners are monitored.
  • bugfix: omfwd parameter streamdrivermode was not properly handled
    It was always overwritten by whatever value was set via the legacy directive $ActionSendStreamDriverMode
  • imtcp: add streamdriver.name module parameter
    permits overriding the system default stream driver (gtls, ptcp)
  • bugfix: build system: libgcrypt.h needed even if libgcrypt was disabled
    Thanks to Jonny Törnbom for reporting this problem
  • imported bugfixes from 7.4.4