solving rsyslog write errors
When rsyslog reports a write error, it includes the error message generated by the operating system. This usually gives you a clue about the cause of the problem. Unfortunately, from time to time the root cause is not obvious.
In this case please check the following potential causes:
- Was an OS or rsyslog configuration change applied without restarting rsyslog? Rsyslog configuration changes are only applied when rsyslog is restarted. Similarly, many operating system process limits (like file size and several permission settings) are only applied when the process is restarted. If in doubt, restart rsyslog. Doing so can potentially save you a lot of time.
- Is rsyslog configured to drop privileges? If so, the user or group dropped to may simply not have the right permission. Try to comment out the privilege drop to see if this is the root cause.
- Does SELinux prevent rsyslog from accessing the file? This is often the case if you write to non-standard locations. To check if this is the cause, you can temporarily disable SELinux on the system. If it then works, you know the root cause. But please do not run with SELinux disabled. Instead, configure it correctly.
- Are you using something similar to SELinux? For example AppArmor on Ubuntu? Investigate and check if it causes the trouble.
- Do you run rsyslog via systemd? Are there any limits specified in the service file? Most modern Linuxes use systemd, so this is for sure a place to check.
- Are there any global limits specified in the system configuration? Note: systemd ignores them, so if you use systemd, you really need to check the systemd configuration and rsyslog’s unit file!
- Are there any file system limitations?
- Did the system (temporarily) run out of space? This could especially be the case for intermittent problems.
This list is probably not exhaustive, but it should give you a good idea of known trouble spots.
For a quick but rough check to find the culprit, you can run rsyslog in an interactive terminal window. Use the root account and do not drop privileges. If it works there, chances are pretty good that some other operating system component is causing the trouble.
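Some of the checks above (permissions after a privilege drop, free disk space) can be scripted. The following is a minimal sketch, not an rsyslog tool: the function name `check_write_target` and the report keys are made up for illustration. Run it as the user rsyslog drops privileges to. Because it runs outside the rsyslog service, it does not cover limits that apply only to the rsyslog process itself.

```python
import os
import shutil


def check_write_target(path):
    """Collect basic facts about why writing to `path` might fail.

    Hypothetical helper for illustration only. The answers apply to the
    user running this script, not to the rsyslog process.
    """
    directory = os.path.dirname(os.path.abspath(path))
    report = {
        "directory_exists": os.path.isdir(directory),
        "directory_writable": None,
        "file_exists": os.path.exists(path),
        "file_writable": None,
        "free_bytes": None,
    }
    if not report["directory_exists"]:
        return report

    # Can the current user create files in the target directory?
    report["directory_writable"] = os.access(directory, os.W_OK)
    # If the log file already exists, can the current user write to it?
    if report["file_exists"]:
        report["file_writable"] = os.access(path, os.W_OK)
    # Free space on the file system that holds the directory.
    report["free_bytes"] = shutil.disk_usage(directory).free
    return report


if __name__ == "__main__":
    # The path is only an example; use the file named in the error message.
    for key, value in check_write_target("/var/log/example.log").items():
        print(f"{key}: {value}")
```

If everything looks fine here but rsyslog still cannot write, that points towards the other items on the list, such as SELinux, AppArmor, or systemd limits.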
rsyslog version numbering change
Rsyslog used a version number scheme of
8.<real-version>.0
where we incremented <real-version> every 6 weeks with each release. The 8 and 0 are constant (well, the 0 could change to 1 with a very important patch, but in practice we have only done this once).
While this scheme has worked pretty well since we introduced it, we often see people not understanding that there is really a big difference between 8.24 and e.g. 8.40. Following recent trends in software versioning, we will make it clearer how old a version really is. Beginning with today’s release, we change the version number slightly to
8.yymm.0
where yy is the two-digit year and mm the two-digit month of the release date. We release every 6 weeks, so we will never have two releases within the same month.
So while you expected 8.41.0, you will now get 8.1901.0. To make things even clearer, the version output of rsyslog is more explicit as well: rsyslog -v will now report “8.1901.0 (aka 2019.01)”.
Rainer Gerhards’ blog has more details on why we made this change and how we arrived at the new system.
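The scheme is easy to reproduce in a script, for example when you want to tell how old an installed version is. This is only a sketch: the function names are ours, and reading the two-digit year as 20yy is an assumption that the scheme itself does not state.

```python
from datetime import date


def rsyslog_version(release_date):
    """Build the 8.yymm.0 version string for a release date."""
    return f"8.{release_date:%y%m}.0"


def version_banner(release_date):
    """Mimic the form shown above: '8.1901.0 (aka 2019.01)'."""
    return f"{rsyslog_version(release_date)} (aka {release_date:%Y.%m})"


def release_month(version):
    """Return (year, month) for a new-style version, or None otherwise.

    Old-style versions such as 8.41.0 carry no date, so they yield None.
    Assumption: the two-digit year is read as 20yy.
    """
    parts = version.split(".")
    if len(parts) != 3 or parts[0] != "8":
        return None
    middle = parts[1]
    if len(middle) != 4 or not middle.isdigit():
        return None
    year, month = 2000 + int(middle[:2]), int(middle[2:])
    if not 1 <= month <= 12:
        return None
    return year, month


if __name__ == "__main__":
    print(version_banner(date(2019, 1, 15)))
    print(release_month("8.1901.0"))
    print(release_month("8.41.0"))
```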
rsyslog error 51 [warning]
File has been truncated
This is not a real error but rather a warning message. Most probably it occurs when monitoring files with imfile and “reopenOnTruncate” has been set to “on”. In this case, it indicates that truncation of the file has been detected, and so imfile begins to read its content from the beginning again.
Please see the imfile documentation for limitations of truncation processing.
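To illustrate the general idea of truncation detection, here is a small sketch of a reader that remembers its offset and starts over when the file has become shorter than that offset. This is not imfile’s implementation; the class name and the size-based check are our own assumptions, chosen to show one simple way of detecting truncation.

```python
import os


class TruncationAwareReader:
    """Toy file follower that restarts from offset 0 after a truncation.

    Illustration only; not how imfile is implemented.
    """

    def __init__(self, path):
        self.path = path
        self.offset = 0  # number of bytes already consumed

    def read_new(self):
        """Return (truncated, data) with everything not yet consumed."""
        size = os.path.getsize(self.path)
        # A file that is now shorter than what we already read must have
        # been truncated, so start again from the beginning.
        truncated = size < self.offset
        if truncated:
            self.offset = 0
        with open(self.path, "rb") as handle:
            handle.seek(self.offset)
            data = handle.read()
        self.offset += len(data)
        return truncated, data
```

Note the weakness of this simple check: if the file is truncated and then grows beyond the old offset before the next read, the truncation goes unnoticed and data is lost.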
message modification modules: why run in direct (queue) mode?
Message modification modules modify the message object, so that subsequent actions can process the modified message. However, if the action that invokes the message modification module runs on a real queue (anything other than queue.type="direct"), the message object is duplicated, and this copy exists only for executing the action. In other words, the duplicated message object is destroyed immediately after the action completes. That means the modification made by the module will never be visible to anyone else.
So never run a message modification module on a non-direct queue. Message modification modules usually start with the letters “mm” (as in “mmjsonparse”).
Note that this is not a bug: rsyslog’s design is generic, and for most other actions duplicating the message is necessary in many cases. The config parser detects this kind of problem, but does not auto-correct it because it points to a potentially larger issue in the configuration.
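The effect can be shown with a toy model. The code below is not rsyslog code: the message is a plain dictionary, `mm_uppercase` stands in for a message modification module, and `run_action` is a made-up dispatcher that copies the message for every queue type other than "direct", as described above.

```python
import copy


def mm_uppercase(message):
    """Stand-in for a message modification module: edits in place."""
    message["msg"] = message["msg"].upper()


def run_action(action, message, queue_type="direct"):
    """Toy dispatcher that mimics the behavior described in the text."""
    if queue_type == "direct":
        # Direct mode: the action works on the original message object.
        action(message)
    else:
        # Real queue: the action only ever sees a duplicate, which is
        # thrown away as soon as the action has finished.
        duplicate = copy.deepcopy(message)
        action(duplicate)


if __name__ == "__main__":
    message = {"msg": "hello"}
    run_action(mm_uppercase, message, queue_type="linkedlist")
    print(message["msg"])  # modification was lost with the duplicate
    run_action(mm_uppercase, message, queue_type="direct")
    print(message["msg"])  # modification is visible to later actions
```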
rsyslog error 2357
Warning parsing config file.
This unfortunately is a pretty generic error code which is emitted when there is a problem understanding the configuration files. The main configuration file is usually /etc/rsyslog.conf and it may include other configuration files. The error message names the file and the approximate location of the error.
The error text should describe what needs to be fixed. If that does not help, it may make sense to check out rsyslog support options.
This is a stub entry: If you have questions please post a comment or visit the github issue tracker.
rsyslog error 2007
What does it mean?
This is a generic error message that unfortunately can happen in a number of cases.
In practice, it is often associated with suspension of actions. Then it comes with a text like
action “action 17” suspended
The number behind action changes.
How to solve it?
A frequent cause of this error message on Debian-based distributions (like Raspbian) is that rsyslog.conf contains the instruction to write to the xconsole pipe, but this pipe is never read. If so, you can simply delete these lines to remove the error message. These lines are usually found at the end of rsyslog.conf.
For other error messages, it is probably a good idea to check rsyslog’s issue tracker at GitHub and file a new issue if you can’t find a related case.
Note: we try to keep this page updated if we see other frequent causes of this error.
librelp 1.2.16
librelp 1.2.16 [download]
This new release of librelp provides API changes that allow better handling of oversize messages, as well as defining the listener interface. In addition, a few bugfixes for memory leaks and several minor issues are included.
For more details, please take a look at the changelog below.
– add new API: relpSrvSetOversizeMode()
This tells librelp how to handle oversize messages.
Traditionally (and still by default), this aborts the session. We have
now added an option to truncate the message instead.
Also, in case of session abort a descriptive error message is
emitted. This did not happen previously and caused confusion.
closes https://github.com/rsyslog/librelp/issues/81
– add new API: relpSrvSetLstnAddr()
It sets the listen address inside the relp server.
If not called, the server will bind to all interfaces.
Thanks to github user perlei for contributing it.
– support additional hashes for fingerprint mode
old-style SHA1 is used automatically
Thanks to github user briaeros for the patch.
see also https://github.com/rsyslog/librelp/pull/55
– bugfix: potential memory leak
This is very unlikely to occur in practice. Memory can be leaked
when TLS initialization fails when the client tries to connect
to the server. However, if this actually happens, it can happen
frequently and so accumulate to a large leak.
No such occurrence has been reported in practice.
Detected by Coverity Scan, CID 266008.
– bugfix: memory leak on protocol error
Received relp frames were not correctly deallocated while handling
protocol errors, resulting in a memory leak of dirty pages.
Thanks to github user gleentea for the patch.
see also https://github.com/rsyslog/librelp/issues/59
closes https://github.com/rsyslog/librelp/issues/60
– fixed a couple of minor issues:
* fix memory leak when relp frame construction fails
detected by clang static analyzer
* removed unnecessary code
detected by clang static analyzer
* fix memory leak
This leak occurs if the process is already totally out of memory,
a situation that is very rare and will also cause other troubles.
So the practical relevance of this patch looks rather slim.
Detected by clang static analyzer.
* fix memory leak on relpSrvRun() error
this is kind of cosmetic, because it can only occur when the
run fails, which usually should lead to termination of the
calling application
detected by Coverity Scan, CID 266016
* fix memory leak on relp listener construction error
detected by Coverity Scan, CID 266014, 266015
* also resolved all other issues reported by Coverity scan
– CI
* added native testbench (formerly used rsyslog for testing)
* added additional compile tests
sha256sum: 0c235dd2a01060ad5e64438879b31ae64e7640d0e262aa1a287a2dd9bc60fd53
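If you want to verify a downloaded tarball against the checksum above from a script, something like the following sketch works. The file name `librelp-1.2.16.tar.gz` is an assumption about how the download is named; adjust it to the file you actually have.

```python
import hashlib

# Checksum as published in this announcement.
EXPECTED_SHA256 = "0c235dd2a01060ad5e64438879b31ae64e7640d0e262aa1a287a2dd9bc60fd53"


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file without loading it whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected=EXPECTED_SHA256):
    """Return True if the file's digest equals the expected checksum."""
    return sha256_of_file(path) == expected.strip().lower()


if __name__ == "__main__":
    # Assumed file name; replace with the path of your download.
    print(matches_checksum("librelp-1.2.16.tar.gz"))
```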
librelp stack buffer overflow vulnerability (CVE-2018-1000140)
On Monday, March 19th, 2018, the librelp development team was informed by the security team at lgtm.com (Semmle) about a critical security vulnerability in librelp. The vulnerability is a long-standing bug that has existed since version 1.1.1 (2013-06-11). It affects client certificate validation in TLS mode and can lead to a stack buffer overrun and thus remote code execution.
Users of librelp are strongly advised to upgrade their packages as a matter of urgency.
Affected packages and versions
- librelp 1.2.14 down to 1.1.1
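For a quick check across many hosts, the affected range can be expressed as a simple version comparison. This is a sketch with made-up function names; it assumes plain three-part numeric version strings and does not account for distribution packages that carry a backported fix under an older version number.

```python
# Range taken from the list above: 1.1.1 up to and including 1.2.14.
FIRST_AFFECTED = (1, 1, 1)
LAST_AFFECTED = (1, 2, 14)


def parse_version(text):
    """Turn '1.2.14' into (1, 2, 14) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def is_affected(version):
    """Return True if the librelp version lies in the affected range."""
    return FIRST_AFFECTED <= parse_version(version) <= LAST_AFFECTED


if __name__ == "__main__":
    for candidate in ("1.1.0", "1.1.1", "1.2.9", "1.2.14", "1.2.15"):
        print(candidate, is_affected(candidate))
```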
Disclosure process
The security team followed best practices when they notified the librelp development team, who subsequently validated their claim. As agreed, one of the researchers applied for a CVE but unfortunately made a mistake in the afternoon of March 20th, which led to high-level information about the vulnerability becoming public via the Distributed Weakness Filing Project [1].
We have to assume that the vulnerability became publicly known at that point. The librelp team finalized a patch [2] on March 20th, and a new version of librelp was released on March 21st. It is available in both source and binary form from the project’s package repository. Note that the patch commit message is intentionally vague so as not to attract additional attention while details of the vulnerability were being disclosed.
The vulnerability
The vulnerability is caused by a call to snprintf on line 1205 of tcp.c [3].
This coding pattern is dangerous, because snprintf returns the number of bytes that it would have written if the buffer had been big enough. Most notably, that number is not necessarily equal to the number of bytes that it actually wrote. It is a common mistake to assume that snprintf returns the number of bytes written. In certain situations and if the data provided to snprintf is controlled by an attacker, this can lead to a stack overflow and the potential to remotely execute code. The code analysis provided by lgtm.com detects potentially dangerous uses of snprintf in open source projects: https://lgtm.com/rules/1505913226124/
Unfortunately, librelp is indeed vulnerable. In order to exploit this vulnerability, an attacker needs to be able to connect to a TLS-enabled RELP logging interface provided by librelp (for example, as can be provided by rsyslog). The attacker then needs to supply an X.509 certificate containing more than 32KB of “subject alt names”. One of these strings needs to overlap the 32KB boundary. For example, if one of the strings starts 10 bytes before the end of the 32KB buffer but is 100 bytes long, then the loop in librelp’s tcp.c will write the first 10 bytes of the string to the buffer but still increment iAllNames by 100. On the next iteration of the loop, the next string will be written at a starting offset of 32KB + 90. An attacker can control the size of this “gap” by varying the length of the overlapping string and utilize it to control exactly which part of the stack they want to overwrite. In particular, this means that they can avoid overwriting the stack canary, which makes the vulnerability significantly easier to exploit.
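The offset arithmetic from the example above can be replayed with a small model. This is Python, not the C code from tcp.c, and it only tracks where each string would start; it does not touch any memory. The function names are ours, the NUL terminator is ignored to keep the numbers identical to the prose, and the checked variant merely shows one possible kind of guard. It is not the actual patch.

```python
BUF_SIZE = 32 * 1024  # size of the stack buffer described in the text


def start_offsets_vulnerable(name_lengths):
    """Start offset of each string when the would-be length is trusted."""
    offset = 0
    starts = []
    for length in name_lengths:
        starts.append(offset)
        # snprintf reports the length it WOULD have written. Adding that
        # value unconditionally is the mistake described above.
        offset += length
    return starts


def start_offsets_checked(name_lengths):
    """Same loop, but stop as soon as a string no longer fits."""
    offset = 0
    starts = []
    for length in name_lengths:
        if length >= BUF_SIZE - offset:
            break  # would be truncated: do not advance past the buffer
        starts.append(offset)
        offset += length
    return starts


if __name__ == "__main__":
    # Filler up to 10 bytes before the end, a 100-byte string that
    # overlaps the boundary, then the string the attacker wants to place.
    lengths = [BUF_SIZE - 10, 100, 50]
    for start in start_offsets_vulnerable(lengths):
        print(start, "past the buffer" if start >= BUF_SIZE else "inside")
    print(start_offsets_checked(lengths))
```

In the vulnerable variant the third string starts at 32KB + 90, which matches the example. Changing the 100 to another length moves that start offset, and this is the “gap” the attacker controls.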
The teams at librelp and lgtm.com have not yet released a proof-of-concept exploit for this vulnerability.
Severity and mitigation
In the opinion of the librelp/rsyslog development team:
- the vulnerability is unquestionably critical as it could lead to RCE
- depending on GnuTLS, it may be hard to actually exploit the vulnerability
- the severity is mitigated if security recommendations are followed
The use of GnuTLS
librelp uses GnuTLS for handling TLS connections. GnuTLS’s behaviour and handling of overly large fields therefore influences this vulnerability in librelp.
The maximum size of the name fields in question is specified in RFC5280 as follows:
Section 4.1:
SubjectAltName ::= GeneralNames
GeneralNames ::= SEQUENCE SIZE (1..MAX) OF GeneralName
Appendix B imposes the following “restriction”:
The construct “SEQUENCE SIZE (1..MAX) OF” appears in several ASN.1 constructs. A valid ASN.1 sequence will have zero or more entries. The SIZE (1..MAX) construct constrains the sequence to have at least one entry. MAX indicates that the upper bound is unspecified.
Implementations are free to choose an upper bound that suits their environment.
As such, much depends on the actual GnuTLS binary that is linked to librelp. If it imposes a limit of 32KB (or less), the vulnerability cannot be exploited. We are currently trying to understand the default behavior of GnuTLS in this regard and will update this advisory when we receive new information.
RFC5280 Sect. 4.2.1.6 imposes restrictions on which characters can be used inside the “subject alternative name”. Brief review indicates that some byte sequences are not permitted. If so, and if GnuTLS implements these validations, the ability of an attacker to inject arbitrary code is reduced, making it harder to craft a workable exploit.
Nevertheless, it should be assumed that GnuTLS does not provide any safeguards. Even if such safeguards exist and are enabled by default, we have to consider the possibility that they are disabled in a specific binary.
Trusted networks
Regarding the mitigation of the severity of this vulnerability: users are strongly advised not to expose logging ports such as syslog TCP and RELP to a public network; these ports are best firewalled to make them available only on a dedicated network. If that advice is followed, the risk is significantly reduced: an attack can then only come from an already-compromised system on a trusted network.
However, some logs-as-a-service providers do expose RELP on the public Internet. The rsyslog team has been in touch with and/or reviewed documentation for leading providers. Those we checked either do not expose RELP at all or do not support TLS with RELP, and are therefore not at risk.
Credit
- Bas van Schaik; lgtm.com / Semmle
- Kevin Backhouse; lgtm.com / Semmle
References
[1] https://docs.google.com/spreadsheets/d/1PlDOsZ4Q36JU4Dz9zyBB2F3814dScppCRCe1muCT7JI
[2] https://github.com/rsyslog/librelp/commit/2cfe657672636aa5d7d2a14cfcb0a6ab9d1f00cf
[3] Affected line in tcp.c (librelp version 1.2.14): https://github.com/rsyslog/librelp/blob/532aa362f0f7a8d037505b0a27a1df452f9bac9e/src/tcp.c#L1205
librelp 1.2.15
librelp 1.2.15 [download]
This new release of librelp provides several bugfixes and can be built on Solaris and AIX.
For more details, please take a look at the changelog below.
– made build on AIX
Thanks to Philippe Duveau for providing the patches
– bugfix: invalid handling of snprintf() return code
– bugfix: invalid assert predicate
an assert could change a status variable due to a typo, so in debug
mode processing could fail.
Thanks to github user KatMisato for alerting us.
fixes https://github.com/rsyslog/librelp/issues/66
– some code cleanup
– bugfix: error message on open error was truncated
The “connection already open” error message when trying to open
an already open connection was truncated due to too-small size
specified.
Thanks to rsyslog forum user AlanR for the problem report.
sha256sum: a931832d9056660feee76d52195b21d4e9e06d5ec8e96b26af44e998529da999