The Problem
It has been noticed that rsyslogd DOES block when waiting on a remote rsyslog server over TCP. Its non-blocking functionality/rate limiting is I/O related and limited to the local ruleset. This causes issues for applications like Squid: if Squid expects the syslog call to return and it does not, Squid hangs as well. Conversely, if there is no rsyslogd running at all, Squid does not expect a response and continues to run without logging.
Generally, the rsyslog queueing subsystem tries to buffer in memory first, so even if the remote server goes offline, no disk file is generated. Files on disk are created only when needed, for example if rsyslog runs out of (configured) in-memory queue space or needs to shut down (and must persist not-yet-sent messages).
Note that actual spool files are only created if the remote server is down and there is no more space left in the in-memory queue. By default a few hundred messages can be held in memory, so a short failure of the remote server will never result in the creation of a disk file. These parameters can be fine-tuned.
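To make that buffering behaviour concrete, a disk-assisted action queue is configured along these lines. This is only a sketch: the work directory, file name prefix, size and hostname below are illustrative assumptions, not values from our setup.
$WorkDirectory /var/spool/rsyslog        # where spool files would be written (assumed path)
$ActionQueueType LinkedList              # asynchronous in-memory queue for the next action
$ActionQueueFileName fwdRule1            # file name prefix; enables spooling to disk when memory is exhausted
$ActionQueueMaxDiskSpace 1g              # cap on disk spool usage (illustrative)
$ActionQueueSaveOnShutdown on            # persist not-yet-sent messages across a shutdown
*.* @@central-syslog.example.com:514     # forward everything over plain TCP (placeholder host)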
The problem here is that (by design!) the TCP send() API always returns success but buffers the message locally until it can be transmitted to the remote host. This behavior is fine - it enables TCP to be used reliably even when a network path is temporarily down. The downside is that the sender never knows if and when the data was received by the remote peer. Of course, there is an upper limit on the buffer size. It depends on TCP window negotiations, and if I remember correctly the ultimate upper limit is around 240K. I haven't checked, but the lower limit is probably around 1.5K, the size of an Ethernet packet.
If we now assume a generous syslog message size of 150 bytes (many are much smaller), we can have between 10 (1.5K / 150 bytes) and 1,600 (240K / 150 bytes) messages sitting in this buffer! So we may lose up to 1,600 messages if the server goes down - without even noticing it.
It is probably good to add that this is not an rsyslog problem. It is a protocol issue, and as such all software implementing plain TCP syslog has the same shortcoming!
What I believe is happening is that rsyslog is waiting for some ACK to return from the remote server, but requests are being dropped, and the following config line may be leaving it in a state of limbo:
$ActionResumeRetryCount -1 # infinite retries if host is down
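One mitigation worth evaluating (picked up again under Solutions below) is to give the forwarding action its own queue that discards rather than blocks when full, so that the retry limbo does not propagate back to local senders. A rough sketch with illustrative values, not tested settings:
$ActionQueueType LinkedList              # run the forwarding action asynchronously
$ActionQueueSize 10000                   # in-memory queue capacity (illustrative)
$ActionQueueDiscardMark 9500             # start discarding once the queue is nearly full
$ActionQueueDiscardSeverity 0            # discard messages of any severity rather than block
$ActionQueueTimeoutEnqueue 0             # never wait on a full queue
$ActionResumeRetryCount -1               # keep retrying the remote host indefinitely
$ActionResumeInterval 30                 # seconds between resume attempts
*.* @@central-syslog.example.com:514     # plain TCP forwarding (placeholder host)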
Solutions
- only log to local files and have rsyslog read these (e.g. via imfile; a minimal sketch appears after this list). There are some formatting issues with this approach that could be problematic for the larger central logging system: what we would lose in log granularity far outweighs what we gain by letting the application log to a local file and having rsyslog read that file in.
- Switch syslog from transmitting over TCP to UDP. Unreliable transport; not recommended.
- Switch syslog from transmitting over TCP to RELP (http://www.rsyslog.com/doc/imrelp.html). A strong possibility, as RELP supports application-level acks, so the client (sender) knows what was processed by the server and what was not, and can resend the right messages after a connection failure. This is rsyslog's recommended solution for remote logging, for all of the reasons given above. It still needs testing to confirm whether this module prevents the local client from blocking applications. The configuration change is simple (http://www.rsyslog.com/tag/relp/; see the RELP sketch after this list).
- further rsyslog tuning should be looked into: http://linux.die.net/man/5/rsyslog.conf and http://www.rsyslog.com/doc/rsyslog_conf_modules.html
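For the first option (local files read back by rsyslog), a minimal imfile sketch would look roughly like this; the file path, tag, state file name and facility are placeholders, not our actual configuration:
$ModLoad imfile                          # module that reads plain text files into the syslog stream
$InputFileName /var/log/squid/access.log # local file written by the application (placeholder path)
$InputFileTag squid-access:              # tag attached to each message
$InputFileStateFile stat-squid-access    # state file so rsyslog remembers how far it has read
$InputFileSeverity info
$InputFileFacility local5
$InputRunFileMonitor                     # activate monitoring of the file defined above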
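For the RELP option, the change on each side is small. A sketch, with placeholder host and port:
# on the client (sender):
$ModLoad omrelp                               # RELP output module
*.* :omrelp:central-syslog.example.com:2514   # forward everything via RELP (placeholder host/port)
# on the central server (receiver):
$ModLoad imrelp                               # RELP input module
$InputRELPServerRun 2514                      # listen for RELP connections on this port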
Testing
Plan
- run rsyslog in TCP mode and, over a fixed period, count failures in /var/log/jboss/server.log, /var/log/jboss/timing.log and /var/log/jboss/access.log (status 4xx and 5xx), and in /var/log/squid/watcher (the log file of a script created to query Squid every 5 seconds using squidclient -k)
- block remote rsyslog and recount the number of errors over the same time period
- set up rsyslog to use the RELP protocol and repeat steps 1 and 2
- if there is still a high error rate, modify Squid to point individual log files, one by one, at local disk instead of syslog, to see whether one particular log causes the failures.
Results
Load simulation with 5k users:
Rsyslog Version | Protocol | Squid Fail? | Time taken
5.8.10 | TCP | Yes | ~3 mins
5.8.10 | RELP | Yes (intermittent failures only, significantly reduced) | ~8 mins
7.2.6 | TCP | No | over 15 mins tested
7.2.6 | RELP | No | over 15 mins tested
We then increased the load to 100k requests per minute after the switchover to RELP and rsyslog v7; after 40 minutes of load, zero failures were seen.