Load balancing for rsyslog

To use rsyslog effectively in a cluster, one could use the iptables CLUSTERIP feature on Linux to set up one IP address that is shared across the cluster of systems. Heartbeat (with the Pacemaker cluster management layer) can keep track of the cluster and make sure that there is always a box handling the traffic.

CLUSTERIP uses a multicast MAC address so that the traffic reaches every machine in the cluster. On each machine, the kernel computes a hash over one or more of the source IP, source port, destination IP and destination port, and maps that hash to a bucket (for example, machine 1 of 10). If the packet falls into the bucket assigned to this machine, the kernel passes it on to the application. Otherwise the kernel drops the packet.
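
The bucket decision described above can be sketched in a few lines of Python. This is only an illustration, not the kernel's code: the function names owning_node and accepts are hypothetical, CRC32 stands in for whatever hash the kernel really uses, and the sketch assumes the hash covers the source IP and source port only.

```python
import zlib


def owning_node(src_ip: str, src_port: int, total_nodes: int) -> int:
    """Return the 1-based number of the node whose bucket this flow falls into.

    CRC32 is a stand-in hash chosen for illustration; the kernel's real
    hash function is different.
    """
    key = f"{src_ip}:{src_port}".encode()
    return zlib.crc32(key) % total_nodes + 1


def accepts(src_ip: str, src_port: int, local_node: int, total_nodes: int) -> bool:
    """Every node sees the packet; only the owner of the bucket keeps it."""
    return owning_node(src_ip, src_port, total_nodes) == local_node
```

Because every machine runs the same calculation on the same packet, exactly one machine accepts each flow and no coordination between the machines is needed per packet.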

The advantage of this approach is that it needs no additional systems: it can be done entirely on the receiving cluster.

A different approach would be to set up an LVS (Linux Virtual Server) load balancer (or a commercial load balancer) to divide the TCP traffic.

Note: In any of these configurations, one will want to consider the tcprebindinterval config directive of rsyslog ($ActionSendTCPRebindInterval in the legacy configuration syntax) on the sending machines, so that they periodically close and re-open their connections and the source port changes. Otherwise a long-lived connection stays pinned to one receiver, and one can end up with the traffic being unbalanced between the systems without any way to re-balance the load.
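
Why reconnecting helps can be seen in a small simulation. This is a rough sketch under stated assumptions: the balancer is assumed to hash the source IP and source port, CRC32 is again a stand-in hash, the ephemeral port range 32768-60999 is an assumed default, and the helpers bucket, distribute and rebind are hypothetical names, not part of rsyslog or the kernel.

```python
import random
import zlib
from collections import Counter


def bucket(src_ip: str, src_port: int, total_nodes: int) -> int:
    """Map a flow to a 0-based receiver index (CRC32 is a stand-in hash)."""
    return zlib.crc32(f"{src_ip}:{src_port}".encode()) % total_nodes


def distribute(senders, total_nodes: int) -> Counter:
    """Count how many sender connections land on each receiver."""
    return Counter(bucket(ip, port, total_nodes) for ip, port in senders)


def rebind(senders, rng: random.Random):
    """Simulate every sender reconnecting from a new ephemeral source port."""
    return [(ip, rng.randint(32768, 60999)) for ip, _ in senders]


if __name__ == "__main__":
    rng = random.Random(1)
    senders = [(f"192.0.2.{i}", rng.randint(32768, 60999)) for i in range(1, 21)]
    print("before rebind:", dict(distribute(senders, 4)))
    print("after rebind: ", dict(distribute(rebind(senders, rng), 4)))
```

As long as a sender keeps its connection open, its source port and therefore its receiver never change. Each rebind draws a new source port, which gives the flow a fresh chance of landing on a different receiver.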