FAQ: Troubleshooting UDP Packet Loss
====================================

This document addresses common questions related to diagnosing and resolving UDP packet loss when using the ``imudp`` input module. While UDP is fast, it is inherently unreliable. These entries will help you understand why loss occurs and how to mitigate it.

.. _faq-imudp-q1:

Q1: I see packet loss in my OS statistics for ``imudp``. What are the potential causes, troubleshooting steps, and probable fixes?
----------------------------------------------------------------------------------------------------------------------------------

**Answer:**

This is a common issue when dealing with UDP-based logging. Packet loss typically occurs when the rate of incoming messages overwhelms the system's or rsyslog's capacity to process them.

Potential Causes
~~~~~~~~~~~~~~~~

* **High Message Rate:** A burst of traffic or a sustained high volume of logs exceeds the processing capacity.
* **Insufficient OS Buffers:** The kernel's network receive buffer for the UDP socket is too small and overflows. This is the most common cause.
* **CPU Limitations:** The CPU is too busy to service the network queue, causing the kernel buffer to fill and drop packets.
* **Rsyslog Configuration:** The default ``imudp`` settings may not be optimized for a high-traffic environment. Downstream actions within your rsyslog configuration could also be slow, causing backpressure.
* **Network Issues:** Packet loss is happening on the network before the packets even reach the rsyslog server.

Troubleshooting Steps
~~~~~~~~~~~~~~~~~~~~~

1. **Confirm Kernel Drops:** First, verify that the OS kernel is the one dropping packets due to a full receive buffer. See the specific FAQ entry on how to check this with ``netstat`` or ``ss``.
2. **Check for Unknown Ports:** Use OS network statistics to see if packets are arriving on ports you are not listening on, which indicates a client misconfiguration.
3. **Inspect Network Traffic:** Use a tool like ``tcpdump`` to see if the packets are arriving at the server's network interface. This helps rule out upstream network problems.
4. **Enable ``impstats``:** Use rsyslog's ``impstats`` module to monitor internal queue performance and see if rsyslog itself is dropping messages after they have been received from the kernel.

Probable Fixes
~~~~~~~~~~~~~~

1. **Increase Kernel Receive Buffer:** This is the most effective fix. You can request a larger buffer directly in your ``imudp`` configuration. Rsyslog will attempt to set it using privileged calls if run as root.

   .. code-block:: rsyslog

      # Request a 4MB receive buffer for the UDP input
      input(type="imudp" port="514" rcvbuf="4m")

   If you are not running as root, you must also increase the OS-level maximum:

   .. code-block:: sh

      # Set the maximum allowed buffer size to 8MB
      sysctl -w net.core.rmem_max=8388608

2. **Tune ``imudp`` Performance:** Assign more worker threads to process incoming data in parallel and adjust the batch size.

   .. code-block:: rsyslog

      input(type="imudp" port="514" threads="4" batchSize="256")

3. **Switch to a Reliable Protocol:** If packet loss is unacceptable, the ultimate fix is to stop using UDP. Migrate your clients and server to use **TCP (``imtcp``)** or, even better, **RELP (``imrelp``)**, which is designed for guaranteed log delivery.

.. _faq-imudp-q2:

Q2: How can I definitively check if the OS kernel is dropping UDP packets due to full receive buffers?
------------------------------------------------------------------------------------------------------

**Answer:**

Yes, you can check this directly by querying kernel network statistics. The kernel increments a specific counter every time it drops a UDP packet because a socket's receive buffer is full.

Use one of the following commands:

1. **``netstat`` (classic tool):**

   .. code-block:: sh

      netstat -su

   In the output, look for the ``receive buffer errors`` counter in the ``Udp:`` section. A non-zero, increasing value is a clear indicator of packet loss due to buffer overflow.

2. **``nstat`` (cleaner output):**

   .. code-block:: sh

      nstat -auz

   Look for the ``UdpRcvbufErrors`` counter. It directly corresponds to this issue.

To troubleshoot, check the value before and after a period of high traffic. If the counter increases, you have confirmed the cause of the packet loss.

----

Q3: Why does ``tcpdump`` show traffic as "ip-proto-17" without a port number?
-----------------------------------------------------------------------------

**Answer:**

This indicates you are seeing **fragmented UDP packets**.

When a device sends a UDP datagram that is larger than the network's Maximum Transmission Unit (MTU, typically ~1500 bytes), the IP layer must break it into smaller fragments.

* Only the **first fragment** contains the full UDP header with the source and destination ports.
* All **subsequent fragments** do not contain the UDP header. ``tcpdump`` can see from the IP header that the payload belongs to protocol 17 (UDP), but it cannot find the port numbers, so it simply labels it ``ip-proto-17``.

The presence of many such packets means a device on your network is sending very large log messages or other UDP data. The solution is to identify the source device and, if possible, configure it to send smaller messages.

----

Q4: My OS stats show "packets to unknown port". How do I find out which port is being used?
-------------------------------------------------------------------------------------------

**Answer:**

The "packets to unknown port" counter is incremented when a UDP packet arrives on a port where no application is listening. To find the destination port, you must use a network sniffing tool like ``tcpdump``.

This command will show all UDP traffic that is *not* going to your standard, known syslog ports (e.g., 514 and 10514).

.. code-block:: sh

   tcpdump -i any -n 'udp and not (dst port 514 or dst port 10514)'

The output will show you the destination IP and port of the unexpected traffic, allowing you to identify the misconfigured client. For example, ``10.1.1.5.4321 > 10.1.1.1.9999`` shows a packet destined for port ``9999``.

----

Q5: How does rsyslog set the UDP receive buffer? Does it use ``SO_RCVBUFFORCE``?
--------------------------------------------------------------------------------

**Answer:**

Yes, rsyslog has a sophisticated, two-step process for setting the buffer size specified by the ``rcvbuf`` parameter in ``imudp``:

1. **Attempt ``SO_RCVBUFFORCE``:** First, it tries to set the buffer using the ``SO_RCVBUFFORCE`` socket option. This is a privileged call that bypasses the system's maximum buffer limit (``net.core.rmem_max``). This call will only succeed if rsyslog is running with ``CAP_NET_ADMIN`` capabilities (e.g., as the ``root`` user).

2. **Fallback to ``SO_RCVBUF``:** If the ``SO_RCVBUFFORCE`` call fails (e.g., because rsyslog is not running as root), it immediately falls back to using the standard, unprivileged ``SO_RCVBUF`` call. This call is limited by the system-wide ``net.core.rmem_max`` setting.

This means that if you run rsyslog as root, setting a large ``rcvbuf`` is sufficient. If you run as a non-privileged user, you must also tune the ``net.core.rmem_max`` sysctl parameter.