Linux Network Performance Tuning Prompt
Diagnose slow network throughput, high latency, retransmits, and ephemeral port exhaustion, then safely tune TCP/UDP stack parameters (BBR, buffers, queues).
- Target user: Linux sysadmins and SREs tuning host network performance
- Difficulty: Advanced
- Tools: Claude, ChatGPT
The prompt
You are a senior Linux network engineer who has tuned production network stacks for high-throughput services (CDNs, load balancers, databases). You know which sysctls matter and which are cargo-culted.

I will provide:
- The symptom (throughput below link spec, high p99 latency, retransmit storms, "connection reset", ephemeral port exhaustion, accept queue overflow)
- Host role: client / server / proxy / load-balancer
- NIC and link spec (`ethtool <iface>`, `ethtool -i <iface>`)
- Current congestion control: `sysctl net.ipv4.tcp_congestion_control`
- Output of `ss -s`, `ss -tnp state established | wc -l`, `nstat`, `netstat -s | grep -Ei "retrans|drop|listen"`
- Distro, kernel version, sysctl baseline (`sysctl -a 2>/dev/null | grep -E "(rmem|wmem|tcp|netdev|somaxconn)"`)

Your job:
1. **Classify the symptom**:
   - **Throughput below link** → window, buffer, or NIC-offload issue
   - **High latency p99 only** → buffer bloat, retransmit, or NIC interrupt pinning
   - **Retransmit storms** → loss, drops, ECN misconfiguration, MTU blackhole
   - **`netstat -s` listen overflow** → `somaxconn` / app accept queue too small
   - **Ephemeral port exhaustion** → outbound-heavy host; `ip_local_port_range`, `tcp_tw_reuse`
   - **`Connection reset`** → backlog full, conntrack table, app `RST` on close, firewall
2. **TCP throughput math**: max-throughput ≈ window_size / RTT. A 1 MB window over 100 ms RTT caps at ~80 Mbps. Raise the buffer maximums if the BDP exceeds them.
3. **Congestion control choice**:
   - **`cubic`** (Linux default): fair, stable, latency-tolerant. Good for general use
   - **`bbr`**: high throughput on lossy/long-fat paths; uses a bandwidth × RTT model; can be unfair to cubic neighbors
   - Switch with `sysctl net.ipv4.tcp_congestion_control=bbr`; needs the `tcp_bbr` module
4. **Buffer auto-tuning** (default in modern kernels): `net.ipv4.tcp_rmem` / `tcp_wmem` are min/default/max; raise max for long-fat networks.
5. **NIC tuning**:
   - **Multi-queue + IRQ affinity** (`ethtool -L`, `irqbalance`, manual `/proc/irq/<n>/smp_affinity`)
   - **Offload features** (`ethtool -k`): TSO, GSO, GRO, LRO. Helpful for throughput, can hurt latency or break forwarded/bridged traffic (LRO in particular)
   - **Ring buffer size** (`ethtool -G`): raise if `ethtool -S` or `ip -s link` shows rx drops
6. **For load balancers / proxies**: tune `somaxconn`, `tcp_max_syn_backlog`, app's `listen(backlog)`. Listen drops are invisible without `nstat`.
7. **Conntrack** (firewalled hosts): `nf_conntrack_max`, `nf_conntrack_buckets` (hash table size). Table full = silent packet drops.
8. **For DSCP / QoS / multi-queue scheduling**: `fq_codel`, `cake`, or `mq` qdiscs. Defaults are usually fine; `pfifo_fast` is legacy.

Mark DESTRUCTIVE: disabling the firewall to "test," switching to `bbr` on a load balancer mid-day, shrinking ring buffer size, disabling offloads on a live link.

---

Symptom: [DESCRIBE: include rate, latency target, link spec]
Host role: [client/server/proxy/LB]
NIC + link: [`ethtool` output]

TCP / sysctl baseline:
```
[PASTE relevant `sysctl -a` excerpts]
```

`ss -s`, `nstat`, `netstat -s` highlights:
```
[PASTE]
```

Reproduction: `iperf3 -c <host>` or workload-specific benchmark:
```
[PASTE]
```
Why this prompt works
Network tuning is rife with cargo-cult sysctls copied from Stack Overflow answers a decade old. This prompt forces measurement-driven tuning: identify the actual bottleneck (window, drops, buffers, NIC) before changing parameters.
How to use it
- Measure first: `iperf3` for raw throughput, `ss -ti` for per-flow TCP info (`cwnd`, `rtt`, retrans), `nstat` for kernel counters.
- State the target: “1 Gbps link, currently getting 300 Mbps” tells the model the gap.
- Include `nstat` and `netstat -s`: drop and overflow counters are diagnostic.
- Identify role: server-side tuning differs from client-side (LB tunes accept queue; client tunes ephemeral ports).
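To judge whether a gap such as “1 Gbps link, currently getting 300 Mbps” is window-limited, the window / RTT ceiling can be worked out by hand. A minimal sketch, assuming a hypothetical helper named `window_cap_mbps` and window and RTT values taken from `ss -ti`:

```shell
# Hypothetical helper (not part of any tool): throughput ceiling in Mbps
# for a given window and round-trip time.
# usage: window_cap_mbps <window_bytes> <rtt_ms>
window_cap_mbps() {
  # bytes * 8 bits / (rtt_ms / 1000 s) / 1e6  ==  bytes * 8 / (rtt_ms * 1000)
  awk -v w="$1" -v rtt="$2" 'BEGIN { printf "%.1f\n", (w * 8) / (rtt * 1000) }'
}

window_cap_mbps 1000000 100   # 1 MB window, 100 ms RTT -> 80.0
```

If measured throughput sits near this ceiling rather than near link speed, the window (not the link) is the likely limit.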
Useful commands
# Link spec
ethtool <iface> # speed, duplex
ethtool -i <iface> # driver
ethtool -S <iface> | head -40 # extended stats (drops, errors)
ethtool -k <iface> # offload features
ethtool -g <iface> # ring buffer sizes
ethtool -l <iface> # show multi-queue (channel) settings; -L changes them
# TCP stack overview
ss -s # summary
ss -ti '( sport = :443 )' | head # per-flow info: cwnd, rtt, retrans
ss -tnp state listening
ss -lntp # listeners
# Kernel counters
nstat # all SNMP counters; deltas between runs
netstat -s | grep -Ei "retrans|drop|listen|overflow"
sar -n EDEV 1 5 # per-NIC error rates
sar -n TCP,ETCP 1 5 # TCP rates
# Buffers / autotuning
sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem net.core.rmem_max net.core.wmem_max
sysctl net.ipv4.tcp_congestion_control
sysctl net.core.somaxconn net.ipv4.tcp_max_syn_backlog
# Conntrack
sysctl net.netfilter.nf_conntrack_max net.netfilter.nf_conntrack_buckets
cat /proc/sys/net/netfilter/nf_conntrack_count
dmesg | grep -i conntrack
# IRQ pinning / multi-queue
cat /proc/interrupts | grep <iface>
mpstat -I SCPU 1 3
sudo ethtool -L <iface> combined N # set N queues
# Throughput test
iperf3 -s # on receiver
iperf3 -c <server> -P 4 -t 30 # 4 parallel streams
iperf3 -c <server> -R # reverse
# Latency
mtr -rwbc 100 <host>
ping -c 100 -i 0.1 <host>
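The TCP totals that `netstat -s` prints are also exposed in `/proc/net/snmp`, so a retransmit ratio can be derived without extra tooling. A rough sketch, assuming a hypothetical helper `retrans_pct`; the counters are cumulative since boot, so two samples taken some time apart are more telling than a single reading:

```shell
# Hypothetical helper: percentage of sent TCP segments that were retransmits.
# Reads /proc/net/snmp-style text on stdin (first "Tcp:" line = column names,
# second "Tcp:" line = values).
retrans_pct() {
  awk '
    $1 == "Tcp:" && !hdr { for (i = 2; i <= NF; i++) col[$i] = i; hdr = 1; next }
    $1 == "Tcp:" {
      out = $(col["OutSegs"]); re = $(col["RetransSegs"])
      if (out > 0) printf "%.2f\n", 100 * re / out; else print "0.00"
    }
  '
}

# On a live host: retrans_pct < /proc/net/snmp
```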
Tuning patterns
High-throughput server (long-fat network)
# /etc/sysctl.d/99-network-perf.conf
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
net.ipv4.tcp_congestion_control = bbr
# fq is required by BBR for best behavior (comment kept on its own line: sysctl.d files do not support trailing comments)
net.core.default_qdisc = fq
net.core.netdev_max_backlog = 16384
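The buffer maximums above only help if they cover the path's bandwidth-delay product. A minimal sketch for sizing them, assuming a hypothetical helper `bdp_bytes` and a link speed and RTT that you have measured yourself:

```shell
# Hypothetical helper: bandwidth-delay product in bytes.
# usage: bdp_bytes <link_mbps> <rtt_ms>
bdp_bytes() {
  # (mbps * 1e6 / 8) bytes per second * (rtt_ms / 1000) seconds = mbps * rtt_ms * 125
  echo $(( $1 * $2 * 125 ))
}

bdp_bytes 10000 50   # 10 Gbps path at 50 ms RTT -> 62500000
```

In this example the BDP (about 62.5 MB) fits just under a 67108864-byte `tcp_rmem` / `tcp_wmem` maximum; a longer or faster path would need a larger one.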
Connection accept-heavy server (load balancer)
# Also bump the app's listen(backlog); the kernel caps it at somaxconn
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.ip_local_port_range = 1024 65535
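Before raising these limits, it helps to confirm that an accept queue is actually filling. For listening sockets, `ss -lnt` reports the current accept-queue depth in `Recv-Q` and the backlog limit in `Send-Q`. A sketch, assuming a hypothetical helper `full_listeners` and an arbitrary 80% threshold:

```shell
# Hypothetical helper: print listeners whose accept queue is at least 80% full.
# Reads `ss -lnt` output on stdin; the 80% cutoff is an assumption, not a standard.
full_listeners() {
  awk '$1 == "LISTEN" && $3 > 0 && ($2 / $3) >= 0.8 { print $4, $2 "/" $3 }'
}

# On a live host: ss -lnt | full_listeners
```

A listener that shows up here repeatedly suggests the application is accepting too slowly or its backlog is too small.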
Outbound-heavy client (API gateway)
net.ipv4.tcp_tw_reuse = 1
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_fin_timeout = 15
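The effect of widening the port range can be estimated with simple arithmetic. The sketch below uses a hypothetical helper `max_conn_rate` and assumes each closed connection holds its port in TIME_WAIT for 60 seconds, with `tcp_tw_reuse` off; it gives a rough ceiling on new connections per second to a single destination IP and port:

```shell
# Hypothetical helper: rough ceiling on new outbound connections per second
# to ONE destination IP:port. Assumes 60 s in TIME_WAIT and no tcp_tw_reuse.
# usage: max_conn_rate <low_port> <high_port>
max_conn_rate() {
  echo $(( ($2 - $1 + 1) / 60 ))
}

max_conn_rate 32768 60999   # common default range -> 470
max_conn_rate 1024 65535    # widened range        -> 1075
```

Widening the range roughly doubles the ceiling; connection pooling or `tcp_tw_reuse` is usually needed for anything beyond that.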
NIC tuning (post-sysctl)
# Maximize multi-queue
sudo ethtool -L eth0 combined $(nproc)
# Raise ring buffer if drops in rx
ethtool -g eth0 # check the pre-set maximums before raising
sudo ethtool -G eth0 rx 4096 tx 4096
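# Pin queues to specific cores (basic). IRQ names vary by driver; check /proc/interrupts first.
# smp_affinity lives in /proc/irq/<n>/ and expects a hex CPU mask.
sudo systemctl stop irqbalance
sudo bash -c 'i=0; for q in /proc/irq/*/eth0-rx-*; do printf "%x\n" $((1 << i)) > "$(dirname "$q")/smp_affinity"; i=$((i+1)); done'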
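`/proc/irq/<n>/smp_affinity` takes a hexadecimal CPU bitmask, which is easy to get wrong when written by hand. A small sketch, assuming a hypothetical helper `cpu_mask` that selects a single CPU:

```shell
# Hypothetical helper: hex smp_affinity mask selecting exactly one CPU.
# usage: cpu_mask <cpu_index>   (indexes below 63, given 64-bit shell arithmetic)
cpu_mask() {
  printf '%x\n' $(( 1 << $1 ))
}

cpu_mask 0   # -> 1
cpu_mask 4   # -> 10
```

For example, `cpu_mask 11` prints `800`, the value to write when pinning an IRQ to CPU 11.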
Common findings this catches
- `netstat -s | grep -i listen` shows nonzero listen drops or overflows with steady arrival → app’s listen backlog too small; raise `somaxconn` AND app config.
- `ethtool -S <iface> | grep drop` rising → ring buffer too small or NIC hardware drops; raise `rx` ring.
- `nstat -az | grep -i sack` shows high reordering or recovery counters (for example `TcpExtTCPSACKReorder`) → significant out-of-order delivery; check for path-MTU or middlebox loss.
- Throughput plateaus near window / RTT, well below link speed → window-limited; BDP > current window; raise `tcp_rmem` / `tcp_wmem` max.
- Long-fat path stuck at low throughput on cubic → switch to BBR with the `fq` qdisc; gains can be large on lossy paths, so confirm with `iperf3` before and after.
- Conntrack table full in `dmesg` → raise `nf_conntrack_max` AND `nf_conntrack_buckets`.
- Ephemeral port exhaustion on outbound API gateway → enable `tcp_tw_reuse`; widen port range.
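A full conntrack table is easier to catch before `dmesg` reports it. A minimal sketch, assuming a hypothetical helper `conntrack_fill_pct` fed with the current entry count and the configured maximum:

```shell
# Hypothetical helper: conntrack table utilization as an integer percentage.
# usage: conntrack_fill_pct <count> <max>
conntrack_fill_pct() {
  echo $(( $1 * 100 / $2 ))
}

conntrack_fill_pct 230000 262144   # -> 87

# On a live host (requires the nf_conntrack module to be loaded):
# conntrack_fill_pct "$(cat /proc/sys/net/netfilter/nf_conntrack_count)" \
#                    "$(cat /proc/sys/net/netfilter/nf_conntrack_max)"
```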
When to escalate
- NIC firmware bugs (consistent silent drops not in counters) — driver update or NIC replacement.
- Cross-region throughput limited by physical / provider topology — tuning won’t help; choose different placement.
- Application accepting connections slowly (not the kernel) — coordinate with app team; backlog tuning only papers over.
Related prompts
- Linux Block I/O Performance Investigation Prompt: Diagnose slow disk I/O, high iowait, queue depth saturation, and storage performance regressions using iostat, blktrace, fio, and per-device metrics.
- Linux High Load & CPU Runaway Investigation Prompt: Diagnose high load average, CPU saturation, run-queue pressure, IRQ storms, and steal time on Linux servers; distinguish user CPU vs system CPU vs I/O wait vs steal.
- Linux Host Network Connectivity Debug Prompt: Diagnose single-host Linux networking (broken routes, firewall blocks, DNS, conntrack exhaustion, ephemeral port exhaustion, MTU issues) without confusing it with cloud/SDN problems.