You are a senior Linux network engineer who has stood up and debugged bonded NICs on switches from Cisco, Arista, Juniper, and Mellanox. You can read `/proc/net/bonding/bond0` like a chart and tell whether LACP failed at L1 or L2.

I will provide:
- The bond mode (`active-backup` / `balance-tlb` / `balance-alb` / `802.3ad` / `balance-rr`)
- The symptom (slave down, full bond down, throughput well below sum of links, traffic only on one link, LACP not forming)
- Output of `cat /proc/net/bonding/bond0`
- `ip -d link show bond0` and `ip -d link show <slave>`
- `ethtool <slave>` and `ethtool -S <slave>` per slave
- The switch side: LACP enabled? LAG group config? port channel summary if Cisco?
- dmesg lines around bonding/LACP events

Your job:

1. **Decode the bond mode-specific expectations**:
   - **`active-backup`** (mode 1): only ONE slave active; failover on link down. Throughput = single-link.
   - **`balance-tlb`** (mode 5): outbound balanced by load; inbound on a single slave. No switch config needed.
   - **`balance-alb`** (mode 6): both directions balanced via ARP negotiation. No switch config needed.
   - **`802.3ad` / LACP** (mode 4): requires switch-side LAG/port-channel. Hash-based distribution.
   - **`balance-rr`** (mode 0): packet-level round-robin. Can cause TCP reordering; rare.
2. **For LACP (mode 4) failures**:
   - **Aggregator ID**: both slaves should be in the SAME aggregator (visible in `/proc/net/bonding/bond0`)
   - **Partner Mac Address**: must be non-zero (received LACPDU); zero = no LACP from switch side
   - **Partner Key**: must match within the aggregator
   - **Actor / Partner state**: LACPDUs carry the flags "Activity," "Timeout," "Aggregation," "Synchronization," "Collecting," "Distributing"
   - All-zeros partner = switch not sending LACP, or wrong VLAN/trunk config
3. **Hash policy** (`xmit_hash_policy`):
   - **`layer2`** (default): hash on MAC. Traffic between the same two hosts always uses the same slave.
   - **`layer2+3`**: MAC + IP. Adds IP differentiation; better for routed traffic.
   - **`layer3+4`**: IP + TCP/UDP port. Best for many flows between two hosts (same MAC).
   - For server-to-server bulk transfers (e.g., backups), `layer3+4` gets parallelism. Default `layer2` results in single-link throughput.
4. **For throughput < sum of links** in mode 4:
   - **Single flow** (one TCP connection) is hashed to ONE slave, so it cannot exceed single-link speed. This is by design.
   - **Many flows** should distribute. If they don't, check that `xmit_hash_policy` matches the workload.
   - The switch chooses the link for inbound traffic, so its hash policy must complement the host's (most modern switches have symmetric hashing).
5. **For slave flapping**:
   - `miimon` polls link state via MII; default 100ms
   - `arp_interval` + `arp_ip_target` polls via ARP, which is useful for switches that hide MII status (not available in `802.3ad`, `balance-tlb`, or `balance-alb`)
   - Confirm `Link detected: yes` in `ethtool <slave>`
   - dmesg may show MII link toggles
6. **For `active-backup` failover delay**:
   - `updelay` and `downdelay` set hysteresis; defaults often 0 (immediate). Raise if seeing flap from brief blips.
   - `primary` option pins which slave is preferred when both are up
7. **For asymmetric traffic**:
   - `balance-tlb` receives on one slave by design; `balance-alb` spreads inbound traffic per peer via ARP, so a single peer still arrives on one slave
   - LACP relies on the switch's hash; check the switch-side load-balance setting, e.g. `show port-channel load-balance` (NX-OS) or `show etherchannel load-balance` (IOS)

Mark as DESTRUCTIVE: changing the bond mode (requires taking the bond down, which drops all traffic) and removing a slave from a single-slave bond.

---

Bond mode: [DESCRIBE]
Symptom: [DESCRIBE: throughput target vs actual, slave count, switch model]

`cat /proc/net/bonding/bond0`:
```
[PASTE]
```

`ip -d link show bond0` and each slave:
```
[PASTE]
```

Per-slave `ethtool <slave>` and `ethtool -S <slave>` highlights:
```
[PASTE]
```

Switch-side config or `show etherchannel summary`:
```
[PASTE]
```

Recent dmesg (bond/LACP):
```
[PASTE]
```

Why this prompt works

Bonding failures often look like “the link is fine” while throughput is half. The mode determines what’s possible — active-backup will never beat one-link throughput regardless of switch — and LACP debugging requires reading partner state precisely. This prompt forces a mode-aware diagnosis.

How to use it

State the bond mode upfront. Diagnosis differs.
Always include /proc/net/bonding/bond<N> — it has the LACP partner detail, aggregator ID, slave status.
Include the switch side when LACP is involved. The bond can’t form alone.
For throughput problems, identify how many flows are involved. Single-flow over LACP cannot exceed one link.
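
The inputs listed above can be gathered in one pass. The following is a minimal sketch, assuming a POSIX-style shell with `local` support (bash or dash), a bond named `bond0`, and `ip` and `ethtool` on the PATH. The function names `bond_slaves` and `collect_bond_report` are made up for illustration and are not part of any tool.

```shell
# Hypothetical helpers; the names are illustrative only.

# Print the slave interface names listed in a bonding status file.
# $1: path to the status file (defaults to /proc/net/bonding/bond0)
bond_slaves() {
    local status_file="${1:-/proc/net/bonding/bond0}"
    awk -F': ' '/^Slave Interface:/ { print $2 }' "$status_file"
}

# Print the host-side outputs the prompt asks for, each under a "###" header.
# $1: bond name (defaults to bond0)
collect_bond_report() {
    local bond="${1:-bond0}" slave
    local status_file="/proc/net/bonding/$bond"
    echo "### cat $status_file";      cat "$status_file"
    echo "### ip -d link show $bond"; ip -d link show "$bond"
    for slave in $(bond_slaves "$status_file"); do
        echo "### ip -d link show $slave"; ip -d link show "$slave"
        echo "### ethtool $slave";         ethtool "$slave"
        echo "### ethtool -S $slave";      ethtool -S "$slave"
    done
}
```

Running `collect_bond_report bond0 > bond0-report.txt` as root produces one file to paste from. The switch-side output and dmesg lines still have to be collected separately.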

Useful commands

# Bond status (most informative)
cat /proc/net/bonding/bond0
ip -d link show bond0
ip -d link show <slave>

# Per-slave NIC
ethtool <slave>                    # link, speed, duplex
ethtool -S <slave>                 # extended (drops, errors)
ethtool -k <slave>                 # offload features
ethtool -i <slave>                 # driver

# LACP-specific fields in /proc/net/bonding/bond0:
# - "System MAC address": the actor (host) MAC; should be a real MAC
# - "Partner Mac Address": should be the switch's MAC (NOT 00:00:00:00:00:00)
# - "Aggregator ID": all working slaves in the same aggregator
# - "Actor Key" / "Partner Key": each should be identical across the slaves of one aggregator (actor and partner keys need not equal each other)
# - "port state" (actor and partner, printed in decimal): 61 (0x3D) = Activity, Aggregation, Sync, Collecting, Distributing; 63 (0x3F) adds Timeout (short timeout, i.e. lacp_rate=fast)

# Detail
sudo ip -s link show bond0          # statistics
sudo ip -s link show <slave>

# Add / remove slaves dynamically
sudo ip link set <slave> down
sudo ip link set <slave> nomaster   # remove from bond
sudo ip link set <slave> master bond0  # add to bond

# Config files (Ubuntu/Debian netplan)
cat /etc/netplan/*.yaml
# RHEL/CentOS NetworkManager
sudo nmcli connection show
sudo nmcli connection show bond0
sudo nmcli connection modify bond0 bond.options "mode=802.3ad,miimon=100,xmit_hash_policy=layer3+4"

# Test throughput (multi-flow for LACP)
iperf3 -c <server> -P 8 -t 30

# dmesg
sudo dmesg -T | grep -iE "bond|802\.3ad|802\.1Q|lacp" | tail -50
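
The LACP port state in `/proc/net/bonding/bond0` is a bitmask, so decoding it shows exactly which stage of negotiation is missing. The sketch below uses the flag order from the LACP standard (bit 0 Activity through bit 7 Expired). The function name `decode_lacp_state` is made up for illustration.

```shell
# Decode an LACP port state value into its flag names.
# Bit values: 0x01 Activity, 0x02 Timeout, 0x04 Aggregation,
# 0x08 Synchronization, 0x10 Collecting, 0x20 Distributing,
# 0x40 Defaulted, 0x80 Expired.
# $1: port state as decimal (61) or hex (0x3D)
decode_lacp_state() {
    local state=$(( $1 )) out="" bit=1 name
    for name in Activity Timeout Aggregation Synchronization \
                Collecting Distributing Defaulted Expired; do
        if [ $(( state & bit )) -ne 0 ]; then
            out="${out:+$out }$name"   # append, space-separated
        fi
        bit=$(( bit << 1 ))
    done
    echo "$out"
}
```

For example, `decode_lacp_state 61` lists every healthy flag except Timeout, which is what a slow-rate partner reports. A state that includes Defaulted or Expired means the port is running on default partner information because LACPDUs are not arriving.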

Config patterns

Active-backup (simple, no switch config needed)

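# Netplan
network:
  version: 2
  ethernets:            # bond members must be defined here
    eth0:
      dhcp4: false
    eth1:
      dhcp4: false
  bonds:
    bond0:
      interfaces: [eth0, eth1]
      addresses: [192.168.1.10/24]
      routes:           # replaces the deprecated gateway4 key
        - to: default
          via: 192.168.1.1
      parameters:
        mode: active-backup
        primary: eth0
        mii-monitor-interval: 100
        up-delay: 200
        down-delay: 200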
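
To confirm that failover behaves as configured, it helps to read which slave is carrying traffic before and after a link event. This is a minimal sketch that reads the `Currently Active Slave` line from the bonding status file. The function name `active_slave` is made up for illustration.

```shell
# Print the slave currently carrying traffic in an active-backup bond.
# $1: path to the bonding status file (defaults to /proc/net/bonding/bond0)
active_slave() {
    awk -F': ' '/^Currently Active Slave:/ { print $2 }' \
        "${1:-/proc/net/bonding/bond0}"
}
```

With the config above, `active_slave` should print `eth0` in steady state, switch to `eth1` when eth0 loses link, and return to `eth0` once its link has been back up for the `up-delay` period, assuming the default `primary_reselect` behaviour.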

802.3ad LACP (requires switch LAG)

network:
  version: 2
  ethernets:            # bond members must be defined here
    eth0:
      dhcp4: false
    eth1:
      dhcp4: false
  bonds:
    bond0:
      interfaces: [eth0, eth1]
      addresses: [192.168.1.10/24]
      parameters:
        mode: 802.3ad
        transmit-hash-policy: layer3+4
        lacp-rate: fast        # 1s LACPDU vs default 30s
        mii-monitor-interval: 100
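
After applying a config like this, the kernel's view of the bond can be compared with what was intended by reading `/sys/class/net/<bond>/bonding/`. The sketch below checks the four parameters from the example above. The function name `check_lacp_params` and its second argument (an alternate sysfs root, useful for dry runs) are made up for illustration.

```shell
# Compare live bonding parameters against the values in the example config.
# $1: bond name (defaults to bond0)
# $2: sysfs root (defaults to /sys/class/net; overridable for testing)
check_lacp_params() {
    local dir="${2:-/sys/class/net}/${1:-bond0}/bonding"
    local rc=0 pair param want have
    for pair in mode=802.3ad xmit_hash_policy=layer3+4 lacp_rate=fast miimon=100; do
        param=${pair%%=*}
        want=${pair#*=}
        have=""
        # Enumerated options read as "name number" (e.g. "802.3ad 4");
        # keep only the name.
        read -r have _ < "$dir/$param"
        if [ "$have" = "$want" ]; then
            echo "ok   $param=$have"
        else
            echo "DIFF $param=$have (expected $want)"
            rc=1
        fi
    done
    return $rc
}
```

A `DIFF` line usually means the config was written but not applied (for netplan, `sudo netplan apply`), or that another tool is managing the same interface.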

NetworkManager (nmcli)

sudo nmcli connection add type bond con-name bond0 ifname bond0 \
    bond.options "mode=802.3ad,miimon=100,xmit_hash_policy=layer3+4,lacp_rate=fast"
sudo nmcli connection add type ethernet con-name slave-eth0 ifname eth0 master bond0
sudo nmcli connection add type ethernet con-name slave-eth1 ifname eth1 master bond0
sudo nmcli connection up bond0

Common findings this catches

Partner Mac Address: 00:00:00:00:00:00 → switch not sending LACP (wrong port, wrong mode). Confirm switch config.
Aggregator ID differs between slaves → only one slave in the active aggregator (others can’t join — usually speed/duplex mismatch). Check ethtool <slave>.
xmit_hash_policy: layer2 with all traffic to one destination MAC → all traffic hashed to one slave. Switch to layer3+4.
Single iperf3 over LACP shows 1 Gbps on a 4×1 Gbps bond — by design (single flow → one link). Test with -P 8.
Mode: balance-rr and TCP retransmits high → packet reordering; switch to LACP or tlb.
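slave shows Link detected: yes but bond says it’s down → miimon issue; check updelay, or try ARP monitoring (arp_interval + arp_ip_target). ARP monitoring is not supported in 802.3ad, balance-tlb, or balance-alb modes.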
MTU mismatch between slaves → packets get dropped silently; set MTU on bond and slaves.
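
Several of these findings can be spotted mechanically from the bonding status file. The following is a minimal sketch covering three of them (all-zeros partner MAC, a slave outside the active aggregator, a slave whose MII status is not up). It assumes the usual layout in which the `Active Aggregator Info` block precedes the first `Slave Interface` line. The function name `check_bond_findings` is made up for illustration.

```shell
# Scan a bonding status file for common LACP problems.
# $1: path to the status file (defaults to /proc/net/bonding/bond0)
# Prints one line per finding; prints nothing if none are detected.
check_bond_findings() {
    awk -F': ' '
        /^Slave Interface:/ { slave = $2; order[++n] = slave; next }
        /Aggregator ID:/ {
            if (slave == "") active = $2   # bond-level active aggregator
            else agg[slave] = $2           # per-slave aggregator
        }
        /Partner Mac Address:/ && $2 == "00:00:00:00:00:00" {
            print "partner MAC is all zeros: no LACPDUs received from the switch"
        }
        /^MII Status:/ && slave != "" && $2 != "up" {
            print "slave " slave ": MII status " $2
        }
        END {
            for (i = 1; i <= n; i++)
                if (agg[order[i]] != active)
                    print "slave " order[i] ": aggregator " agg[order[i]] \
                          " (active is " active ")"
        }
    ' "${1:-/proc/net/bonding/bond0}"
}
```

An empty result does not prove the bond is healthy; it only rules out these three conditions. Hash-policy and MTU problems still need the manual checks above.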

Mode selection cheatsheet

| Goal | Mode | Switch config? |
| --- | --- | --- |
| Simple failover | `active-backup` (1) | No |
| Outbound load distribution, no switch config | `balance-tlb` (5) | No |
| In + out, no switch config | `balance-alb` (6) | No |
| Standardized link aggregation | `802.3ad` LACP (4) | Yes (LAG/port-channel) |
| Maximum throughput single flow | None: bonding can’t exceed single link per flow | n/a |
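
The single-flow limit follows from how the transmit hash works: the same inputs always yield the same slave. The sketch below is a simplified form of the `layer2` formula described in the kernel bonding documentation (last byte of each MAC XORed together, modulo the slave count). The real kernel hash also mixes in the packet type, so treat this as an illustration rather than a bit-exact reimplementation. The function name `layer2_slave_index` is made up.

```shell
# Simplified layer2 transmit hash: XOR the last byte of each MAC address,
# then take the result modulo the number of slaves.
# $1: source MAC, $2: destination MAC, $3: number of slaves
layer2_slave_index() {
    local src_last="${1##*:}" dst_last="${2##*:}"
    echo $(( (0x$src_last ^ 0x$dst_last) % $3 ))
}
```

Calling `layer2_slave_index 52:54:00:aa:bb:01 52:54:00:cc:dd:04 2` returns the same index every time, no matter how many TCP connections exist between those two hosts, because ports never enter the calculation. `layer3+4` adds IP addresses and ports to the hash input, which is why it spreads parallel flows between one pair of hosts.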

When to escalate

Switch-side LACP not forming despite correct partner key/state — pull in network team; usually a switch config issue.
Asymmetric traffic capping throughput below what LACP should deliver: check the switch’s hash distribution; its port-channel load-balancing method may need to be changed.
Driver-specific issues (e.g., Mellanox mlx5 or Broadcom bnx2x under specific kernel versions): upgrade the driver or firmware, and check vendor advisories.

Related prompts

Linux Host Network Connectivity Debug Prompt

Linux Network Performance Tuning Prompt

Linux VLAN & Bridge Troubleshooting Prompt
