AI for Linux Admins · By James Joyner IV · 10 min read

Debugging the Linux ARP and Neighbor Table with ip neigh

Stale ARP entries, FAILED neighbor states, and gratuitous ARP cause baffling intermittent connectivity. Here's how to read the Linux neighbor table and fix it with AI help.

  • #linux
  • #ai
  • #networking
  • #arp
  • #ip-neigh
  • #troubleshooting

You can ping a host one minute and not the next. DNS is fine. The route is fine. tcpdump shows your packets leaving but nothing coming back. After ten minutes of doubting the network team, you finally run ip neigh and there it is: the gateway sitting in FAILED state, or worse, a neighbor entry still mapping an IP to the MAC of a host that gave up that address hours ago.

The ARP cache — the neighbor table in modern terms — is one of those layers you forget exists until it silently breaks your day. It lives below routing and above the wire, and when it goes wrong it produces exactly the kind of intermittent, “works for me” symptom that wastes afternoons. Let me walk through how I actually read and fix it, and where an AI copilot earns its keep turning raw state dumps into a verdict.

Stop using arp, start using ip neigh

If you still type arp -n, retire the habit. The net-tools package is deprecated, and its ARP view is IPv4-only and reduces each entry to a few flags instead of showing its neighbor state. The modern tool is ip neigh, which covers IPv4 ARP and IPv6 NDP in one place and, critically, exposes the state of each entry, which is where all the diagnostic value lives.

ip neigh show
# 192.168.1.1   dev eth0 lladdr 00:1a:2b:3c:4d:5e REACHABLE
# 192.168.1.50  dev eth0 lladdr 00:1a:2b:3c:4d:6f STALE
# 192.168.1.99  dev eth0  FAILED

Those trailing words are not decoration. The Linux neighbor table is a small state machine, and the state tells you whether the kernel currently believes it can reach that host.

Reading the neighbor states

There are a handful of states and each means something specific:

  • REACHABLE — confirmed good; the kernel verified this neighbor recently.
  • STALE — the entry exists but hasn’t been confirmed lately. This is normal. The kernel will re-validate on next use. STALE is not a problem.
  • DELAY / PROBE — the kernel is actively re-checking reachability right now.
  • FAILED — resolution failed; the kernel sent ARP/NDP requests and got no answer. This is the one that breaks connectivity.
  • PERMANENT — a static entry you (or something) added manually; it never expires and never re-validates.
  • INCOMPLETE — resolution in progress, no reply yet.

The two that cause real trouble are FAILED (the neighbor genuinely isn’t answering ARP: wrong VLAN, host down, firewall eating ARP) and a wrong PERMANENT or cached entry that still maps the IP to the MAC of a host that no longer owns it. The classic case of the latter: a service IP fails over to a new host, but your box keeps sending frames to the old MAC because nothing flushed the cache and no gratuitous ARP arrived.
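To get a quick feel for how a table is distributed across these states, a small tally helps. This is a minimal sketch, not a polished tool: neigh_state_counts is a name I made up, and it assumes the state is the last field on each line, which holds for the default ip neigh show output shown earlier but not necessarily for every option combination.

```shell
# Tally neighbor entries by state. Reads `ip neigh show` text on stdin,
# so it works on a saved dump as well as on the live table.
# Assumption: the state (REACHABLE, STALE, FAILED, ...) is the last field.
neigh_state_counts() {
    awk 'NF { count[$NF]++ } END { for (s in count) print s, count[s] }' | sort
}

# Live usage (hypothetical invocation, needs iproute2):
#   ip neigh show | neigh_state_counts
```

If you only care about the broken entries, ip neigh show nud failed asks the kernel for just that state and skips the parsing entirely.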

A real triage flow

Here’s the sequence I run when connectivity to one specific host is flapping:

# What does the kernel think the neighbor looks like right now?
ip neigh show 192.168.1.50

# Watch it transition as you generate traffic
ping -c1 192.168.1.50; ip neigh show 192.168.1.50

# Confirm the MAC against what's actually on the wire
sudo tcpdump -ni eth0 arp and host 192.168.1.50

# If you suspect a stale entry, delete it and force re-resolution
sudo ip neigh del 192.168.1.50 dev eth0
ping -c1 192.168.1.50
ip neigh show 192.168.1.50   # should now be REACHABLE with the correct MAC

That ip neigh del followed by a re-ping is the single most useful move in the whole exercise: it forces the kernel to throw away whatever it cached and resolve fresh. If the entry comes back correct and connectivity returns, you had a stale-cache problem. If it comes back FAILED, or sits in INCOMPLETE, nothing is answering ARP for that address and the problem is below you: switching, VLAN, or the host itself.
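The before-and-after comparison can be made explicit so you aren't eyeballing MAC addresses. The following is a sketch under assumptions: neigh_mac and neigh_verdict are hypothetical helper names, the verdict labels are my own, and eth0 and the IP are placeholders carried over from the examples above.

```shell
# Extract the link-layer address from one line of `ip neigh show` output.
# Prints nothing when the entry has no lladdr (e.g. FAILED or INCOMPLETE).
neigh_mac() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "lladdr") { print $(i + 1); exit } }'
}

# Compare the MAC seen before and after re-resolution; print a short verdict.
neigh_verdict() {
    local before="$1" after="$2"
    if [ -z "$after" ]; then
        echo "unresolved"       # nothing answered: look at L2, VLAN, or the host
    elif [ -z "$before" ]; then
        echo "resolved-now"     # there was no MAC before, there is one now
    elif [ "$before" != "$after" ]; then
        echo "mac-changed"      # the cached MAC differed: stale-cache case
    else
        echo "mac-unchanged"    # same MAC: the cache was not the problem
    fi
}

# Usage sketch (the delete needs root):
#   before=$(ip neigh show 192.168.1.50 dev eth0 | neigh_mac)
#   sudo ip neigh del 192.168.1.50 dev eth0
#   ping -c1 -W2 192.168.1.50 >/dev/null
#   after=$(ip neigh show 192.168.1.50 dev eth0 | neigh_mac)
#   neigh_verdict "$before" "$after"
```

Keeping the parsing in functions and the delete in a commented usage block is deliberate: nothing here touches live state until you run the delete yourself.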

Where AI turns dumps into answers

The state machine is small but the evidence gets messy fast: a neighbor dump across several interfaces, a tcpdump of ARP traffic, and a switch’s MAC table all need to be cross-referenced. That correlation is exactly what an AI copilot is good at. I paste the three captures and ask for a verdict.

Prompt: You are a senior network engineer. Here is ip neigh show for eth0, a 20-second tcpdump -ni eth0 arp, and the symptom (intermittent loss to 192.168.1.50). Tell me whether this is a stale-cache problem, an ARP resolution failure, or a duplicate-MAC/IP conflict, and the single command to confirm.

Response (excerpt): The neighbor for .50 is PERMANENT with MAC …:6f, but the tcpdump shows .50 replying from …:9a — two different MACs for one IP. This is a stale static entry shadowing the real host (likely after a failover). Delete it: sudo ip neigh del 192.168.1.50 dev eth0, then re-ping; the entry should repopulate as REACHABLE with …:9a.

The model spotted the MAC mismatch across two captures faster than I’d scan them by hand. But notice the discipline: it didn’t tell me to reboot or flush the whole table — it identified one entry and gave a confirmable next step. That’s the right altitude. The AI decodes and correlates; I run the delete and watch the state transition with my own eyes before declaring victory.
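The specific check the model performed, one IP answered by two MACs, is also easy to run mechanically as a cross-check on its verdict. A rough sketch, assuming tcpdump -n text output where ARP replies contain "<ip> is-at <mac>"; arp_conflicts is a made-up name and the MACs in the usage example are invented sample values.

```shell
# Read `tcpdump -n ... arp` text on stdin and print every IP that was
# claimed by more than one MAC in ARP replies ("<ip> is-at <mac>").
arp_conflicts() {
    awk '
        {
            for (i = 2; i < NF; i++) {
                if ($i == "is-at") {
                    ip = $(i - 1)
                    mac = $(i + 1)
                    sub(/,$/, "", mac)    # drop the trailing comma tcpdump prints
                    # Record each (ip, mac) pair once, in order of first sight.
                    if (!seen[ip, mac]++) macs[ip] = macs[ip] " " mac
                }
            }
        }
        END {
            for (ip in macs)
                if (split(macs[ip], parts, " ") > 1) print ip ":" macs[ip]
        }
    ' | sort
}

# Usage sketch (capture needs root; 20 seconds matches the prompt above):
#   sudo timeout 20 tcpdump -ni eth0 arp 2>/dev/null | arp_conflicts
```

An empty result only means no conflicting replies showed up during the capture window, not that the cache is correct, so it complements the neighbor dump instead of replacing it.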

The traps worth knowing

A few things that bite people:

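  • Don’t flush the entire table reflexively. ip neigh flush all forces every neighbor on the box to re-resolve at once, which can briefly stall active connections. By default it also skips PERMANENT entries, so it may not even remove the one that is wrong. Delete the one bad entry instead.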
  • PERMANENT entries don’t self-heal. If someone added a static ARP entry “to fix a problem” months ago, it will happily point at a dead MAC forever. Audit for PERMANENT entries on hosts with weird connectivity.
  • Gratuitous ARP and failovers. When an IP moves between hosts, the new owner is supposed to broadcast a gratuitous ARP so everyone updates their cache. If it doesn’t (or a switch swallows it), your stale entry is the result. This is common with floating VIPs and keepalived.
  • ARP table overflow. On busy L2 segments, the neighbor cache can fill up; watch dmesg for a “neighbor table overflow” message (older kernels spell it “Neighbour table overflow”) and tune net.ipv4.neigh.default.gc_thresh* if you see it, or the net.ipv6 equivalents for NDP. That’s a tuning problem, not a per-host bug.
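Two of these traps can be checked in a few lines: auditing for PERMANENT entries and seeing how close the cache is to its hard limit. This is a sketch with assumptions called out: neigh_fill_pct is a hypothetical helper, and counting lines of ip -4 neigh show only approximates what the kernel counts against gc_thresh3, since the default listing hides some entries.

```shell
# Print how full the neighbor cache is as a whole-number percentage of the
# hard limit. Both arguments are plain integers; returns non-zero if the
# limit is not a positive number.
neigh_fill_pct() {
    local entries="$1" hard_limit="$2"
    [ "$hard_limit" -gt 0 ] 2>/dev/null || return 1
    echo $(( entries * 100 / hard_limit ))
}

# Usage sketch on a live host:
#   ip neigh show nud permanent          # audit static entries
#   entries=$(ip -4 neigh show | wc -l)
#   limit=$(sysctl -n net.ipv4.neigh.default.gc_thresh3)
#   echo "IPv4 neighbor cache at roughly $(neigh_fill_pct "$entries" "$limit")% of gc_thresh3"
```

A consistently high percentage is a hint to look at the gc_thresh values before the overflow message ever appears in dmesg.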

The takeaway

The neighbor table is a thin, often-ignored layer that produces some of the most maddening intermittent failures in Linux networking. The fix is rarely complicated once you can see the state — REACHABLE, STALE, FAILED, PERMANENT each tell a clear story, and a single targeted ip neigh del resolves the most common stale-cache case. AI is genuinely useful here for correlating a neighbor dump with a packet capture and handing you a verdict, but the model’s job ends at the diagnosis. You delete the one bad entry, watch it re-resolve correctly, and confirm connectivity yourself — the human stays in control of anything that touches live state.

For the broader connectivity workflow this slots into, see troubleshooting Linux network connectivity layer by layer and the modern ip and iproute2 toolkit. When you want a reusable starting point, the Linux host network connectivity debug prompt gives the AI the right framing to triage from your captures.
