Troubleshooting
Each entry below documents a real failure mode that has surfaced during p4net development or use. Symptom first, then cause, then fix.
"Permission denied" creating namespaces / veth
Symptom
PermissionError: [Errno 13] Permission denied from ip netns add,
ip link add ... type veth, or tc qdisc add.
Cause. p4net needs CAP_NET_ADMIN to manipulate links and
traffic-control state, and creating network namespaces additionally
requires CAP_SYS_ADMIN. Without root or those capabilities, every
network-touching call fails.
Fix
Run with sudo. If sudo strips your venv from PATH,
use sudo env "PATH=$PATH" p4net <topology.py> or pass the
absolute path: sudo "$(. .venv/bin/activate && which p4net)" ....
Granting file capabilities with setcap (e.g. setcap cap_net_admin+ep)
on the Python interpreter works in principle but interacts badly with
shebang lookup and setuid; it's not recommended unless you understand
the trade-offs.
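To confirm up front which capabilities the current process actually holds, a minimal Linux-only sketch can read the effective capability mask from the CapEff line of /proc/self/status. Checking CAP_SYS_ADMIN alongside CAP_NET_ADMIN is an assumption based on unshare(2) requiring it for new network namespaces; the helper names are hypothetical and not part of p4net.

```python
# Capability numbers from <linux/capability.h>.
CAP_NET_ADMIN = 12  # links, addresses, tc qdiscs
CAP_SYS_ADMIN = 21  # unshare(CLONE_NEWNET), bind mounts under /run/netns


def effective_caps(status_text: str) -> int:
    """Return the CapEff bitmask parsed from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)  # the mask is printed in hex
    raise ValueError("no CapEff line found")


def missing_caps(status_text: str) -> list[str]:
    """Name each required capability whose bit is clear in CapEff."""
    caps = effective_caps(status_text)
    required = {"CAP_NET_ADMIN": CAP_NET_ADMIN, "CAP_SYS_ADMIN": CAP_SYS_ADMIN}
    return [name for name, bit in required.items() if not caps & (1 << bit)]
```

For example, missing_caps(Path("/proc/self/status").read_text()) (with pathlib.Path imported) returns an empty list under sudo and typically both names for an unprivileged user.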
"p4c not found" / "simple_switch_grpc not found"
Symptom
FileNotFoundError: [Errno 2] No such file or directory: 'p4c'
or 'simple_switch_grpc'.
Cause. The external compiler or BMv2 binary isn't on PATH.
Fix
Confirm with which p4c and which simple_switch_grpc. If
they're installed but not on PATH, fix PATH in your shell or
invoke p4net with an explicit PATH (sudo env "PATH=$PATH" ...).
If they're not installed, see
Installation → Required external tools.
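The which check can also be scripted, for example as a preflight step before launching a topology. This is a hypothetical helper built on shutil.which; passing path lets you test the PATH that a sudo env "PATH=$PATH" invocation would see.

```python
import shutil

# The two external binaries named in this entry.
REQUIRED_TOOLS = ("p4c", "simple_switch_grpc")


def missing_tools(tools=REQUIRED_TOOLS, path=None):
    """Return the tools that shutil.which cannot resolve.

    `path` overrides the search path (None means the current $PATH).
    """
    return [tool for tool in tools if shutil.which(tool, path=path) is None]
```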
BMv2 startup hangs
Symptom
Network.start() hangs at "waiting for BMv2 gRPC to become ready"
and eventually times out (default 10 s).
Cause. BMv2 takes longer than 10 s to bind its gRPC port. Common
reasons: the host is heavily loaded, the kernel is paging, or
simple_switch_grpc was started in a tight loop and the previous
instance hasn't released the port yet.
Fix
Pass a longer ready timeout to Network:
If startup still hangs, check the BMv2 log under
<log_dir>/<switch>.log — the orchestrator records it. A common
line to look for is Could not bind: Address already in use,
which means a previous instance is still around.
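Scanning the switch log for that bind error can be automated. A hypothetical sketch that assumes only the <log_dir>/<switch>.log layout and the log line quoted above:

```python
from pathlib import Path

# Text BMv2 logs when its gRPC port is still held by another process.
BIND_ERROR = "Could not bind: Address already in use"


def bind_errors(log_text: str) -> list[str]:
    """Return the log lines that report a failed port bind."""
    return [line for line in log_text.splitlines() if BIND_ERROR in line]


def check_switch_log(log_dir: str, switch: str) -> list[str]:
    """Read <log_dir>/<switch>.log and return any bind-error lines."""
    log_path = Path(log_dir) / f"{switch}.log"
    # errors="replace" keeps a stray non-UTF-8 byte from aborting the scan.
    return bind_errors(log_path.read_text(errors="replace"))
```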
gRPC UNAVAILABLE from the P4Runtime client
Symptom
grpc.RpcError: <_InactiveRpcError ... StatusCode.UNAVAILABLE ...>
on client.connect() or the first set_pipeline_config call.
Cause. Same family as the BMv2-startup-hangs case: the client is trying to talk to a port BMv2 hasn't bound yet.
Fix
The orchestrator polls the gRPC port before calling connect(),
so this rarely surfaces in the orchestrator-managed flow. If
you're driving P4RuntimeClient directly, add your own readiness
poll, or accept the failure and retry.
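A readiness poll for direct P4RuntimeClient use can be as simple as retrying a TCP connect until it succeeds or a deadline passes. This is a sketch, not p4net's own polling code: wait_for_port is a hypothetical name, and a successful connect only shows that something accepts connections on the port, not that the P4Runtime service behind it is fully initialized.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.2) -> bool:
    """Poll host:port with TCP connects until one succeeds or `timeout` ends."""
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        try:
            # Success only means something is listening on the port.
            with socket.create_connection((host, port), timeout=remaining):
                return True
        except OSError:
            # Refused or timed out: back off briefly, never past the deadline.
            time.sleep(min(interval, max(0.0, deadline - time.monotonic())))
```

Call it with BMv2's gRPC host and port before client.connect(); if a race remains, catch grpc.RpcError with StatusCode.UNAVAILABLE and retry.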
gRPC UNKNOWN with cryptic byte payload
Symptom
A write fails with grpc.RpcError(StatusCode.UNKNOWN, ...) and
the message contains binary data.
Cause. BMv2 packs per-update error details into a
grpc-status-details-bin trailer (a serialized google.rpc.Status
containing one or more p4.v1.Error messages with numeric canonical
codes). p4net's P4RuntimeClient._translate_rpc_error decodes those
into specific exceptions (DuplicateEntryError, EntryNotFoundError,
PipelineError, etc.), but only when the trailer is present.
Fix
Make sure you're using p4net ≥ 0.1.0 (the codec landed in phase 5).
If you're seeing the raw UNKNOWN from a client.last_error
access, log the full exception's details() and trailing_metadata()
— the original status is in grpc-status-details-bin.
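To inspect the trailer yourself, pull it out of trailing_metadata() and decode it. A minimal sketch, assuming the googleapis-common-protos (google.rpc.status_pb2) and p4runtime (p4.v1.p4runtime_pb2) Python packages are installed; both helper names are hypothetical, and this is not p4net's _translate_rpc_error.

```python
STATUS_DETAILS_KEY = "grpc-status-details-bin"


def status_details(trailing_metadata):
    """Return the raw grpc-status-details-bin bytes, or None if absent.

    `trailing_metadata` is what RpcError.trailing_metadata() returned:
    an iterable of (key, value) pairs, or None.
    """
    for key, value in trailing_metadata or ():
        if key == STATUS_DETAILS_KEY:
            return value
    return None


def decode_p4_errors(raw: bytes):
    """Decode the trailer into (canonical_code, message) pairs.

    Imports are local so status_details() works without the protobuf
    packages installed.
    """
    from google.rpc import status_pb2
    from p4.v1 import p4runtime_pb2

    status = status_pb2.Status.FromString(raw)
    errors = []
    for detail in status.details:  # each entry is a google.protobuf.Any
        err = p4runtime_pb2.Error()
        if detail.Unpack(err):  # False if the Any holds another type
            errors.append((err.canonical_code, err.message))
    return errors
```

Typical use is inside except grpc.RpcError as exc: with raw = status_details(exc.trailing_metadata()). The P4Runtime specification has a batched write report an Error for every update, with canonical_code 0 (OK) for the ones that succeeded, so filter those out before looking for the failure.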
pyroute2.NetlinkError(17, 'File exists') on link create
Symptom
Network.start() fails with LinkError: interface 'h1-eth0'
already exists, or with a similar netlink error carrying errno 17 (EEXIST).
Cause. A previous run left the veth pairs around. Normally the
atexit cleanup or the SIGINT handler tears them down, but neither
runs if the process was killed with SIGKILL (e.g. kill -9 or OOM)
or terminated before the handlers were installed.
Fix
Verify with ip link show — leftover veths usually carry
<host>-eth<N> names. Delete with:
Then re-run.
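If there are many leftovers, the cleanup can be scripted. A hypothetical sketch that assumes the <host>-eth<N> naming above; by default it is a dry run that only returns the ip link del commands, and actually deleting requires root.

```python
import re
import subprocess

# Assumption: leftover host-side veths are named <host>-eth<N>.
VETH_NAME = re.compile(r"^[^\s@:]+-eth\d+$")


def leftover_veths(link_names):
    """Pick the interface names that look like p4net host-side veths."""
    return [name for name in link_names if VETH_NAME.match(name)]


def delete_links(names, dry_run=True):
    """Build (and unless dry_run, run) `ip link del` for each name.

    Deleting one end of a veth pair also removes its peer, so a failure
    on an already-removed name is ignored (check=False).
    """
    commands = [["ip", "link", "del", name] for name in names]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=False)
    return commands
```

Feed it the names from os.listdir("/sys/class/net"), which lists the interfaces in the current network namespace, and review the dry-run output before passing dry_run=False.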
IPv6 link-local addresses on switch-side veth (older versions)
Symptom
On p4net < 0.2.0, <host> ifconfig and <switch> log show stray
fe80:: addresses; punt-path captures show MLD chatter.
Cause. The Linux kernel auto-generates fe80:: link-local
addresses on every interface that comes up while
net.ipv6.conf.<iface>.disable_ipv6=0. Versions before 0.2.0 didn't set
that sysctl before bringing the interface up.
Fix
Upgrade to p4net ≥ 0.2.0. The orchestrator now writes
disable_ipv6=1 on every interface that doesn't have an
explicit IPv6 address before the link goes up; interfaces with
Host.ip6 get disable_ipv6=0, accept_ra=0, autoconf=0.
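The per-interface rule described above can be expressed as a small table of sysctl writes. This sketch mirrors that description rather than the orchestrator's code, and the function names are hypothetical. The writes must happen inside the interface's namespace, after the link exists (the net/ipv6/conf/<iface> directory only appears then) but before it is brought up.

```python
from pathlib import Path


def ipv6_sysctls(iface: str, has_ipv6: bool) -> dict[str, str]:
    """Map net.ipv6.conf.<iface>.* paths (relative to /proc/sys) to values."""
    base = f"net/ipv6/conf/{iface}"
    if has_ipv6:
        # Explicit IPv6 address: keep IPv6 on, but no RA or SLAAC.
        settings = {"disable_ipv6": "0", "accept_ra": "0", "autoconf": "0"}
    else:
        # No IPv6 address: disable IPv6 so no fe80:: address appears.
        settings = {"disable_ipv6": "1"}
    return {f"{base}/{key}": value for key, value in settings.items()}


def apply_sysctls(sysctls: dict[str, str], root: str = "/proc/sys") -> None:
    """Write each value under `root` (run inside the target namespace)."""
    for rel_path, value in sysctls.items():
        (Path(root) / rel_path).write_text(value)
```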
Pingall fails with all X cells
Symptom
pingall returns a matrix of X (failure) cells; nothing reaches
anything.
Cause. Most often, ARP isn't seeded. BMv2 has no ARP responder; without static neighbors, the first ICMP request triggers an ARP broadcast that the dataplane has no rule for.
Fix
The bundled examples seed ARP in setup(net):
h1.exec(["ip", "neigh", "replace", "10.0.0.2",
"lladdr", "00:00:00:00:00:02",
"dev", "h1-eth0", "nud", "permanent"])
Either do this for every host pair, or implement an ARP responder in the P4 program (e.g. via the CPU-port punt path).
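Seeding every pair by hand gets tedious past two hosts. A sketch that generates the same ip neigh replace argv lists for all ordered host pairs, assuming each host has a single interface named <host>-eth0 as in the example above; arp_seed_commands and the hosts mapping are hypothetical.

```python
from itertools import permutations


def arp_seed_commands(hosts):
    """Build `ip neigh replace` argv lists so every host knows every other.

    `hosts` maps host name -> (ip, mac).
    """
    commands = {name: [] for name in hosts}
    for src, dst in permutations(hosts, 2):  # every ordered (src, dst) pair
        ip, mac = hosts[dst]
        commands[src].append(
            ["ip", "neigh", "replace", ip, "lladdr", mac,
             "dev", f"{src}-eth0", "nud", "permanent"]
        )
    return commands
```

Each argv can then be passed to the corresponding host's exec(), as h1.exec(...) does above; how you obtain the host objects inside setup(net) depends on your topology code.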
Compiler cache stale after editing P4 source
Symptom
Edits to a .p4 file aren't reflected in the running pipeline.
Cause. Almost never the cache: it is keyed on the source bytes, so any
edit to the file yields a new key. Still worth ruling out if it ever surfaces.
Fix
Delete the cache with rm -rf ~/.cache/p4net/compiler/; the next run
repopulates it. If the problem persists, it's not the cache: confirm that
the right .p4 path is referenced in add_switch(p4_src=...).
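Because the key is derived from the file's bytes, one way to tell whether your edit even reaches the file p4net compiles is to hash that file before and after editing. A sketch assuming SHA-256 (p4net's actual key derivation is not documented here) with hypothetical helper names; clear_cache is the Python equivalent of the rm -rf command for the cache directory.

```python
import hashlib
import shutil
from pathlib import Path


def source_key(p4_path) -> str:
    """Hash the file's bytes, like a cache keyed on source contents."""
    return hashlib.sha256(Path(p4_path).read_bytes()).hexdigest()


def clear_cache(cache_dir=None) -> None:
    """Remove the compiler cache directory; a missing directory is fine."""
    if cache_dir is None:
        cache_dir = Path.home() / ".cache" / "p4net" / "compiler"
    shutil.rmtree(cache_dir, ignore_errors=True)
```

If source_key() returns the same value before and after your edit, the path passed to add_switch(p4_src=...) is not the file you edited.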
xterm fails with "DISPLAY not set"
Symptom
<host> xterm returns error: NetworkError: cannot spawn xterm:
$DISPLAY is unset.
Cause. No X server is reachable from the orchestrator process.
xterm needs $DISPLAY to be set and an X socket reachable.
Fix
On a desktop Linux session, this works out of the box. Over SSH,
use ssh -X user@host, or ssh -Y if your X server requires trusted
forwarding. On WSL2 without WSLg, install a Windows X server (e.g.
VcXsrv) and set DISPLAY accordingly.
On a headless server, xterm simply doesn't apply — use
<host> cmd <argv> for one-shot commands.
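A quick preflight for the preconditions above, useful in scripts that may run with or without a display. xterm_problem is a hypothetical name; it cannot detect a DISPLAY that is set but points at an unreachable X server.

```python
import shutil


def xterm_problem(env):
    """Return why xterm can't be spawned, or None if it looks possible.

    `env` is a mapping such as os.environ.
    """
    if not env.get("DISPLAY"):
        return "$DISPLAY is unset"
    # path=None falls back to the current process's $PATH.
    if shutil.which("xterm", path=env.get("PATH")) is None:
        return "xterm not found on PATH"
    return None
```

Pass os.environ (or a copy) as env; a None result means spawning is worth attempting.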
"Address already in use" on BMv2 gRPC port
Symptom
BMv2 fails to start with Could not bind: Address already in use
in <log_dir>/<switch>.log.
Cause. Another simple_switch_grpc (or any other process) is
already bound to the port the orchestrator picked.
Fix
Identify and kill the holder:
The orchestrator picks ports starting at 50051; if you have a
persistent collision, override per switch:
topology.add_switch("s1", ..., grpc_port=50061).
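When choosing an override for grpc_port, you can probe for a port nothing is bound to. This sketch scans upward from 50051 in the spirit of the orchestrator's allocation, but it is not the orchestrator's code and the names are hypothetical. The result is only a snapshot: another process can still take the port between the probe and BMv2's own bind.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can bind host:port at this moment."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:  # typically EADDRINUSE
            return False
    return True


def first_free_port(start: int = 50051, limit: int = 100,
                    host: str = "0.0.0.0") -> int:
    """Scan upward from `start` and return the first bindable port."""
    stop = min(start + limit, 65536)  # never probe past the last TCP port
    for port in range(start, stop):
        if port_is_free(port, host):
            return port
    raise RuntimeError(f"no free TCP port in {start}..{stop - 1}")
```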
topology graph reports "graphviz dot binary not found"
Symptom
topology graph /tmp/topo.png returns error: TopologyError:
graphviz dot binary not found on PATH.
Cause. graphviz isn't installed.
Fix
Install it (sudo apt install graphviz), or use format=dot to
emit the source verbatim:
Then render it externally with any DOT-aware tool (an online Graphviz
viewer, xdot, or dot on a machine that has graphviz installed).
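If you move between machines with and without graphviz, a small wrapper can render when dot is available and otherwise save the DOT source. A sketch using the standard dot command-line interface (dot -T<format> -o <file>, reading the graph from stdin); render_dot is hypothetical, and dot_source is assumed to be the text you obtained via format=dot.

```python
import shutil
import subprocess
from pathlib import Path


def render_dot(dot_source: str, out_path: str, which=shutil.which) -> Path:
    """Render DOT with `dot` if present; otherwise save a .dot file instead."""
    out = Path(out_path)
    dot_bin = which("dot")
    if dot_bin is None:
        # Same behaviour as format=dot: keep the source for external rendering.
        fallback = out.with_suffix(".dot")
        fallback.write_text(dot_source)
        return fallback
    fmt = out.suffix.lstrip(".") or "png"  # output format from the extension
    subprocess.run([dot_bin, f"-T{fmt}", "-o", str(out)],
                   input=dot_source, text=True, check=True)
    return out
```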
Nothing here covers your problem
Open an issue at https://github.com/zhh2001/p4net/issues with the
output of python -c "import p4net; print(p4net.__version__)",
p4c --version, simple_switch_grpc --version, the topology
file (or a minimal reproducer), and the relevant log lines from
<log_dir>/<switch>.log.
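The version information requested above can be collected in one go. A sketch; the helpers are hypothetical and simply run the listed commands, reporting a missing binary instead of raising.

```python
import subprocess


def tool_version(argv):
    """Run `argv` and return its first output line, or a short error note."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired) as exc:  # e.g. not on PATH
        return f"unavailable ({exc.__class__.__name__})"
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else f"exit status {result.returncode}"


def report():
    """Assemble the version lines to paste into an issue."""
    try:
        import p4net
        version = getattr(p4net, "__version__", "unknown")
    except ImportError:
        version = "not importable"
    lines = [f"p4net: {version}"]
    for argv in (["p4c", "--version"], ["simple_switch_grpc", "--version"]):
        lines.append(f"{argv[0]}: {tool_version(argv)}")
    return "\n".join(lines)
```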