INT (in-band telemetry)¶
Single-switch demo where the P4 pipeline embeds a 14-byte INT (in-band network telemetry) shim header into every forwarded IPv4 packet. The shim carries the switch identifier, ingress timestamp, egress port, queue depth, and the original etherType. A raw-socket listener inside the receiving host's namespace decodes the shim and prints structured per-packet telemetry.
What this demonstrates¶
- Wire-level header insertion: the P4 deparser emits a new header between Ethernet and IPv4.
- EtherType swap: the outer etherType becomes 0x88B6 (the INT shim identifier) so kernels and packet captures can tell INT frames apart.
- Original etherType preservation: the shim's next_proto field carries the original etherType (0x0800 for IPv4) so receivers can recover the inner header chain.
- Raw-socket decoding: the user-space listener reads frames via AF_PACKET, parses the shim by byte offsets, and prints one line per frame.
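The insertion and etherType swap described in the list above can be mimicked on raw bytes in a few lines of Python. This is an illustrative sketch, not part of the example: `insert_shim` is a hypothetical helper, and the frame and field values are made up.

```python
import struct

ETHERTYPE_INT = 0x88B6
ETH_LEN = 14


def insert_shim(frame: bytes, switch_id: int, ts_us: int,
                egress_port: int, queue_depth: int) -> bytes:
    """Hypothetical helper: do to one frame what the P4 pipeline does on the wire."""
    eth, rest = frame[:ETH_LEN], frame[ETH_LEN:]
    orig_ethertype = eth[12:14]
    shim = (
        bytes([switch_id])                              # switch_id, 1 byte
        + ts_us.to_bytes(6, "big")                      # 48-bit timestamp
        + struct.pack("!HH", egress_port, queue_depth)  # two 16-bit fields
        + orig_ethertype                                # next_proto
        + b"\x00"                                       # reserved
    )
    # Swap the outer etherType, then splice the shim in before the old payload.
    new_eth = eth[:12] + struct.pack("!H", ETHERTYPE_INT)
    return new_eth + shim + rest


# Made-up frame: dst MAC, src MAC, etherType 0x0800, then 20 zero bytes
# standing in for an IPv4 header.
plain = bytes.fromhex("000000000002" "000000000001" "0800") + bytes(20)
stamped = insert_shim(plain, switch_id=1, ts_us=745907, egress_port=2, queue_depth=0)

print(len(plain), len(stamped))                     # the frame grows by 14 bytes
print(stamped[12:14].hex(), stamped[25:27].hex())   # outer etherType, next_proto
```

The original etherType ends up at frame offset 25, which is shim offset 11, the same position the listener reads `next_proto` from.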
Topology¶
examples/int/topology.py:
"""Two hosts, one switch, IPv4 forwarding with INT shim insertion.

The P4 program (`int.p4`) inserts a 14-byte INT shim header between the
Ethernet and IPv4 headers on every forwarded packet. The shim carries the
switch identifier, ingress timestamp, egress port, queue depth, and the
original etherType (so a receiver can recover the inner IPv4 header).

Run as root:

    sudo p4net examples/int/topology.py

Then in a separate terminal on h2 (e.g. ``h2 xterm`` from the CLI):

    sudo python3 /path/to/examples/int/listener.py --iface h2-eth0

And from another terminal:

    sudo ip netns exec h1 ping -c 3 -W 1 10.0.0.2

The listener prints one structured line per INT-stamped frame.
"""

from __future__ import annotations

from pathlib import Path

from p4net import Network
from p4net.topo import Topology

HERE = Path(__file__).resolve().parent

topology = Topology()
h1 = topology.add_host("h1", ip="10.0.0.1/24", mac="00:00:00:00:00:01")
h2 = topology.add_host("h2", ip="10.0.0.2/24", mac="00:00:00:00:00:02")
s1 = topology.add_switch("s1", p4_src=HERE / "int.p4")
topology.add_link(h1, s1, port_b=1)
topology.add_link(h2, s1, port_b=2)


def setup(net: Network) -> None:
    """Static ARP both sides; LPM entries; write the switch_id register."""
    h1 = net.host("h1")
    h2 = net.host("h2")
    h1.exec(
        [
            "ip",
            "neigh",
            "replace",
            "10.0.0.2",
            "lladdr",
            "00:00:00:00:00:02",
            "dev",
            "h1-eth0",
            "nud",
            "permanent",
        ]
    )
    h2.exec(
        [
            "ip",
            "neigh",
            "replace",
            "10.0.0.1",
            "lladdr",
            "00:00:00:00:00:01",
            "dev",
            "h2-eth0",
            "nud",
            "permanent",
        ]
    )
    s1 = net.switch("s1")
    s1.client.insert_table_entry(
        table="MyIngress.ipv4_lpm",
        match={"hdr.ipv4.dstAddr": "10.0.0.1/32"},
        action="MyIngress.set_egress_port",
        params={"port": 1},
    )
    s1.client.insert_table_entry(
        table="MyIngress.ipv4_lpm",
        match={"hdr.ipv4.dstAddr": "10.0.0.2/32"},
        action="MyIngress.set_egress_port",
        params={"port": 2},
    )
    # Assign this switch's INT identifier. The INT shim stamps every
    # forwarded packet with this value. For multi-switch topologies,
    # give each switch a distinct id.
    s1.client.write_register("MyIngress.switch_id", index=0, value=1)


if __name__ == "__main__":
    from p4net.cli.main import main

    raise SystemExit(main([__file__]))
Two hosts, one switch, IPv4 forwarding programmed via P4Runtime plus static ARP.
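To make the forwarding decision concrete, here is a small pure-Python model of a longest-prefix-match lookup over the two entries that setup(net) installs. It is only a sketch of the matching rule: `lpm_lookup` and `ENTRIES` are hypothetical names, and the real lookup happens inside the switch, not in Python.

```python
import ipaddress

# (prefix, egress port) pairs mirroring the two table entries from setup(net).
ENTRIES = [
    (ipaddress.ip_network("10.0.0.1/32"), 1),
    (ipaddress.ip_network("10.0.0.2/32"), 2),
]


def lpm_lookup(dst: str, entries=ENTRIES):
    """Return the port of the longest matching prefix, or None on a table miss."""
    addr = ipaddress.ip_address(dst)
    best = None
    for net, port in entries:
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, port)
    return best[1] if best else None


print(lpm_lookup("10.0.0.2"))  # hit: forwarded out of port 2
print(lpm_lookup("10.0.0.9"))  # miss: None
```

In int.p4 a miss runs the default NoAction, which leaves std.egress_spec at 0, so such packets never receive a shim.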
P4 program¶
examples/int/int.p4:
/* In-band Network Telemetry (INT): single-switch demo.
 *
 * For every IPv4 packet that the LPM table forwards, the switch inserts a
 * 14-byte INT shim header between the Ethernet header and the IPv4 header.
 *
 * Wire layout produced by the deparser:
 *
 *   [ Ethernet (14 B, etherType=0x88B6) ]
 *   [ INT shim (14 B)                   ]
 *   [ IPv4 + payload                    ]
 *
 * INT shim layout (most-significant bit first, total 14 bytes; each
 * 8-dash cell is one byte):
 *
 *   +--------+--------+--------+--------+--------+--------+--------+
 *   |  swid  |           ingress_timestamp_us (48 bits)            |
 *   |  (8)   |                                                     |
 *   +--------+--------+--------+--------+--------+--------+--------+
 *   | egress_port (16)| queue_depth (16)| next_proto (16) |
 *   +--------+--------+--------+--------+--------+--------+
 *   |reserved|
 *   |  (8)   |
 *   +--------+
 *
 * The shim's `next_proto` field carries the original etherType (0x0800
 * for IPv4) so the receiver can recover the inner IPv4 header. A
 * user-space listener on the receiving host parses the shim from a raw
 * AF_PACKET socket; see `examples/int/listener.py`.
 *
 * Pairs with `examples/int/topology.py`, which programs `ipv4_lpm`,
 * writes the ``switch_id`` register, and pre-seeds static ARP entries.
 */
#include <core.p4>
#include <v1model.p4>

const bit<16> ETHERTYPE_IPV4 = 0x0800;
const bit<16> ETHERTYPE_INT = 0x88B6;

header ethernet_t {
    bit<48> dstAddr;
    bit<48> srcAddr;
    bit<16> etherType;
}

header int_shim_t {
    bit<8>  switch_id;
    bit<48> ingress_timestamp_us;
    bit<16> egress_port;
    bit<16> queue_depth;
    bit<16> next_proto;
    bit<8>  reserved;
}

header ipv4_t {
    bit<4>  version;
    bit<4>  ihl;
    bit<8>  diffserv;
    bit<16> totalLen;
    bit<16> identification;
    bit<3>  flags;
    bit<13> fragOffset;
    bit<8>  ttl;
    bit<8>  protocol;
    bit<16> hdrChecksum;
    bit<32> srcAddr;
    bit<32> dstAddr;
}

struct headers {
    ethernet_t ethernet;
    int_shim_t int_shim;
    ipv4_t     ipv4;
}

struct metadata {}

parser MyParser(packet_in pkt, out headers hdr, inout metadata meta,
                inout standard_metadata_t std) {
    state start {
        pkt.extract(hdr.ethernet);
        transition select(hdr.ethernet.etherType) {
            ETHERTYPE_IPV4: parse_ipv4;
            default: accept;
        }
    }
    state parse_ipv4 {
        pkt.extract(hdr.ipv4);
        transition accept;
    }
}

control MyVerifyChecksum(inout headers hdr, inout metadata meta) { apply {} }

control MyIngress(inout headers hdr, inout metadata meta,
                  inout standard_metadata_t std) {
    /* One-element register holding the configured switch identifier.
     * The control plane writes this at start via
     * ``client.write_register("MyIngress.switch_id", index=0, value=N)``. */
    register<bit<8>>(1) switch_id;

    action drop() {
        mark_to_drop(std);
    }

    action set_egress_port(bit<9> port) {
        std.egress_spec = port;
    }

    table ipv4_lpm {
        key = {
            hdr.ipv4.dstAddr: lpm;
        }
        actions = {
            drop;
            set_egress_port;
            NoAction;
        }
        default_action = NoAction();
        size = 1024;
    }

    apply {
        if (hdr.ipv4.isValid()) {
            ipv4_lpm.apply();
            /* Only stamp INT shim on packets actually being forwarded. */
            if (std.egress_spec != 0) {
                bit<8> sid;
                switch_id.read(sid, 0);
                hdr.int_shim.setValid();
                hdr.int_shim.switch_id = sid;
                hdr.int_shim.ingress_timestamp_us = (bit<48>) std.ingress_global_timestamp;
                hdr.int_shim.egress_port = (bit<16>) std.egress_spec;
                hdr.int_shim.queue_depth = (bit<16>) std.deq_qdepth;
                hdr.int_shim.next_proto = hdr.ethernet.etherType;
                hdr.int_shim.reserved = 0;
                hdr.ethernet.etherType = ETHERTYPE_INT;
            }
        }
    }
}

control MyEgress(inout headers hdr, inout metadata meta,
                 inout standard_metadata_t std) { apply {} }

control MyComputeChecksum(inout headers hdr, inout metadata meta) { apply {} }

control MyDeparser(packet_out pkt, in headers hdr) {
    apply {
        pkt.emit(hdr.ethernet);
        pkt.emit(hdr.int_shim);
        pkt.emit(hdr.ipv4);
    }
}

V1Switch(MyParser(), MyVerifyChecksum(), MyIngress(), MyEgress(),
         MyComputeChecksum(), MyDeparser()) main;
Key points:
- The shim header is declared statically; the deparser emits it conditionally on its valid bit.
- The ingress control populates the shim from standard_metadata after the LPM table has set std.egress_spec.
- switch_id is now register-backed (register<bit<8>>(1) switch_id;). The topology's setup(net) calls s1.client.write_register("MyIngress.switch_id", index=0, value=1); multi-switch INT deployments can assign distinct identifiers without recompiling.
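Because a P4 header is serialized field by field, most-significant bit first, the byte offsets a receiver needs follow directly from the field widths in int_shim_t. The sketch below derives them in Python; `SHIM_FIELDS`, `pack_shim`, and `field_offsets` are hypothetical names used only for illustration.

```python
# Field names and widths copied from int_shim_t, in declaration order.
SHIM_FIELDS = [
    ("switch_id", 8),
    ("ingress_timestamp_us", 48),
    ("egress_port", 16),
    ("queue_depth", 16),
    ("next_proto", 16),
    ("reserved", 8),
]
SHIM_BITS = sum(width for _, width in SHIM_FIELDS)


def pack_shim(values: dict) -> bytes:
    """Concatenate the fields MSB-first into one big-endian byte string."""
    word = 0
    for name, width in SHIM_FIELDS:
        value = values[name]
        if value >> width:
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word.to_bytes(SHIM_BITS // 8, "big")


def field_offsets() -> dict:
    """Map each field to its (start, end) byte slice within the shim."""
    offsets, bit = {}, 0
    for name, width in SHIM_FIELDS:
        offsets[name] = (bit // 8, (bit + width) // 8)
        bit += width
    return offsets


print(SHIM_BITS // 8)                  # total shim length in bytes
print(field_offsets()["next_proto"])   # byte slice holding the original etherType
```

Every field here is a whole number of bytes wide, which is why simple byte slicing is enough on the receiving side.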
The listener¶
examples/int/listener.py:
"""INT shim listener: runs inside a host namespace, prints per-frame INT data.

Usage (must be run as root because AF_PACKET sockets are privileged):

    sudo ip netns exec h2 python3 listener.py --iface h2-eth0

Or from the p4net interactive shell:

    h2 xterm
    # in the spawned xterm:
    sudo python3 examples/int/listener.py --iface h2-eth0

The script opens a raw AF_PACKET socket, filters by EtherType 0x88B6 (the
INT shim), and decodes the 14-byte shim that follows the Ethernet header.

Wire layout (matches the deparser in int.p4):

    [ Ethernet (14 B, etherType = 0x88B6) ]
    [ INT shim (14 B):                    ]
        switch_id              uint8
        ingress_timestamp_us   uint48  (big-endian, packed in 6 bytes; BMv2 reports microseconds)
        egress_port            uint16
        queue_depth            uint16
        next_proto             uint16  (= 0x0800 for IPv4)
        reserved               uint8
    [ IPv4 + payload ]
"""

from __future__ import annotations

import argparse
import socket
import struct
import sys

ETH_P_ALL = 0x0003
ETHERTYPE_INT = 0x88B6
SHIM_LEN = 14


def _decode_int_shim(buf: bytes) -> dict[str, int]:
    """Decode a 14-byte INT shim into a dict."""
    if len(buf) < SHIM_LEN:
        raise ValueError(f"INT shim truncated: got {len(buf)} bytes, need {SHIM_LEN}")
    switch_id = buf[0]
    # 48-bit big-endian timestamp packed in 6 bytes.
    ts = int.from_bytes(buf[1:7], "big")
    egress_port, queue_depth, next_proto = struct.unpack("!HHH", buf[7:13])
    reserved = buf[13]
    return {
        "switch_id": switch_id,
        "ingress_timestamp_us": ts,
        "egress_port": egress_port,
        "queue_depth": queue_depth,
        "next_proto": next_proto,
        "reserved": reserved,
    }


def _decode_ipv4_addrs(buf: bytes) -> tuple[str, str] | None:
    """Pull src/dst from a buffer beginning at the IPv4 header. Returns None on truncation."""
    if len(buf) < 20:
        return None
    src = socket.inet_ntoa(buf[12:16])
    dst = socket.inet_ntoa(buf[16:20])
    return src, dst


def main() -> int:
    parser = argparse.ArgumentParser(
        description="Decode INT shim headers from a raw AF_PACKET socket."
    )
    parser.add_argument(
        "--iface",
        required=True,
        help="Interface name to bind to (e.g. h2-eth0).",
    )
    parser.add_argument(
        "--count",
        type=int,
        default=0,
        help="Exit after printing this many INT frames (0 = forever).",
    )
    args = parser.parse_args()

    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
    sock.bind((args.iface, 0))
    sys.stdout.write(f"[listener] bound on {args.iface}, waiting for INT frames\n")
    sys.stdout.flush()

    seen = 0
    while True:
        frame, _addr = sock.recvfrom(65535)
        if len(frame) < 14 + SHIM_LEN:
            continue
        etype = int.from_bytes(frame[12:14], "big")
        if etype != ETHERTYPE_INT:
            continue
        shim = _decode_int_shim(frame[14 : 14 + SHIM_LEN])
        inner = frame[14 + SHIM_LEN :]
        addrs = _decode_ipv4_addrs(inner) if shim["next_proto"] == 0x0800 else None
        flow = f" {addrs[0]} -> {addrs[1]}" if addrs else ""
        sys.stdout.write(
            f"[switch={shim['switch_id']} "
            f"ts={shim['ingress_timestamp_us']}us "
            f"egress={shim['egress_port']} "
            f"queue={shim['queue_depth']} "
            f"next_proto=0x{shim['next_proto']:04x}]{flow}\n"
        )
        sys.stdout.flush()
        seen += 1
        if args.count and seen >= args.count:
            return 0


if __name__ == "__main__":
    raise SystemExit(main())
The listener opens a raw AF_PACKET socket, filters by
etherType == 0x88B6, decodes the 14-byte shim by byte offset, and
prints structured output.
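The listener receives every frame (ETH_P_ALL) and discards non-INT ones in Python. A possible alternative, sketched here and not what listener.py does, is to create the socket with the INT etherType as its protocol so that the Linux kernel only delivers matching frames. `open_int_socket` is a hypothetical helper; like the listener it needs root and only works on Linux.

```python
import socket

ETHERTYPE_INT = 0x88B6
# AF_PACKET expects the protocol number in network byte order.
INT_PROTO = socket.htons(ETHERTYPE_INT)


def open_int_socket(iface: str) -> socket.socket:
    """Hypothetical helper: raw socket limited to EtherType 0x88B6 frames."""
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, INT_PROTO)
    # Protocol 0 in bind() keeps the protocol chosen at socket creation,
    # the same bind form the listener uses.
    sock.bind((iface, 0))
    return sock


# Usage (as root, inside the h2 namespace):
#     sock = open_int_socket("h2-eth0")
#     frame, _addr = sock.recvfrom(65535)
```

With this variant the explicit etherType check in the receive loop becomes a safety net rather than the primary filter; keeping it costs little.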
Run it¶
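In one terminal:
    sudo p4net examples/int/topology.py
The setup(net) hook installs the LPM entries, writes the switch_id register, and pre-seeds static ARP. You're dropped into the p4net> shell.
In a second terminal (or via h2 xterm from the shell):
    sudo ip netns exec h2 python3 examples/int/listener.py --iface h2-eth0
In a third terminal, send some traffic:
    sudo ip netns exec h1 ping -c 3 -W 1 10.0.0.2
ping itself will most likely report the probes as lost: the frames reach h2 with etherType 0x88B6, which the kernel does not hand to its IPv4 stack, so only the raw-socket listener sees them.
The listener prints one line per INT-stamped frame that crossed the switch: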
[listener] bound on h2-eth0, waiting for INT frames
[switch=1 ts=745907us egress=2 queue=0 next_proto=0x0800] 10.0.0.1 -> 10.0.0.2
[switch=1 ts=1750021us egress=2 queue=0 next_proto=0x0800] 10.0.0.1 -> 10.0.0.2
[switch=1 ts=2754336us egress=2 queue=0 next_proto=0x0800] 10.0.0.1 -> 10.0.0.2
Caveats¶
- queue_depth is almost always 0 with BMv2's default queueing. The field is wired in but stays at zero unless the egress queue actually backs up, which doesn't happen at this demo's traffic level. Note also that int.p4 copies std.deq_qdepth in the ingress control, before the packet has been enqueued; BMv2 fills in that field at dequeue time, so seeing a nonzero value would likely also require moving the assignment into MyEgress.
- Single hop only. Real INT stacks one shim per traversed hop; multi-hop is left as an extension exercise.
- Switch identifier is register-backed. Change the write_register("MyIngress.switch_id", index=0, value=N) call in the topology's setup(net) to relabel; multi-switch deployments give each switch a distinct N with no recompile required.
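Since the demo never produces more than one shim, any stacking format is a design choice. The sketch below assumes one possible encoding, chosen purely for illustration: each switch pushes its shim directly after Ethernet and sets next_proto to 0x88B6 when the frame it received was already INT-stamped. Neither int.p4 nor listener.py implements this, and `walk_shim_stack` and `MAX_HOPS` are hypothetical names.

```python
import struct

ETH_LEN = 14
SHIM_LEN = 14
ETHERTYPE_INT = 0x88B6
MAX_HOPS = 8  # arbitrary safety bound for this sketch


def walk_shim_stack(frame: bytes):
    """Return (hops, inner_ethertype, payload); hops are newest-first
    under the assumed encoding."""
    proto = int.from_bytes(frame[12:14], "big")
    offset = ETH_LEN
    hops = []
    while proto == ETHERTYPE_INT and len(hops) < MAX_HOPS:
        shim = frame[offset : offset + SHIM_LEN]
        if len(shim) < SHIM_LEN:
            raise ValueError("truncated INT shim stack")
        # Same byte offsets as the single-shim decoder in listener.py.
        egress_port, queue_depth, proto = struct.unpack("!HHH", shim[7:13])
        hops.append(
            {
                "switch_id": shim[0],
                "ingress_timestamp_us": int.from_bytes(shim[1:7], "big"),
                "egress_port": egress_port,
                "queue_depth": queue_depth,
            }
        )
        offset += SHIM_LEN
    return hops, proto, frame[offset:]
```

A single-shim frame from the current demo is just the one-hop case of this walk, so the same function would keep working before and after a second switch is added.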
Variations to try¶
- Add a second switch, write its switch_id register to 2, and chain h1 → s1 → s2 → h2. Extend the listener (or the P4 pipeline) to handle a shim stack.
- Pipe the listener's output to a file and post-process to compute per-flow latency deltas from ingress_timestamp_us.
- Add delay="50ms" or loss_pct=2.0 to one of the h↔s links and verify the timestamps and packet counts respond as expected.
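As a starting point for the post-processing variation, the following sketch parses the listener's output lines and computes the gap between consecutive timestamps of each flow. The regular expression is written against the line format printed by listener.py; `LINE_RE` and `timestamp_deltas` are hypothetical names. With a single switch these gaps are inter-arrival times at s1 rather than path latency.

```python
import re

LINE_RE = re.compile(
    r"\[switch=(?P<switch>\d+) ts=(?P<ts>\d+)us egress=(?P<egress>\d+) "
    r"queue=(?P<queue>\d+) next_proto=0x(?P<proto>[0-9a-f]{4})\]"
    r"(?: (?P<src>\S+) -> (?P<dst>\S+))?"
)
TS_MODULUS = 1 << 48  # the shim timestamp is 48 bits wide and may wrap


def timestamp_deltas(lines):
    """Map (src, dst) to the gaps, in microseconds, between consecutive frames."""
    last = {}
    deltas = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # e.g. the "[listener] bound on ..." banner
        flow = (m["src"], m["dst"])
        ts = int(m["ts"])
        if flow in last:
            deltas.setdefault(flow, []).append((ts - last[flow]) % TS_MODULUS)
        last[flow] = ts
    return deltas


sample = [
    "[listener] bound on h2-eth0, waiting for INT frames",
    "[switch=1 ts=745907us egress=2 queue=0 next_proto=0x0800] 10.0.0.1 -> 10.0.0.2",
    "[switch=1 ts=1750021us egress=2 queue=0 next_proto=0x0800] 10.0.0.1 -> 10.0.0.2",
    "[switch=1 ts=2754336us egress=2 queue=0 next_proto=0x0800] 10.0.0.1 -> 10.0.0.2",
]
print(timestamp_deltas(sample))
# {('10.0.0.1', '10.0.0.2'): [1004114, 1004315]}
```

The roughly one-second gaps in the sample match ping's default one-second interval. Once a second switch stamps its own shim, subtracting the two timestamps of the same frame would give a per-hop delay, provided both switches share a clock.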