Making your own private Internet
blows dust off blog
It’s not that I haven’t been busy! I’ve built and upgraded, and broken, a CNC mill. I’ve converted a 70’s toy to accept WiFi and speak Python. I even converted another PS/2 keyboard to USB .. kinda. I just haven’t written any of it up on my blog.
I knew I had to write up the even older stuff about FPGAs etc., and I think that was holding me back.
This is not that post.
This is a post about using WireGuard, VXLANs and Linux bridge devices to make your own private network between hosts that can’t necessarily all talk to each other.
The WireGuard quickstart is pretty comprehensive, so I’m not going to duplicate it here. For my part, I have five machines:
Machine | Location
---|---
rho | Paris, FR
epsilon | Eastern US
vorke | Ottawa, CA
bob | Stafford, UK
pi0 | Roaming
Rho, Epsilon and Bob have static IP addresses and are reachable from the outside. Vorke has a dynamic IP address which is reachable from the outside. Pi0 could be anywhere, has a dynamic IP and is usually behind NAT.
I checked out the WireGuard source and built the kernel module. Because I’m a lesson to others, I am using a mix of `x86`, `amd64` and `armel`. I’m also using a mix of Debian, Ubuntu and VoidLinux, and my Paris machine is a Marvell Armada SoC. Don’t be like me.
Let’s assume that `cd WireGuard/src ; make ; sudo make install` does the right things for you. You’ll have a `wg` utility on your path, a new kernel module in the directory that matches your running kernel and an `/etc/wireguard` directory, waiting for a config.
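A couple of quick sanity checks at this point can save confusion later (my habit, not part of the WireGuard instructions):
modinfo wireguard | head -n 2  # the module should resolve for the running kernel
which wg                       # and the userspace tool should be on your path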
I changed to `/etc/wireguard` on each machine and generated my keys:
wg genkey | tee privatekey | wg pubkey > publickey
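The private key is just sitting in a file at this point, so out of habit I also make sure it isn’t world-readable:
chmod 600 privatekey  # the private key should only be readable by root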
Then I assigned `10.88.88.0/24` out of thin air. (It’s in an RFC 1918 network, so this is fine. Anything starting with `10.` is fair game as long as no other network you are connected to is using it.)
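For reference, the per-host addresses inside that range (you’ll see them again in the peer configs and scripts below) work out as:

Machine | WireGuard IP
---|---
bob | 10.88.88.1
rho | 10.88.88.2
vorke | 10.88.88.3
epsilon | 10.88.88.4
pi0 | 10.88.88.5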
Each machine has a config which lists the other nodes in it:
# Rho
[Interface]
PrivateKey = REDACTED
ListenPort = 56560
# Bob
[Peer]
PublicKey = REDACTED
AllowedIPs = 10.88.88.1/32
Endpoint = 86.188.161.69:56560
# Epsilon
[Peer]
PublicKey = REDACTED
AllowedIPs = 10.88.88.4/32
Endpoint = 104.196.99.86:56560
# Vorke
[Peer]
PublicKey = REDACTED
AllowedIPs = 10.88.88.3/32
# Pi0
[Peer]
PublicKey = REDACTED
AllowedIPs = 10.88.88.5/32
You don’t have to always list every node in the config, only the other nodes that you expect that machine will talk to. For example, I’ve only put Pi0 in Rho’s config, because those two machines only talk to each other via WireGuard.
You’ll notice that `Endpoint` is only filled in for machines which are publicly reachable on a static IP. The other machines will initiate a connection out to the static ones. Once that happens, the static ones know to use that existing UDP socket pair to talk back to them.
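For contrast, Pi0’s config (not shown above) is tiny, something like the sketch below. Rho’s public address is a placeholder here, and the `PersistentKeepalive` line is an optional extra that helps keep NAT mappings alive for a roaming node:
# Pi0
[Interface]
PrivateKey = REDACTED
# Rho
[Peer]
PublicKey = REDACTED
AllowedIPs = 10.88.88.2/32
Endpoint = <rho's public IP>:56560
PersistentKeepalive = 25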
I create and configure the WireGuard network interfaces on every machine:
modprobe ipv6
modprobe udp_tunnel
modprobe ip6_udp_tunnel
ip link add dev wg0 type wireguard
wg setconf wg0 /etc/wireguard/config
ip link set up dev wg0
ip addr add 10.88.88.4/24 dev wg0 # pick a unique IP for each machine
You can verify that everything is working now by pinging from place to place.
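For example, from Epsilon (10.88.88.4 in my numbering):
ping -c 3 10.88.88.1  # Bob, over WireGuard
wg show wg0           # each active peer should show a recent handshake and some transfer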
If you’re okay with the WireGuard kernel module making routing decisions for you, and with all nodes needing to be able to talk to all other nodes, you could stop now.
I wasn’t happy with this, and I also wanted to deal with the issue of MTU. On my network, the `wg0` device has an MTU of 1420. This should be fine, because we have path-MTU discovery, but we live in crappy times and between overzealous filtering of ICMP, refusal to route fragmented packets and anycast IPs that do the wrong thing, this will cause problems at some point.
My solution for this is to run VXLANs over the encrypted point-to-point tunnels that WireGuard has given us. They are effectively VLANs, except implemented over UDP rather than directly at layer 2.
These act more like regular network devices, their routing (and switching) decisions work in standard ways, and I can tell the kernel to give the devices 1500-byte MTUs and just send fragmented packets over WireGuard. It won’t necessarily be efficient, but it will work.
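As a sketch, one of these “virtual wires” looks something like this on one end (the name, VNI and remote address here are made up for illustration; the real ones are in the script further down):
ip link add wire0 type vxlan remote 10.88.88.1 id 100 dstport 4789
ip link set dev wire0 mtu 1500  # full-sized frames; the VXLAN/UDP underneath gets fragmented over wg0
ip link set up dev wire0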
Each of these VXLANs is going to form a point-to-point network of their own. I like to think of them as “virtual wires”. Or cloud Ethernet. Or something. Given a bunch of virtual wires which connect between each other but don’t form a complete mesh, I thought of a few ways to make this work:
- Static routing over the IPs. I don’t have a lot of hosts, but I have enough for this to become annoying, and it wouldn’t provide any form of redundancy.
- Use a dynamic routing protocol (like BGP). Because the hosts don’t form a full mesh, they couldn’t live inside the same autonomous system, but I could allocate a bunch of ASes from the private-use range (64512 and above). This could be cool because I could join my BGP-based Calico network in. It does mean configuring some routing software (probably Quagga). I may still revisit this, but it wasn’t what I chose to do.
- Solve this at layer 2 by using Linux bridges and the spanning tree protocol.
Spanning tree will mean that I really can just treat these VXLANs like cables – connect them all to a core switch, connect them to each other, and let STP avoid switching loops.
In real network gear, if you have two connections between Switch A and Switch B, you would cause a switching loop – packets going to Switch B from A would end up going back to A and then back to B, and bad, bad things would start to happen.
To avoid this, when a layer 1 connection comes up on an STP-enabled switch, it sends out some special packets called BPDUs. If it receives that packet back on another interface, it will disable one of the two interfaces to avoid a loop. No real traffic can flow until this process has run its course, which takes around 30 seconds.
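On a Linux bridge that delay comes from the forward delay (15 seconds by default, spent once listening and once learning), and it’s tunable if you’re impatient, for example:
brctl setfd internet 4  # shrink the forward delay on my bridge (created later) to 4 seconds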
Apart from avoiding pain when connecting network equipment together, STP also gives you a layer of redundancy – if the active port stops sending packets, your switch can attempt to bring the port which was disconnected (‘Blocked’ in STP speak) into use (‘Forwarding’).
This is going to be great for my internetwork, because if one of the nodes is unavailable then all of the rest of the nodes which have cross connects will eventually notice and reconfigure themselves into a mostly working network.
Because it doesn’t require every node to talk to every other, a node like Pi0 – which only has one upstream connection – is treated just like a device on an access port of a switch. It has no redundancy, but it is considered downstream of whichever Linux bridge it is connected to.
brctl addbr internet
brctl stp internet on
case $(uname -n) in
epsilon)
  ip addr add 10.99.99.4/24 dev internet
  ip link add vorke type vxlan remote 10.88.88.3 id 1 dstport 4789
  ip link add bob type vxlan remote 10.88.88.1 id 2 dstport 4789
  ip link add rho type vxlan remote 10.88.88.2 id 4 dstport 4789
  ;;
vorke)
  ip addr add 10.99.99.3/24 dev internet
  ip link add bob type vxlan remote 10.88.88.1 id 3 dstport 4789
  ip link add epsilon type vxlan remote 10.88.88.4 id 1 dstport 4789
  ip link add rho type vxlan remote 10.88.88.2 id 5 dstport 4789
  ;;
# ... other hosts elided ...
esac
ip link set up dev internet
for i in epsilon bob vorke rho pi0; do
  [ -d /sys/class/net/$i ] || continue  # skip links that don't exist on this host
  ip link set up $i
  brctl addif internet $i
  ethtool -K $i tx off  # work around broken tx checksum offload (see below)
done
The above establishes VXLANs between the different hosts (only two are included, for brevity), adds them to an STP-enabled bridge and configures IPs on the bridge devices.
Because of a bug … somewhere (I suspect WireGuard) I had to disable hardware-accelerated tx checksums; that’s what the `ethtool` line is doing.
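If you want to double-check the offload state afterwards (using one of the interface names from the loop above):
ethtool -k bob | grep -i tx-checksum  # should now report tx checksumming as off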
Epilogue
We can view the status of things with `brctl showstp internet`:
EPSILON:~$ sudo brctl showstp internet
internet
bridge id 8000.5299c5e0d97b
designated root 8000.16500a8e632a
root port 1 path cost 100
max age 20.00 bridge max age 20.00
hello time 2.00 bridge hello time 2.00
forward delay 15.00 bridge forward delay 15.00
ageing time 300.00
hello timer 0.00 tcn timer 0.00
topology change timer 0.00 gc timer 276.72
flags
bob (1)
port id 8001 state forwarding
designated root 8000.16500a8e632a path cost 100
designated bridge 8000.16500a8e632a message age timer 19.86
designated port 8002 forward delay timer 0.00
designated cost 0 hold timer 0.00
flags
rho (3)
port id 8003 state blocking
designated root 8000.16500a8e632a path cost 100
designated bridge 8000.3a3bee4c8584 message age timer 19.87
designated port 8002 forward delay timer 0.00
designated cost 100 hold timer 0.00
flags
vorke (2)
port id 8002 state blocking
designated root 8000.16500a8e632a path cost 100
designated bridge 8000.42031e2df8ce message age timer 19.88
designated port 8002 forward delay timer 0.00
designated cost 100 hold timer 0.00
flags
EPSILON:~$ ping vorke.vpn.insom.me.uk
PING vorke.vpn.insom.me.uk (10.99.99.3) 56(84) bytes of data.
64 bytes from 10.99.99.3: icmp_seq=1 ttl=64 time=196 ms
64 bytes from 10.99.99.3: icmp_seq=2 ttl=64 time=196 ms
64 bytes from 10.99.99.3: icmp_seq=3 ttl=64 time=197 ms
You can see from the above that Epsilon (US) isn’t using its connection to either Rho (FR) or Vorke (CA). It’s only using Bob (UK). And that means when I use ping, even though Canada and the US share a land mass, my packets take nearly 200ms to return: the traffic is going over to England and back, crossing the Atlantic twice.
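If I ever care enough to change that, the STP knobs are there, for example lowering the cost of the direct port on Epsilon or biasing the root bridge election. Whether a single tweak is enough depends on where the root bridge ends up, so treat these as pointers rather than a recipe:
brctl setpathcost internet vorke 10  # make the direct Epsilon-Vorke wire cheaper
brctl setbridgeprio internet 4096    # or bias the root bridge election towards this host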
The full script for this is available in this gist.