Building a simple Linux multipath router

2022-05-03

I am occasionally running LAN parties. The available internet bandwidth is often an issue. I wanted a simple way to balance traffic over multiple internet uplinks. It is assumed that there will be NAT on or after the egress interface of the router.

This first version tries to be simple, while not being optimal in all cases.

The basic feature we will be using is multipath routing. Instead of a simple default route our router will have a default route with multiple nexthops. This can be added via iproute2:

ip route add 0.0.0.0/0 \
	nexthop via 172.21.0.1 dev bond0.300 weight 3 \
	nexthop via 172.21.101.1 dev eno1 weight 1

This route will balance the traffic to the nexthops 172.21.0.1 and 172.21.101.1 over the specified interfaces. The weight setting will balance the flows (statistically) 3 to 1 between the links. You can adjust that according to your available bandwidth.

You also might want to check, that there is no other default route with a lower metric, because that will be preferred.

L4 Hashing

By default the load balancing is only done on L3 (IP) headers. A hash is calculated over these fields and that is then used to assign the flow to a nexthop. So given a big enough number of flows you should see a flow distribution according to your weights. To use the L4 headers (which will make it possible to balance flows of the same client over different uplinks) we have to set the sysctl

net.ipv4.fib_multipath_hash_policy = 1

Caveats

There are some issues with this setup:

  1. It is not tested in production yet.
  2. There might be issues with protocols that use multiple ports in parallel (e.g. ftp) or services that need to track users by IP address. This might be "solved" by sticking to L3 header hashing only.
  3. This setup in this naive form does not consider link latency and special traffic classification
  4. This does not handle traffic shaping. If you want to rate limit the traffic, you have to do this via tc, nft or whatever you prefer.
  5. In the case that you are not running NAT but have a public address space that is reachable via all of your uplinks you can only control the way packets are leaving your network. The ingress way is out of your control.
  6. Keep in mind that different flows might cause different amounts of traffic. If you have 100 flows and a 1:1 split, then you will see ~50 flows per nexthop. If you are unlucky the 50 flows for one nexthop might need a lot of traffic and fill up that link while the other 50 are not enough to fill the other link. Although this scenario is rather unlikely.

Testing the setup

To test the setup you have to send traffic from an address that is not on the transit networks to your nexthops, because otherwise the route selection algorithm might influence the route decision

You can use mtr -I $iface 9.9.9.9 to check which nexthop you get. Just select a appropriate interface or use a second computer behind the router. In the later case you can skip the -I $iface parameter. When you use several different destination addresses you should see the different nexthops you specified.

Alternatively ip route get $target from $address should return you different next hops for different targets.

References