VPN kill switch: How to do it on Linux

2023-12-29

A kill switch is a mechanism that blocks all outgoing traffic unless a VPN is active. In this article we discuss how to implement such a mechanism with Linux policy-based routing in a way that works for a wide range of VPNs.

Linux IP packet routing tour.
VPN kill switch with policy-based routing.
Conclusion.

Linux IP packet routing tour

Before diving into how to implement a kill switch, we need to get familiar with how Linux routes IP packets in general. The best way to do that is to run the ip route command, which shows the main routing table. On my computer this command outputs the following.

# "ip route" output
default via 10.65.0.1 dev wlan0 proto dhcp src 10.65.0.11 metric 305
10.33.0.0/16 dev vpn1 scope link
10.65.0.0/20 dev wlan0 proto dhcp scope link metric 305

CIDR notation, gateways and broadcast addresses

Each row in the table is a route that matches packet destinations written in CIDR notation. For example, 10.33.0.0/16 matches any packet with destination 10.33.X.X, where X is an arbitrary number from 0 to 255: the /16 suffix means that the first 16 bits of the address are fixed. Usually the first usable address, 10.33.0.1, is the gateway (the router that forwards packets to destinations outside this network), and the last address, 10.33.255.255, is the broadcast address: if you send a packet to this address, all nodes in the network receive it. The reality is more complicated, though: any address can be used as the gateway, and any address can be set as the broadcast address. In addition, broadcast packets usually reach only the nodes connected to the same network segment (for example, the same switch), because routers usually block them to prevent accidental flooding.
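To make the CIDR arithmetic concrete, here is a small POSIX shell sketch that checks whether an address belongs to a CIDR block and computes the conventional broadcast address, i.e. the network address with all host bits set. The function names are hypothetical, and the sketch assumes well-formed IPv4 input (dotted quads without leading zeros).

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names), IPv4 only, no input validation.

# ip_to_int A.B.C.D: print the address as a 32-bit integer.
ip_to_int() {
    oldifs=$IFS
    IFS=.
    set -- $1                # split the dotted quad into $1..$4
    IFS=$oldifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# int_to_ip N: inverse of ip_to_int.
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# prefix_mask LEN: netmask with the top LEN bits set, e.g. 16 -> 255.255.0.0.
prefix_mask() {
    echo $(( ((1 << 32) - 1) ^ ((1 << (32 - $1)) - 1) ))
}

# cidr_contains CIDR ADDR: succeed if ADDR is inside CIDR ("default" = 0.0.0.0/0).
cidr_contains() {
    [ "$1" = default ] && return 0
    cc_mask=$(prefix_mask "${1#*/}")
    [ $(( $(ip_to_int "$2") & cc_mask )) -eq $(( $(ip_to_int "${1%/*}") & cc_mask )) ]
}

# cidr_broadcast CIDR: print the last address of the block (all host bits set).
cidr_broadcast() {
    cb_hosts=$(( (1 << (32 - ${1#*/})) - 1 ))
    int_to_ip $(( $(ip_to_int "${1%/*}") | cb_hosts ))
}

cidr_contains 10.33.0.0/16 10.33.4.5 && echo "10.33.4.5 is in 10.33.0.0/16"
cidr_contains 10.33.0.0/16 10.34.0.1 || echo "10.34.0.1 is not in 10.33.0.0/16"
cidr_broadcast 10.33.0.0/16     # 10.33.255.255
cidr_broadcast 10.65.0.0/20     # 10.65.15.255
```

The second broadcast address matches the broadcast 10.65.15.255 entry that appears in the local table later in this article.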

How to read the routing table

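Now we can go back to the table to study its routes. In the output, default is another way of spelling 0.0.0.0/0, and this route matches any packet destination. When several routes match a destination, Linux selects the most specific one, i.e. the one with the longest prefix.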

The first route says «forward the packet to the gateway with address 10.65.0.1 via device wlan0 if none of the other routes match».
The second route says «forward the packet to network device vpn1 if the destination matches 10.33.0.0/16». In this case the device driver or the program attached to this device (the VPN) handles the packet.
The third route says «forward the packet to network device wlan0 if the destination matches 10.65.0.0/20». There is no gateway in this route because the destination is in the same local network as the computer itself (its address 10.65.0.11 belongs to 10.65.0.0/20), so Linux sends the packet directly to the destination.
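The selection can be sketched as a toy model in POSIX shell. The function below (a hypothetical name, not the kernel's code) hard-codes the three routes from the table above, ignores metrics, and returns the matching route with the longest prefix. On a real system, ip route get ADDRESS asks the kernel the same question.

```shell
#!/bin/sh
# Toy longest-prefix match over the example main table (not the kernel code).

ip_to_int() {
    oldifs=$IFS; IFS=.; set -- $1; IFS=$oldifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# lookup ADDR: print the matching route with the longest prefix.
lookup() {
    best_len=-1
    best_route=
    addr=$(ip_to_int "$1")
    while read -r dst rest; do
        # "default" is shorthand for 0.0.0.0/0.
        [ "$dst" = default ] && cidr=0.0.0.0/0 || cidr=$dst
        len=${cidr#*/}
        mask=$(( ((1 << 32) - 1) ^ ((1 << (32 - len)) - 1) ))
        if [ $(( addr & mask )) -eq $(( $(ip_to_int "${cidr%/*}") & mask )) ] &&
           [ "$len" -gt "$best_len" ]; then
            best_len=$len
            best_route="$dst $rest"
        fi
    done <<EOF
default via 10.65.0.1 dev wlan0
10.33.0.0/16 dev vpn1
10.65.0.0/20 dev wlan0
EOF
    echo "$best_route"
}

lookup 10.33.4.5     # 10.33.0.0/16 dev vpn1
lookup 10.65.3.3     # 10.65.0.0/20 dev wlan0
lookup 10.65.16.1    # default via 10.65.0.1 dev wlan0 (outside the /20)
lookup 1.1.1.1       # default via 10.65.0.1 dev wlan0
```

The third call shows why the prefix length matters: 10.65.16.1 shares the first 16 bits with 10.65.0.0 but not the first 20, so only the default route matches.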

Is there only one routing table?

As you may have guessed, there are many routing tables in the system. Three of them are predefined: local, main and default. Each table has a numeric ID and a name; the mapping between them is stored in the /etc/iproute2/rt_tables file. Counterintuitively, the table that ip route shows and modifies when no table is specified is main, not default. To see the contents of a particular table, use the following commands.

$ ip route show table main
...
$ ip route show table local
...
$ ip route show table default
...

On my computer the default table is empty. The local table lists the local and broadcast addresses associated with network devices; the kernel maintains it automatically.

$ ip route show table local
local 10.33.0.41 dev vpn1 proto kernel scope host src 10.33.0.41
local 10.65.0.11 dev wlan0 proto kernel scope host src 10.65.0.11
broadcast 10.65.15.255 dev wlan0 proto kernel scope link src 10.65.0.11
local 127.0.0.0/8 dev lo proto kernel scope host src 127.0.0.1
local 127.0.0.1 dev lo proto kernel scope host src 127.0.0.1
broadcast 127.255.255.255 dev lo proto kernel scope link src 127.0.0.1

Policy-based routing

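Linux has another set of rules, the routing policy database, that defines which table is used for a packet. Each rule has a priority, the number before the colon in the output below. The kernel evaluates the rules in ascending order of this number: for every rule that matches the packet it looks up the referenced table, stops if the lookup returns a route, and otherwise continues with the next rule. To see all the rules, use the ip rule command.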

$ ip rule
0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default

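On my computer each rule matches any packet (the from all clause in the output), and the local table has a lower priority number than main, so it is consulted first. Since the kernel moves on to the next rule whenever a table has no route for a packet, we can insert our own rules between local (0) and main (32766). This is what we will leverage to create a VPN kill switch.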

VPN kill switch with policy-based routing

A VPN kill switch routes all outgoing traffic through the VPN except local traffic and the VPN's internal traffic, that is, the packets the VPN itself exchanges with its peers. This means that if the VPN uses port 1234, then the traffic from this port should go through the default gateway, or directly to the destination node if it is in the local network. To implement that, we will create a separate routing table and a rule that uses this table for all packets that are neither VPN internal nor local.

Custom routing table

First, edit the /etc/iproute2/rt_tables file and add the following line, which defines a new table with ID 83 and name vpn1.

83 vpn1
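Editing the file by hand is fine once, but a scripted setup benefits from a guard against duplicate or conflicting entries. A minimal sketch, assuming the usual "ID NAME" format with # comments; add_rt_table is a hypothetical helper (not part of iproute2), the file path is a parameter because it may differ between systems, and writing to the real file requires root.

```shell
#!/bin/sh
# add_rt_table ID NAME [FILE]: append "ID<TAB>NAME" to an rt_tables file
# unless it is already there; fail if the ID or the name is taken by
# another entry. Hypothetical helper, not part of iproute2.
add_rt_table() {
    art_file=${3:-/etc/iproute2/rt_tables}
    # First non-comment line whose ID or name collides, normalized to "ID NAME".
    art_hit=$( [ -f "$art_file" ] && awk -v id="$1" -v name="$2" '
        /^[[:space:]]*(#|$)/ { next }
        $1 == id || $2 == name { print $1, $2; exit }' "$art_file")
    if [ -z "$art_hit" ]; then
        printf '%s\t%s\n' "$1" "$2" >> "$art_file"
    elif [ "$art_hit" != "$1 $2" ]; then
        echo "rt_tables conflict: $art_hit" >&2
        return 1
    fi
}

# Usage (as root): add_rt_table 83 vpn1
```

Running it twice with the same arguments is a no-op, while reusing an ID such as 254 (main) for another name is rejected.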

Now add routes to the new table. The table itself is created automatically when the first route is added.

# remove existing routes if any
$ ip route flush table vpn1
# add default route via gateway node from VPN network
$ ip route add default dev vpn1 via 10.83.0.1 table vpn1 metric 100
# add blackhole route (this is the actual kill switch)
$ ip route add blackhole default table vpn1 metric 200
# check that the routes have been added
$ ip route show table vpn1
default via 10.83.0.1 dev vpn1 metric 100
blackhole default metric 200

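To summarize, we added a new routing table called vpn1 with two default routes: one via the gateway node of the VPN and a so-called blackhole route. The route via the VPN is preferred because of its lower metric, so the blackhole route is used only when the VPN route is not present in the table. Whenever the VPN is stopped, device vpn1 is deleted and the kernel deletes every route that uses it, including the default route in table vpn1. The blackhole route is not bound to any device, so it stays intact, and from then on every lookup in table vpn1 ends in the blackhole and the packet is dropped. Note that the kernel does not restore the deleted route when the VPN starts again, so the route via the VPN has to be added again each time the VPN comes up.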
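Because the route via the VPN disappears together with the device, it helps to keep both routes in a small script that can be re-run whenever the VPN comes up. A sketch, assuming the device, gateway and table are passed as arguments; setup_vpn_table and the RUN dry-run variable are hypothetical. It uses ip route replace, which adds a route or replaces an existing one with the same key, so repeated runs are safe.

```shell
#!/bin/sh
# setup_vpn_table DEV GATEWAY TABLE: (re)create the kill switch routes.
# Set RUN=echo to print the commands instead of running them (dry run).
RUN=${RUN:-}

setup_vpn_table() {
    # The blackhole route has no device, so it can always be installed.
    $RUN ip route replace blackhole default table "$3" metric 200
    # The VPN route needs the device; skip it while the VPN is down
    # (in dry-run mode it is printed anyway).
    if [ -n "$RUN" ] || ip link show dev "$1" >/dev/null 2>&1; then
        $RUN ip route replace default via "$2" dev "$1" table "$3" metric 100
    fi
}

# Example dry run with the values used in this article:
# RUN=echo; setup_vpn_table vpn1 10.83.0.1 vpn1
```

With wg-quick, for example, such a script could be called from a PostUp line of the interface configuration; other VPNs usually offer a similar hook.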

Custom routing rules

Now we will add routing rules that make the kernel consult table vpn1.

# look up all packets except the ones from source port 1234 in table vpn1
$ ip rule add not sport 1234 table vpn1
# consult table "main" first, but ignore its default route (see below)
$ ip rule add table main suppress_prefixlength 0
# check the rules
$ ip rule
0:      from all lookup local
32764:  from all lookup main suppress_prefixlength 0
32765:  not from all sport 1234 lookup vpn1
32766:  from all lookup main
32767:  from all lookup default

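The first rule we added sends every packet whose source port is not 1234 to table vpn1; see the ip-rule(8) man page for other selectors that can replace not sport, such as dport, from, to, fwmark and ipproto. According to the same documentation, the suppress_prefixlength N option means «reject routing decisions that have a prefix length of N or less». A prefix length of zero means the default route, hence this rule means «reject routing decisions that match the default route in table main». So, suppress_prefixlength 0 is a fancy way of saying «ignore the default route from the main routing table». As a result, packets to local networks and to other specific destinations in table main (such as 10.33.0.0/16 via vpn1) are resolved by rule 32764. All other packets fall through to rule 32765 and table vpn1, which sends them through the VPN while it is up and into the blackhole when it is down. Only the VPN's own packets from port 1234 skip table vpn1 and reach rule 32766, where the default route of table main sends them through the real gateway.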
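The whole lookup can be summarized as a toy model in POSIX shell. It is a simplification, not the kernel's algorithm: the tables are hard-coded from this article (the VPN route is written without its gateway to keep it short), suppress_prefixlength rejects the winning route of a table, and the source port is passed in explicitly. The function names pick_route and lookup_table are hypothetical.

```shell
#!/bin/sh
# Toy model of the policy routing set up above (not the kernel's code).

ip_to_int() {
    oldifs=$IFS; IFS=.; set -- $1; IFS=$oldifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Rows of each table as "PREFIX METRIC ROUTE...".
table_rows() {
    case $1 in
    local) echo "127.0.0.0/8 0 local dev lo" ;;
    main)  echo "0.0.0.0/0 305 default via 10.65.0.1 dev wlan0"
           echo "10.33.0.0/16 0 10.33.0.0/16 dev vpn1"
           echo "10.65.0.0/20 305 10.65.0.0/20 dev wlan0" ;;
    vpn1)  echo "0.0.0.0/0 100 default dev vpn1"
           echo "0.0.0.0/0 200 blackhole default" ;;
    esac
}

# lookup_table TABLE DST [N]: longest prefix wins, then lowest metric.
# With N, behave like suppress_prefixlength N. Fails if there is no route.
lookup_table() {
    lt_best= ; lt_len=-1 ; lt_metric=0
    lt_addr=$(ip_to_int "$2")
    while read -r lt_pfx lt_m lt_route; do
        [ -n "$lt_pfx" ] || continue
        lt_n=${lt_pfx#*/}
        lt_mask=$(( ((1 << 32) - 1) ^ ((1 << (32 - lt_n)) - 1) ))
        [ $(( lt_addr & lt_mask )) -eq $(( $(ip_to_int "${lt_pfx%/*}") & lt_mask )) ] || continue
        if [ "$lt_n" -gt "$lt_len" ] ||
           { [ "$lt_n" -eq "$lt_len" ] && [ "$lt_m" -lt "$lt_metric" ]; }; then
            lt_best=$lt_route; lt_len=$lt_n; lt_metric=$lt_m
        fi
    done <<EOF
$(table_rows "$1" | if [ "$VPN_UP" = 1 ]; then cat; else grep -v 'dev vpn1'; fi)
EOF
    [ -n "$lt_best" ] || return 1
    [ -n "$3" ] && [ "$lt_len" -le "$3" ] && return 1
    echo "$lt_best"
}

# pick_route DST SPORT VPN_UP: walk the rules in priority order.
# VPN_UP=0 models the vpn1 device being gone, which removes its routes.
pick_route() {
    VPN_UP=$3
    lookup_table local "$1" && return        # 0:     lookup local
    lookup_table main "$1" 0 && return       # 32764: lookup main suppress_prefixlength 0
    if [ "$2" != 1234 ]; then                # 32765: not sport 1234 lookup vpn1
        lookup_table vpn1 "$1" && return
    fi
    lookup_table main "$1" && return         # 32766: lookup main
    echo "no route"                          # 32767: table default is empty
}

pick_route 1.1.1.1 40000 1     # default dev vpn1
pick_route 1.1.1.1 40000 0     # blackhole default
pick_route 1.1.1.1 1234 0      # default via 10.65.0.1 dev wlan0
pick_route 10.65.0.7 40000 0   # 10.65.0.0/20 dev wlan0
```

The four calls cover the cases from the paragraph above: application traffic with the VPN up and down, the VPN's own traffic, and local traffic.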

Any alternatives?

We tested the policy-based VPN kill switch with WireGuard and Staex. With WireGuard, do not forget to set ListenPort in the configuration: otherwise WireGuard picks a random port, which the sport rule cannot match. Both VPNs use only one port for their internal traffic. It should be possible to match the traffic of a centralized VPN by source or destination address in CIDR notation (the from and to options of the ip rule command). In general, the exact packets can be marked using iptables and then matched by the same mark in the routing rules (see the iptables mark module and the fwmark selector of ip rule). This article discusses various approaches within the context of WireGuard.
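Since the rule matches a fixed source port, it is worth checking that the WireGuard configuration actually pins one. A sketch that extracts ListenPort from the [Interface] section of a wg-quick style configuration file; wg_listen_port is a hypothetical helper, it assumes the key is spelled exactly ListenPort, and it fails when the key is missing.

```shell
#!/bin/sh
# wg_listen_port FILE: print ListenPort from the [Interface] section,
# exit with status 1 if it is not set.
wg_listen_port() {
    awk '
        /^[[:space:]]*\[/ { in_if = ($0 ~ /^[[:space:]]*\[Interface\]/) }
        in_if && /^[[:space:]]*ListenPort[[:space:]]*=/ {
            sub(/^[^=]*=[[:space:]]*/, "")       # drop "ListenPort ="
            sub(/[[:space:]]*(#.*)?$/, "")       # drop trailing comment and spaces
            print; found = 1; exit
        }
        END { exit !found }' "$1"
}

# Example (the file path is illustrative):
# port=$(wg_listen_port /etc/wireguard/vpn1.conf) &&
#     ip rule add not sport "$port" table vpn1
```

A ListenPort that appears only under [Peer] is ignored, because the port belongs to the local interface.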

Multiple VPNs

A kill switch by its nature does not play well with multiple VPNs. The sport selector accepts a single port or a contiguous range of ports, so probably the only way to exclude several unrelated ports from table vpn1 is to use firewall marks. We have not evaluated this approach yet.

Conclusion

We expected a kill switch to be a simple VPN feature, but we underestimated the complexity of Linux networking. Linux has multiple layers of IP packet routing rules, a built-in firewall and network namespaces. VPNs do not make this task simpler either: they might use several ports for internal communication, or you might want to run multiple VPNs on a single node.

Staex is a secure public network for IoT devices that cannot run a VPN, such as smart meters, IP cameras and EV chargers. Staex encrypts legacy protocols, reduces mobile data usage, and simplifies building networks with complex topologies through its unique multi-hop architecture. Staex is fully zero-trust, meaning that no traffic is allowed unless specified by the device owner, which makes it more secure than even some private networks. With this, Staex creates an additional separation layer that provides more security for IoT devices on the Internet and also protects other Internet services from DDoS attacks, which are usually launched from millions of IoT machines.
