Robust Wi-Fi roaming with Staex

2022-12-05

Staex adds Wi-Fi roaming to its feature set, allowing mobile robots in large warehouses to switch between multiple Wi-Fi access points for full network coverage without having to change the network card or the driver provided by the operating system.

What is Wi-Fi roaming?

Wi-Fi roaming is the automatic reconnection of clients (stations) to the nearest Wi-Fi router (access point). Roaming solves the network coverage problem in large spaces such as warehouses, where a single access point cannot cover the whole area and several must be used. Mobile robots and handheld devices then need to switch between these access points automatically. Ideally, switching is transparent to applications, because they may not handle disconnects gracefully. In practice this is not always the case: applications may lose connections, hang, or fail in other ways during roaming, which makes them inefficient and in some cases almost unusable in an environment with multiple access points.

What options do we have to solve this problem? Since part of the problem lies on the application side, we can tune the device the application runs on, for example by using a high-quality network card or a network card driver that is optimized for roaming. These are tempting options, but what if we can't change the card or the driver provided by the operating system? In that case a software-based overlay network is a good fit.

How Staex improves application robustness

Software-based overlay networks easily handle abrupt disconnects from access points, which may occur due to router load balancing, hardware malfunction, or signal loss. The Staex overlay network goes one step further and implements general-purpose network roaming, in which a client fully disconnects from the current network and connects to a completely different one. It doesn't matter which technology the new network is based on. It can be Wi-Fi, a mobile network, Ethernet, and so on: any IP-based network will work with Staex, provided there is an operating system driver for it.

During the disconnect Staex maintains the connection between the client and the daemon that runs locally, and tries to establish a new connection as soon as a new network becomes available. The application that uses this connection doesn't see any change. From the application's perspective nothing happened, except that some messages took longer than usual to reach the server.
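The idea of hiding an outage behind a local endpoint can be illustrated with a toy model. This is only a sketch of the general buffering technique, not Staex's implementation: the class `BufferedLink` and its methods are hypothetical names, and the "uplink" is simulated with a boolean flag rather than a real network.

```python
from collections import deque


class BufferedLink:
    """Toy model of a local endpoint that hides uplink outages (hypothetical)."""

    def __init__(self):
        self.uplink_up = True
        self.pending = deque()  # messages accepted while the uplink is down
        self.delivered = []     # messages that reached the remote side, in order

    def send(self, message):
        # The application's call always succeeds; delivery may be deferred.
        self.pending.append(message)
        self._flush()

    def set_uplink(self, up):
        # Called when the underlying network disappears or a new one appears.
        self.uplink_up = up
        self._flush()

    def _flush(self):
        # Deliver queued messages in order, but only while the uplink is up.
        while self.uplink_up and self.pending:
            self.delivered.append(self.pending.popleft())


link = BufferedLink()
link.send("a")
link.set_uplink(False)  # roaming starts: the old network is gone
link.send("b")
link.send("c")
delivered_during_outage = list(link.delivered)
link.set_uplink(True)   # a new network is available
print(delivered_during_outage, link.delivered)
```

The application-facing `send` never fails in this model. Messages sent during the outage simply arrive later, which matches the behaviour described above.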

When we first encountered the roaming problem we didn't know whether our software could handle it. We decided to challenge ourselves and run a benchmark that measures how much time is needed to recover from an abrupt disconnect and how much data is lost during the reconnection. This kind of abrupt switch is so-called hard roaming.
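One possible way to derive the recovery time from periodic progress reports is to look for the longest interval in which the byte counter did not move. The sketch below assumes samples of the form (timestamp in seconds, total bytes received). The function name `longest_stall` and the sample values are made up for illustration and are not benchmark results.

```python
def longest_stall(samples):
    """Return (start_time, duration) of the longest period without new bytes.

    samples: list of (timestamp_seconds, total_bytes_received), ordered by time.
    Returns (None, 0) if the byte counter advanced in every interval.
    """
    best_start, best_duration = None, 0
    stall_start = None
    for (t_prev, b_prev), (t_cur, b_cur) in zip(samples, samples[1:]):
        if b_cur == b_prev:
            # No progress in this interval: start or extend the current stall.
            if stall_start is None:
                stall_start = t_prev
            duration = t_cur - stall_start
            if duration > best_duration:
                best_start, best_duration = stall_start, duration
        else:
            # Progress resumed: the current stall (if any) is over.
            stall_start = None
    return best_start, best_duration


# Illustrative samples only: no progress between t=2 and t=4.
example = [(0, 0), (1, 100), (2, 200), (3, 200), (4, 200), (5, 300)]
stall = longest_stall(example)
print(stall)
```

The resolution of the result is limited by the sampling interval, so with one report per second the measured stall is accurate to about a second on either side.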

Network roaming benchmark

We have three nodes: a bootnode with a public IP address, a sender node and a receiver node. The sender and the receiver nodes are connected over a Wi-Fi network, and there is no other physical network that connects them. These nodes are also indirectly connected in the Staex network via the bootnode, since the bootnode has a public IP address and acts as a relay in the Staex network.

Benchmark.

We test the following scenario:

  • Limit the bandwidth of the sender and receiver nodes to 1 Mbit/s using tc's netem module.
  • Start a Python HTTP server on the sender node that serves a randomly generated 3 MB file.
  • Start a Python HTTP client on the receiver node that downloads this file.
  • Every second the client reports the download speed and the total number of bytes received.
  • After five seconds the client disconnects from the current Wi-Fi network and connects to another one (this network must also have internet access so that the client can reach the bootnode).
  • The client continues to download the file without any action from our side.
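The server and client parts of this scenario can be sketched with the Python standard library alone. This is a minimal approximation, not the script used in the benchmark: the function names, the file name `payload.bin` and the chunk size are assumptions, and both ends run on localhost here instead of on separate nodes. The bandwidth limit and the Wi-Fi switch are not reproduced.

```python
import functools
import http.server
import os
import tempfile
import threading
import time
import urllib.request


def serve_directory(directory, port=0):
    """Start an HTTP file server in a background thread; return the server."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Port 0 lets the operating system pick a free port.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def download(url, chunk_size=16 * 1024, report_interval=1.0):
    """Download url, printing speed and total bytes once per report interval."""
    total = 0
    window_bytes = 0
    window_start = time.monotonic()
    with urllib.request.urlopen(url) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
            window_bytes += len(chunk)
            now = time.monotonic()
            if now - window_start >= report_interval:
                speed = window_bytes / (now - window_start)
                print(f"{speed:.0f} B/s, {total} bytes received")
                window_bytes, window_start = 0, now
    return total


def run_local_demo(size):
    """Serve a random file of `size` bytes and download it over localhost."""
    with tempfile.TemporaryDirectory() as directory:
        with open(os.path.join(directory, "payload.bin"), "wb") as f:
            f.write(os.urandom(size))
        server = serve_directory(directory)
        try:
            port = server.server_address[1]
            return download(f"http://127.0.0.1:{port}/payload.bin")
        finally:
            server.shutdown()
            server.server_close()


if __name__ == "__main__":
    size = 3 * 1024 * 1024  # 3 MB, as in the scenario above
    print(f"received {run_local_demo(size)} of {size} bytes")
```

Because `urlopen` is called without a timeout, the client simply blocks while no data arrives and continues when the connection recovers. Over localhost the transfer finishes almost instantly, so per-second reports only appear when the link is slow, as with the 1 Mbit/s limit in the benchmark.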