WebRTC security: Are truly decentralized and private calls possible?

2024-02-04

WebRTC emerged as a technology for peer-to-peer calls, chats and file sharing in browsers. However, is it fully peer-to-peer and private? In this article we share our experience of using this technology to build a decentralized peer-to-peer calling web application that we called Copybara.

«Capybara looks at the mirror oil painting» by DALL-E.


What is WebRTC?

Photo by Compare Fibre on Unsplash.

WebRTC is a technology that allows computers to make direct connections to each other without using a central server. These connections can be used for video and audio streaming as well as for sharing files (in fact, you can send or stream any data you like). Probably the best thing about WebRTC is that it is implemented in all major web browsers and thus is available to almost anyone who uses the Internet. This allowed WebRTC to become the communication standard for video meetings, and all major meeting platforms eventually migrated to it from their home-grown solutions.

WebRTC is secure by default: media streams are encrypted with SRTP and data streams with DTLS. Simply put, DTLS is a variant of TLS for datagram transports such as UDP that do not guarantee packet delivery (WebRTC data channels run SCTP on top of DTLS), and SRTP is an encrypted and authenticated variant of the RTP protocol, which itself typically runs on top of UDP.

WebRTC uses STUN and TURN servers to establish connections between computers behind NAT. Such computers cannot connect to each other directly without NAT traversal or a relay, e.g. when they are located in different local area networks or when a firewall is too restrictive. To oversimplify, WebRTC uses STUN servers to check whether NAT traversal is possible, and if it is not, TURN servers are used as relays between the peers. Successful NAT traversal means that the traffic goes directly from one peer to the other, bypassing the STUN server, whereas communication via a relay means that all traffic passes through the relay.

Why is WebRTC not private and fully decentralized?

Photo by Katerina Pavlyuchkova on Unsplash.

Traffic passing through the relay means more work for the TURN server, but it also means more privacy for the user: if a peer connects only through the relay, its real IP address is hidden from the other peers, and only the relay knows the real IP addresses of all peers. With STUN, by contrast, the peers talk to each other directly using their real IP addresses, which, at least in the context of WebRTC, means that all peers learn each other's real IP addresses.
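WebRTC describes each possible route to a peer with an ICE candidate string, and the candidate type after the typ keyword shows whose address it carries: host and srflx (server-reflexive, discovered via STUN) candidates carry the peer's own addresses, while relay candidates carry an address on the TURN server. The minimal Python sketch below is an illustration, not Copybara code, and the sample candidates are made up. In a browser, a peer can restrict itself to relay candidates with the iceTransportPolicy: "relay" option of RTCPeerConnection.

```python
import re

# ICE candidate types (RFC 5245) and which address each one carries.
CANDIDATE_TYPES = {
    "host": "an address of the peer's own network interface",
    "srflx": "the peer's public address as seen by a STUN server",
    "prflx": "a peer address learned during connectivity checks",
    "relay": "an address allocated on the TURN server",
}


def candidate_type(candidate: str) -> str:
    """Return the candidate type that follows the 'typ' keyword."""
    match = re.search(r"\btyp (\w+)", candidate)
    if match is None or match.group(1) not in CANDIDATE_TYPES:
        raise ValueError(f"unrecognized ICE candidate: {candidate!r}")
    return match.group(1)


def exposes_peer_address(candidate: str) -> bool:
    """True if the connection address belongs to the peer itself.

    Only the connection address is checked: a relay candidate may still
    carry the peer's public address in its optional raddr field.
    """
    return candidate_type(candidate) != "relay"


# Made-up candidates (a private LAN address and RFC 5737 documentation addresses).
samples = [
    "candidate:0 1 UDP 2121990399 192.168.1.10 51884 typ host",
    "candidate:1 1 UDP 1685782783 203.0.113.7 51884 typ srflx raddr 192.168.1.10 rport 51884",
    "candidate:2 1 UDP 8061183 198.51.100.20 62000 typ relay raddr 203.0.113.7 rport 51884",
]
for cand in samples:
    kind = candidate_type(cand)
    print(f"{kind:5} exposes peer address: {exposes_peer_address(cand)} ({CANDIDATE_TYPES[kind]})")
```

Note that this only hides the address if the peer sends relay candidates exclusively; as long as host or srflx candidates are also exchanged, the other side still learns the real addresses.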

Another problem is that the WebRTC standard does not specify a signalling server, and most implementations use a centralized one. This server is used to exchange the initial handshake between the peers. The handshake includes a list of available IP address and protocol combinations for each peer, which means that the signalling server also knows the real IP addresses of the peers. In addition, some WebRTC implementations may leak the real IP addresses of the peers through the browser API.

To summarize, WebRTC is secure by default (which is a good thing!), but it may leak the real IP addresses of the peers because it tries to establish direct connections between them. Another problem is that the signalling server (an essential part of any WebRTC deployment) is usually centralized, which makes it an attractive target for DDoS and other attacks.

Improving privacy and making it decentralized

Photo by Markus Spiske on Unsplash.

Our goal with the Copybara project was to improve the privacy of WebRTC calls and to make them decentralized (i.e. to make the signalling server a less attractive target for attacks). To do that, we

  • replaced the initial handshake over the signalling server with a manual handshake over a possibly non-private channel (e.g. a messenger or email),
  • replaced DNS with AZERO.ID, a decentralized name resolution alternative,
  • replaced the plain Internet with the Staex public network, which provides global addresses for peers (and thus eliminates STUN/TURN servers).

Eliminating signalling server

The WebRTC initial handshake uses the signalling server to publish and discover the peers' addresses and other communication options. The handshake includes an SDP description of the peer's media streams and a list of ICE candidates: special strings (RFC 5245) that describe how to connect to the peer.

This is what an SDP string generated by my browser looks like:

o=mozilla...THIS_IS_SDPARTA-99.0 1676368899866546158 0 IN IP4 0.0.0.0
s=-
t=0 0
a=fingerprint:sha-256 40:0A:51:ED:9D:B6:BA:A8:21:2E:32:28:63:0D:0F:49:4A:E0:C9:8B:46:9C:27:4A:54:42:56:4A:05:2E:60:49
a=group:BUNDLE 0 1 2
a=ice-options:trickle
a=msid-semantic:WMS *
m=audio 9 UDP/TLS/RTP/SAVPF 109 9 0 8 101
c=IN IP4 0.0.0.0
a=sendrecv
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=extmap:2/recvonly urn:ietf:params:rtp-hdrext:csrc-audio-level
a=extmap:3 urn:ietf:params:rtp-hdrext:sdes:mid
a=fmtp:109 maxplaybackrate=48000;stereo=1;useinbandfec=1
a=fmtp:101 0-15
a=ice-pwd:44159564e974788c91d0561890ca7138
a=ice-ufrag:1ad1f3f2
a=mid:0
a=msid:{9838f430-8e28-4bc6-bebe-4b9add04895b} {125e306f-5935-493e-9989-9ad56fd57f19}
a=rtcp-mux
a=rtpmap:109 opus/48000/2
a=rtpmap:9 G722/8000/1
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000/1
a=setup:actpass
a=ssrc:1324956293 cname:{a5e60e98-6974-419d-b3c8-0501c1d32852}
m=video 9 UDP/TLS/RTP/SAVPF 120 124 121 125 126 127 97 98
c=IN IP4 0.0.0.0
a=sendrecv
a=extmap:3 urn:ietf:params:rtp-hdrext:sdes:mid
a=extmap:4 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=extmap:5 urn:ietf:params:rtp-hdrext:toffset
a=extmap:6/recvonly http://www.webrtc.org/experiments/rtp-hdrext/playout-delay
a=extmap:7 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01
a=fmtp:126 profile-level-id=42e01f;level-asymmetry-allowed=1;packetization-mode=1
a=fmtp:97 profile-level-id=42e01f;level-asymmetry-allowed=1
a=fmtp:120 max-fs=12288;max-fr=60
a=fmtp:124 apt=120
a=fmtp:121 max-fs=12288;max-fr=60
a=fmtp:125 apt=121
a=fmtp:127 apt=126
a=fmtp:98 apt=97
a=ice-pwd:44159564e974788c91d0561890ca7138
a=ice-ufrag:1ad1f3f2
a=mid:1
a=msid:{9838f430-8e28-4bc6-bebe-4b9add04895b} {8c4342d0-f7b8-41b6-9459-af31ef4aa1d8}
a=rtcp-fb:120 nack
a=rtcp-fb:120 nack pli
a=rtcp-fb:120 ccm fir
a=rtcp-fb:120 goog-remb
a=rtcp-fb:120 transport-cc
a=rtcp-fb:121 nack
a=rtcp-fb:121 nack pli
a=rtcp-fb:121 ccm fir
a=rtcp-fb:121 goog-remb
a=rtcp-fb:121 transport-cc
a=rtcp-fb:126 nack
a=rtcp-fb:126 nack pli
a=rtcp-fb:126 ccm fir
a=rtcp-fb:126 goog-remb
a=rtcp-fb:126 transport-cc
a=rtcp-fb:97 nack
a=rtcp-fb:97 nack pli
a=rtcp-fb:97 ccm fir
a=rtcp-fb:97 goog-remb
a=rtcp-fb:97 transport-cc
a=rtcp-mux
a=rtcp-rsize
a=rtpmap:120 VP8/90000
a=rtpmap:124 rtx/90000
a=rtpmap:121 VP9/90000
a=rtpmap:125 rtx/90000
a=rtpmap:126 H264/90000
a=rtpmap:127 rtx/90000
a=rtpmap:97 H264/90000
a=rtpmap:98 rtx/90000
a=setup:actpass
a=ssrc:1809954171 cname:{a5e60e98-6974-419d-b3c8-0501c1d32852}
a=ssrc:3259788821 cname:{a5e60e98-6974-419d-b3c8-0501c1d32852}
a=ssrc-group:FID 1809954171 3259788821
m=application 9 UDP/DTLS/SCTP webrtc-datachannel
c=IN IP4 0.0.0.0
a=sendrecv
a=ice-pwd:44159564e974788c91d0561890ca7138
a=ice-ufrag:1ad1f3f2
a=mid:2
a=setup:actpass
a=sctp-port:5000
a=max-message-size:1073741823

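As we can see, the IP address is already hidden (every connection line reads c=IN IP4 0.0.0.0), and probably the only sensitive data here are the per-stream ICE passwords ice-pwd (identical for all three streams in this example), which WebRTC uses to authenticate ICE connectivity checks.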
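To double-check what an SDP string exposes, one can scan its c= (connection) lines and the ice-ufrag/ice-pwd attributes. A minimal Python sketch, assuming the SDP is available as a plain string (for example copied from RTCPeerConnection.localDescription.sdp); the excerpt in the usage is taken from the SDP above, and the function name is illustrative.

```python
def sdp_exposure(sdp: str) -> dict[str, set[str]]:
    """Collect connection addresses and ICE credentials found in an SDP string."""
    found: dict[str, set[str]] = {"addresses": set(), "ice-ufrag": set(), "ice-pwd": set()}
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("c="):
            # c=<nettype> <addrtype> <connection-address>
            found["addresses"].add(line.split()[-1])
        elif line.startswith(("a=ice-ufrag:", "a=ice-pwd:")):
            # "a=ice-pwd:<value>" -> name "ice-pwd", value "<value>"
            name, value = line[2:].split(":", 1)
            found[name].add(value)
    return found


excerpt = """m=audio 9 UDP/TLS/RTP/SAVPF 109 9 0 8 101
c=IN IP4 0.0.0.0
a=ice-pwd:44159564e974788c91d0561890ca7138
a=ice-ufrag:1ad1f3f2
m=application 9 UDP/DTLS/SCTP webrtc-datachannel
c=IN IP4 0.0.0.0
a=ice-pwd:44159564e974788c91d0561890ca7138
a=ice-ufrag:1ad1f3f2"""

print(sdp_exposure(excerpt))
```

For this excerpt the only connection address is 0.0.0.0, so the SDP itself reveals no real address; the ICE credentials are what would have to be protected.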

The following output shows what an ICE candidate string looks like in my browser:

{
    "candidate": "candidate:0 1 UDP 2121990399 X.X.X.X 51884 typ host",
    "sdpMid": "0",
    "sdpMLineIndex": 0,
    "usernameFragment": "eccde21d"
}

Such a string is generated for each IP address (public and private) a peer has, each protocol (UDP, TCP), each stream (video, audio, data), and each component id (RTP, RTCP). In my case, a total of 60 (!) candidates were generated. Unlike in the SDP, the IP address (masked here as X.X.X.X) is not hidden: some candidates use public IP addresses whereas others use private ones.
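A candidate string has a fixed layout: foundation, component id, transport, priority, connection address, port, the keyword typ and the candidate type, optionally followed by name/value extensions such as raddr and rport. RFC 5245 also defines the priority as 2^24 * type preference + 2^8 * local preference + (256 - component id). The Python sketch below (class and function names are illustrative) parses the candidate shown above and splits its priority back into these parts: 2121990399 yields type preference 126 (the value RFC 5245 recommends for host candidates), local preference 31488 and component 1 (RTP).

```python
from dataclasses import dataclass, field


@dataclass
class IceCandidate:
    foundation: str
    component: int  # 1 = RTP, 2 = RTCP
    transport: str
    priority: int
    address: str
    port: int
    typ: str
    extensions: dict[str, str] = field(default_factory=dict)


def parse_candidate(value: str) -> IceCandidate:
    """Parse an ICE candidate string such as the browser's 'candidate' field."""
    value = value.removeprefix("candidate:")
    parts = value.split()
    if len(parts) < 8 or parts[6] != "typ":
        raise ValueError(f"not an ICE candidate: {value!r}")
    foundation, component, transport, priority, address, port, _, typ = parts[:8]
    rest = parts[8:]
    # Remaining tokens come in name/value pairs, e.g. "raddr <ip> rport <port>".
    extensions = dict(zip(rest[0::2], rest[1::2]))
    return IceCandidate(foundation, int(component), transport.upper(),
                        int(priority), address, int(port), typ, extensions)


def split_priority(priority: int) -> tuple[int, int, int]:
    """Invert priority = 2**24*type_pref + 2**8*local_pref + (256 - component)."""
    type_pref = priority >> 24
    local_pref = (priority >> 8) & 0xFFFF
    component = 256 - (priority & 0xFF)
    return type_pref, local_pref, component


cand = parse_candidate("candidate:0 1 UDP 2121990399 X.X.X.X 51884 typ host")
print(cand.typ, cand.address, cand.port, split_priority(cand.priority))
```

Seeing the fields separately also makes clear why so many candidates appear: every combination of address, transport and component produces its own line.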

Normally, the SDP and all candidate strings are sent to the signalling server, which forwards them to the other peer. In response, that peer sends back its own SDP string and list of ICE candidates. Our idea was to encode them into URLs and exchange them over an end-to-end encrypted secret chat (Telegram, WhatsApp, etc.). This is more work for the user, but also more privacy, since we do not send IP addresses and session passwords to a central signalling server.
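One way to turn the handshake into a link is to serialize it as JSON, compress it, and put it into the URL fragment using URL-safe Base64. The Python sketch below assumes a hypothetical base URL and payload layout; the article does not describe Copybara's actual link format. Compression helps because an SDP like the one above plus dozens of candidates easily reaches several kilobytes. The fragment (the part after #) is not sent to the web server when the link is opened, so the server hosting the app does not see the handshake either.

```python
import base64
import json
import zlib

# Hypothetical base URL, for illustration only.
BASE_URL = "https://copybara.example/call"


def encode_handshake(sdp: str, candidates: list[str]) -> str:
    """Pack an SDP string and its ICE candidates into a shareable link."""
    payload = json.dumps({"sdp": sdp, "candidates": candidates}, separators=(",", ":"))
    packed = zlib.compress(payload.encode("utf-8"), 9)
    token = base64.urlsafe_b64encode(packed).decode("ascii").rstrip("=")
    # The fragment after '#' stays in the browser and is not sent to the server.
    return f"{BASE_URL}#{token}"


def decode_handshake(link: str) -> dict:
    """Reverse encode_handshake(): restore the SDP string and candidate list."""
    token = link.split("#", 1)[1]
    token += "=" * (-len(token) % 4)  # restore the Base64 padding stripped above
    packed = base64.urlsafe_b64decode(token)
    return json.loads(zlib.decompress(packed).decode("utf-8"))


sdp_excerpt = "a=ice-ufrag:1ad1f3f2\r\na=ice-pwd:44159564e974788c91d0561890ca7138\r\n"
link = encode_handshake(sdp_excerpt, ["candidate:0 1 UDP 2121990399 X.X.X.X 51884 typ host"])
print(link)
print(decode_handshake(link)["candidates"])
```

The callee decodes the link, feeds the SDP and candidates to its own RTCPeerConnection, and sends its answer back as a second link in the same way.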

Hiding real IP addresses

To remove the real IP addresses from ICE candidates, we integrated our web application into the Staex public network. This network uses the nodes' public keys as their addresses and end-to-end encrypts the traffic between nodes using these keys. Also, Staex is a multi-hop network, meaning that nodes that communicate through another node acting as a relay do not know each other's real IP addresses; they only know each other's public keys.

To map public keys to IP addresses and vice versa (to actually send packets over the underlying IP network), Staex uses node-local temporary IPv4 addresses. These addresses can only be spoofed by actually infiltrating the node, and the same public key may map to different addresses on different nodes. In other words, revealing such addresses does not reveal any sensitive data about the WebRTC peer.
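The idea of node-local addresses can be illustrated with a small table that each node keeps for itself. In this Python sketch the class name and the 10.83.0.0/16 range are assumptions (the article only shows the address 10.83.0.1), and addresses are handed out in order of first use; Staex's real allocation scheme is not described here. Because every node fills its own table, the same public key ends up with different addresses on different nodes.

```python
import ipaddress


class NodeLocalAddresses:
    """Hypothetical per-node table mapping peer public keys to temporary IPv4 addresses."""

    def __init__(self, subnet: str = "10.83.0.0/16") -> None:
        self._free = iter(ipaddress.ip_network(subnet).hosts())
        self._by_key: dict[str, ipaddress.IPv4Address] = {}
        self._by_ip: dict[ipaddress.IPv4Address, str] = {}

    def address_for(self, node_id: str) -> ipaddress.IPv4Address:
        """Return the address of node_id on this node, allocating one on first use."""
        if node_id not in self._by_key:
            ip = next(self._free)  # raises StopIteration if the range is exhausted
            self._by_key[node_id] = ip
            self._by_ip[ip] = node_id
        return self._by_key[node_id]

    def node_for(self, ip: str) -> str:
        """Reverse lookup: which public key is behind a node-local address."""
        return self._by_ip[ipaddress.ip_address(ip)]


key_x = "h8syf1xcv3rveegh00p89ew8sqbzkr4ckpjz3zs4xxr5tvf4hs4g"
key_y = "ctnsaj8135fsnm5t1fqda6093g95pjzfdhms77trjk6fy1ks6a80"
node_a, node_b = NodeLocalAddresses(), NodeLocalAddresses()
node_b.address_for(key_y)  # node_b has already talked to another node
print(node_a.address_for(key_x), node_b.address_for(key_x))  # same key, different addresses
```

An address from such a table is meaningless outside the node that issued it, which is why leaking it does not reveal where the peer actually is.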

Here one might argue that we simply replaced a STUN/TURN server with a Staex public network node, which does the same thing slightly differently. A Staex public node indeed knows the real IP addresses of its direct peers (this is how IP networks work anyway), but these addresses are not sent anywhere. For additional privacy, one can connect to the public node via another node to hide one's address from the public node as well.

To hide the real IP addresses, we removed from the list every ICE candidate that did not use a node-local temporary address, and then added the source public key to the handshake. We also included the node-local temporary source/destination addresses in the request and response to replace the addresses in the ICE candidate strings, which may differ from node to node.

# request
{
    ...,
    "sourceNodeId": "h8syf1xcv3rveegh00p89ew8sqbzkr4ckpjz3zs4xxr5tvf4hs4g",
    "destinationNodeId": "ctnsaj8135fsnm5t1fqda6093g95pjzfdhms77trjk6fy1ks6a80",
    "sourceIp": "10.83.0.1"
}

# response
{
    ...,
    "destinationIp":"10.83.0.1"
}
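The filtering and the extra request fields can be sketched in Python as below. The candidates field name and the 10.83.0.0/16 range are assumptions for illustration, the other handshake fields elided as ... above are left out, and the node ids are the ones from the example request. Candidates whose address is not an IP literal (for example the mDNS .local names that browsers use to mask host addresses) are dropped as well.

```python
import ipaddress

NODE_LOCAL = ipaddress.ip_network("10.83.0.0/16")  # assumed node-local range


def node_local_only(candidates: list[str]) -> list[str]:
    """Keep only candidates whose connection address is a node-local IPv4 address."""
    kept = []
    for cand in candidates:
        fields = cand.split()
        if len(fields) < 8:
            continue
        try:
            address = ipaddress.ip_address(fields[4])  # 5th field: connection address
        except ValueError:
            continue  # not an IP literal, e.g. an mDNS "<uuid>.local" name
        if address in NODE_LOCAL:
            kept.append(cand)
    return kept


def build_request(candidates: list[str], source_node_id: str,
                  destination_node_id: str, source_ip: str) -> dict:
    """Assemble the Staex-specific part of the call request (hypothetical layout)."""
    if ipaddress.ip_address(source_ip) not in NODE_LOCAL:
        raise ValueError("sourceIp must be a node-local address")
    return {
        "candidates": node_local_only(candidates),  # hypothetical field name
        "sourceNodeId": source_node_id,
        "destinationNodeId": destination_node_id,
        "sourceIp": source_ip,
    }


request = build_request(
    [
        "candidate:0 1 UDP 2121990399 192.168.1.10 51884 typ host",
        "candidate:1 1 UDP 2121990399 10.83.0.1 51885 typ host",
        "candidate:2 1 UDP 2121990399 3f2a9c1e-0000-4000-8000-000000000000.local 51886 typ host",
    ],
    "h8syf1xcv3rveegh00p89ew8sqbzkr4ckpjz3zs4xxr5tvf4hs4g",
    "ctnsaj8135fsnm5t1fqda6093g95pjzfdhms77trjk6fy1ks6a80",
    "10.83.0.1",
)
print(request)
```

Only the candidate on the node-local address survives, so the link carries no LAN or public IP address.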

The callee validates that the destination node id (our internal name for a node's public key) matches the current node id, and then resolves the source node id to a node-local temporary address via the node-local Staex HTTP API, which runs on the loopback network (127.0.0.0/8).
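The callee-side check can be sketched as follows. The resolver is passed in as a function because the article does not document the Staex HTTP API; in practice it would query the local Staex node over the loopback interface, and the dictionary used here is only a stand-in. Rejecting requests addressed to another node before doing anything else keeps a forwarded or tampered link from being used on the wrong machine.

```python
import ipaddress
from typing import Callable


def accept_request(request: dict, my_node_id: str,
                   resolve: Callable[[str], str]) -> ipaddress.IPv4Address:
    """Validate an incoming call request and return the caller's node-local address.

    resolve() stands in for the local Staex HTTP API (hypothetical here): it maps
    a node id to the node-local IPv4 address this node uses for it.
    """
    if request.get("destinationNodeId") != my_node_id:
        raise PermissionError("request is addressed to a different node")
    source_node_id = request.get("sourceNodeId")
    if not source_node_id:
        raise ValueError("request has no sourceNodeId")
    address = ipaddress.ip_address(resolve(source_node_id))
    if not isinstance(address, ipaddress.IPv4Address):
        raise ValueError("expected a node-local IPv4 address")
    return address


request = {
    "sourceNodeId": "h8syf1xcv3rveegh00p89ew8sqbzkr4ckpjz3zs4xxr5tvf4hs4g",
    "destinationNodeId": "ctnsaj8135fsnm5t1fqda6093g95pjzfdhms77trjk6fy1ks6a80",
    "sourceIp": "10.83.0.1",
}
# Stand-in resolver; a real one would ask the Staex node on 127.0.0.1.
local_table = {"h8syf1xcv3rveegh00p89ew8sqbzkr4ckpjz3zs4xxr5tvf4hs4g": "10.83.0.7"}
my_id = "ctnsaj8135fsnm5t1fqda6093g95pjzfdhms77trjk6fy1ks6a80"
print(accept_request(request, my_id, local_table.__getitem__))
```

The address the callee obtains (10.83.0.7 in this made-up table) differs from the caller's sourceIp, because each node maps the same key through its own table.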

If someone opens the request link on another computer with a spoofed destination node id, the connection will fail because that node has a different private key. If it has the same private key, then the private key was somehow stolen.

Stealing the ice-pwd will not allow anyone to decrypt the traffic, because Staex end-to-end encrypts every packet.

Now you can exchange request/response links over a non-private channel: instead of real IP addresses they contain node-local ones, and spoofing these addresses will not allow an attacker to join the call because the attacker does not have the node's private key.

Resolve node ids via decentralized AZERO.ID

Remembering the public key of each of your friends does not scale, and resolving keys from human-readable names is much more convenient. We have implemented just that using AZERO.ID, a fully decentralized, blockchain-based name resolution system. This system allows associating arbitrary strings with a name ending in .azero. We registered a few domains in AZERO.ID and added a staex-id field whose value is the node id. Now, instead of typing long, hard-to-remember node ids, one can type staex.azero or copybara.azero.
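On the client side, resolution can be sketched as a small function that accepts either a raw node id or a .azero name. The lookup function is an assumption: it stands for whatever AZERO.ID client returns a name's records, which the article does not show. The 52-character node id format is inferred from the two node ids shown earlier and may not hold in general; the records in the usage are made up.

```python
import re
from typing import Callable

# Inferred from the example node ids in this article (52 lowercase letters/digits).
NODE_ID = re.compile(r"[0-9a-z]{52}")


def resolve_node_id(name: str, lookup: Callable[[str], dict[str, str]]) -> str:
    """Return a node id for a raw node id or a .azero name.

    lookup(name) is a hypothetical stand-in for an AZERO.ID client that
    returns the text records attached to the name.
    """
    name = name.strip()
    if NODE_ID.fullmatch(name):
        return name  # already a node id, nothing to resolve
    if not name.endswith(".azero"):
        raise ValueError(f"not a node id or .azero name: {name!r}")
    node_id = lookup(name).get("staex-id", "")
    if not NODE_ID.fullmatch(node_id):
        raise LookupError(f"{name} has no valid staex-id record")
    return node_id


# Made-up records, for illustration only.
records = {"example.azero": {"staex-id": "h8syf1xcv3rveegh00p89ew8sqbzkr4ckpjz3zs4xxr5tvf4hs4g"}}


def fake_lookup(name: str) -> dict[str, str]:
    return records.get(name, {})


print(resolve_node_id("example.azero", fake_lookup))
```

Validating the staex-id value before using it means a malformed or missing record fails loudly instead of producing a call request to a nonexistent node.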

One might argue that we could have implemented DNS name resolution ourselves, and in a sense we already have, since node ids are resolved to IP addresses and vice versa. The truth is that we have multiple DNS systems: one for mapping between node-local IP addresses and node public keys, and another one for DNS names. However, the second one is only available in Staex private networks because it was designed for networks operated by a single organization. The Staex public network is different and requires a federated DNS system, which we plan to release in the future.

Conclusion

Photo by Greg Shield on Unsplash.

WebRTC is secure by default, but there is room for improving its privacy. We hid the real IP addresses of the peers participating in a call by replacing them with temporary node-local addresses generated by the Staex public network, and made node ids human-readable by resolving them from AZERO.ID names.

The resulting system requires neither STUN/TURN servers nor a signalling server, but it does require a manual handshake: exchanging the request/response links over some channel (e.g. a messenger or email) that does not need to be private. Security and privacy in this case are guaranteed by Staex, which by design does not allow spoofing node addresses.

About Staex

Staex is a secure public network for IoT devices that cannot run a VPN, such as smart meters, IP cameras, and EV chargers. Staex encrypts legacy protocols, reduces mobile data usage, and simplifies building networks with complex topologies through its unique multi-hop architecture. Staex is fully zero-trust, meaning that no traffic is allowed unless specified by the device owner, which makes it more secure than even some private networks. With this, Staex creates an additional separation layer that provides more security for IoT devices on the Internet and also protects other Internet services from DDoS attacks, which are usually launched from millions of IoT machines.

To stay up to date, subscribe to our newsletter, follow us on LinkedIn and Twitter, and subscribe to our YouTube channel.