this post was submitted on 24 Sep 2023

Made a quick test of mesh VPN clients. Test was performed between host and a VM, both running Kubuntu 23.04. VM ran on KVM with a virtio network adapter.

The test machine is an oldish laptop with an i5-2540M, so VPN performance was probably CPU bound. Still, the tests help to show how the different mesh VPNs compare against each other.

Tailscale surprisingly was the fastest, even faster than plain Wireguard, despite being userspace. But it also consumed more memory (245 MB after the iperf3 test!) and CPU.

Netbird's CPU usage is so low that I almost doubt it's a fair comparison: most of the usage might be in the kernel, since it uses kernel Wireguard. I don't know how to measure that better. Memory usage is moderate. For some reason it wasn't as fast as plain Wireguard.

Zerotier has the lowest memory usage but is the slowest, although this would probably only matter on a LAN. I hope the upcoming v2 closes the gap.

| | Tailscale | Netbird | Zerotier | Wireguard | Raw |
|---|---|---|---|---|---|
| Version | 1.48.2 | 0.23.3 | 1.12.2 | | |
| Idle, PSS, MB | 66 | 36 | 12 | | |
| iperf3, PSS, MB | 245 | 36 | 12 | | |
| Idle, CPU time, s / real minute | 0.505 | 0.120 | 0.297 | | |
| iperf3, CPU time, s / real minute | 115.23 | 0.14 | 78.72 | | |
| iperf3, Mb/s | 860 | 630 | 360 | 730 | 9600 |

Same table as an image with best / worst results highlighted


UPD: since this got more attention, questions and suggestions than I expected, I've made more tests.

| | Tailscale | Netbird | Zerotier | Wireguard | Raw |
|---|---|---|---|---|---|
| Pre / post start, Δ mem avail, MB | 48 | 43 | 12 | 22 | |
| Pre / during iperf3 tests, avg Δ mem avail, MB | 130 | 8 | 8 | 7 | 9 |
| Total mem usage under load (sum of the above), MB | 178 | 51 | 20 | 29 | 9 |
| CPU usage during iperf3 tests, avg % | 62 | 77 | 57 | 77 | 27 |
| iperf3, Mb/s | 935 | 946 | 414 | 988 | 10340 |
| iperf3 bidirectional, total, Mb/s | 427 | 853 | 480 | 1029 | 10260 |

Same table as an image with best / worst results highlighted

This time I've measured total system memory usage to account for kernel usage. No surprises here, just a bit more data. Tailscale is confirmed to consume a lot more memory under load.

How I've measured memory and CPU usage:

  • All commands and measurements are done in the VM; the only things running on the host are the VM itself and the iperf3 client
  • Boot the VM, start iperf3 server, all mesh VPN services are disabled beforehand
  • Then, for every service:
    • Run sar -r ALL 2 5 to determine the baseline, note the average kbavail (it accounts for the fact that not all caches are actually reclaimable, see man)
    • Start the service
    • Run sar -r ALL 2 5, note the average kbavail, difference with the baseline goes to the "Pre / post start, Δ mem avail" row
    • Run iperf3 client on the host: iperf3 -c *IP* -t 90 or iperf3 -c *IP* -t 90 --bidir
    • During the test, run sar -u ALL -r ALL 5 10 , note the average kbavail and CPU idle %
    • After all tests are done, the difference between the pre-test kbavail and the average of all kbavail during tests goes to the "Pre / during iperf3 tests, avg Δ mem avail" row, and 100% minus the average of all CPU idle % during tests goes to the "CPU usage during iperf3 tests" row
    • Stop the service
    • Run sync; sudo sysctl -q vm.drop_caches=3
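The arithmetic behind the table rows can be sketched in a few lines of Python. This is only an illustration of the steps listed above, not the script used for the post: the function names are made up, the sample numbers are hypothetical (they are not taken from the raw data), the kB to MB conversion factor of 1024 is an assumption, and so is using the post-start kbavail as the pre-test reference.

```python
from statistics import mean

KB_PER_MB = 1024  # assumption: kbavail is in KiB and the table rows are in MiB


def mem_delta_mb(before_kb, after_kb):
    """Drop in average available memory between two sets of kbavail samples, in MB."""
    return (mean(before_kb) - mean(after_kb)) / KB_PER_MB


def cpu_usage_pct(idle_pct_samples):
    """CPU usage as 100% minus the average idle percentage."""
    return 100.0 - mean(idle_pct_samples)


# Hypothetical sar samples, for illustration only.
baseline_kb = [3_500_000, 3_500_400, 3_499_600]     # before starting the service
post_start_kb = [3_450_000, 3_451_000, 3_449_000]   # after starting the service
during_test_kb = [3_320_000, 3_318_000, 3_322_000]  # while iperf3 is running
idle_pct = [40.0, 38.0, 36.0]                       # CPU idle % while iperf3 is running

start_delta = mem_delta_mb(baseline_kb, post_start_kb)
load_delta = mem_delta_mb(post_start_kb, during_test_kb)

print(f"Pre / post start, delta mem avail: {start_delta:.1f} MB")
print(f"Pre / during iperf3, delta mem avail: {load_delta:.1f} MB")
print(f"Total under load: {start_delta + load_delta:.1f} MB")
print(f"CPU usage during iperf3: {cpu_usage_pct(idle_pct):.1f} %")
```

To redo the numbers with a different metric from the raw data, only the sample lists need to change.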

Here's raw data, so you can recalculate using anything else instead of kbavail

I've minimized the number of running apps on the host that could affect performance (looking at you, Firefox). This resulted in some overall performance increase, and both Netbird and Wireguard performed almost identically to Tailscale in iperf3, outperforming Zerotier by ~2.4 times (the same ratio as in my initial test).

I've also added a bidirectional iperf3 test, so that both ends transmit and receive data. That didn't significantly affect the performance, except for Tailscale, whose throughput halved in this test.
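For anyone who wants to check the ratios, here is a small Python sketch that recomputes them from the iperf3 rows of the second table. The dictionary and variable names are just for illustration.

```python
# iperf3 throughput in Mb/s, copied from the second table.
one_way = {"Tailscale": 935, "Netbird": 946, "Zerotier": 414, "Wireguard": 988}
bidir_total = {"Tailscale": 427, "Netbird": 853, "Zerotier": 480, "Wireguard": 1029}

# How many times faster each client is than Zerotier in the one-way test.
vs_zerotier = {name: mbps / one_way["Zerotier"] for name, mbps in one_way.items()}

# Bidirectional total relative to the one-way result for the same client.
bidir_ratio = {name: bidir_total[name] / one_way[name] for name in one_way}

for name in one_way:
    print(f"{name}: {vs_zerotier[name]:.2f}x Zerotier, "
          f"bidir / one-way = {bidir_ratio[name]:.2f}")
```

A bidir / one-way value close to 1 means the total stayed roughly the same, while a value near 0.5 means it halved.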

all 16 comments
[–] [email protected] 31 points 1 year ago

It's worth noting that Tailscale optimized Wireguard-go to the point where they made it faster than the kernel version: https://tailscale.com/blog/more-throughput/

[–] [email protected] 15 points 1 year ago* (last edited 1 year ago) (2 children)

Tailscale surprisingly was the fastest, even faster than plain Wireguard, despite being userspace. But it also consumed more memory (245 MB after the iperf3 test!) and CPU.

Do we know whether this is a variation due to the test protocol, or whether Tailscale is using Wireguard with specific settings to slightly improve its speed?

[–] [email protected] 13 points 1 year ago (1 children)

Another user posted the blog where they discuss their speedup techniques: https://tailscale.com/blog/more-throughput/

It's likely that the kernel version can use similar techniques to surpass the performance of the userspace version that Tailscale uses, but no one has put in the work to make the kernel implementation as sophisticated as the userspace one.

[–] [email protected] 4 points 1 year ago

That's nice, I hope the upstream pull request goes through.

[–] [email protected] 9 points 1 year ago (1 children)

The kernel you are running for this test is already over 6 months old, while the clients for Tailscale, Netbird and Zerotier are just a few days old. A lot can happen in half a year, since all the projects are somewhat 'new'.

For a proper comparison you should pick one point in time and then choose the latest releases as of that date.

[–] [email protected] 11 points 1 year ago

I think it’s a fair comparison, most people are using a relatively recent version of the userspace tools but the kernel can easily be more than a year old.

[–] [email protected] 5 points 1 year ago (1 children)

Nice test! I'm a Tailscale user and I like that it's faster than the others. I don't care about memory usage, but I'm curious why there is such a big gap 🤔 It's using about 20x more memory than Zerotier.

[–] [email protected] 8 points 1 year ago* (last edited 1 year ago) (2 children)

Tailscale is written in Go with lots of dependencies. It also has a lot more features, to the point that some would call it too much 😅 Zerotier is pretty lean and written in C. That would explain the 55 MB difference in idle memory usage. But the 245 MB after the iperf3 test... I can't explain it, but it's consistent and repeatable.

[–] [email protected] 5 points 1 year ago (1 children)

Is it possible the others are using the Wireguard kernel module? In that case, a lot of the memory usage will be in kernel/system memory, and just looking at the app's memory usage won't be the full story.

[–] [email protected] 3 points 1 year ago* (last edited 1 year ago) (1 children)

Right, Netbird uses the kernel Wireguard module. Is there a way to measure kernel memory / CPU usage attributed to Wireguard? Zerotier, which has the lowest memory usage, does not use Wireguard at all; they have their own custom protocol, and it's userspace AFAIK.

[–] [email protected] 2 points 1 year ago (1 children)

Is there a way to measure kernel memory / CPU usage attributed to Wireguard?

Not that I'm aware of, unfortunately.

[–] [email protected] 2 points 1 year ago* (last edited 1 year ago)

I might just compare free output before / during / after iperf3 test. Will do that later today.

UPD: done

[–] [email protected] 2 points 1 year ago (1 children)

Does memory usage go down again after the load test or does it stay that high?

[–] [email protected] 2 points 1 year ago

After 10 minutes PSS was back to ~74 MB