Sticky wireguard connection with WAN fail-over
I have a configuration with 2 WAN connections. The first is a fiber optic connection, the second is a USB 4G dongle. The goal is that if there is a disruption on the primary fiber connection, all connections switch to the 4G backup line. When the primary connection becomes available again, all connections should switch back.
The router has a Wireguard connection to a remote pfSense installation.
When the primary connection fails (by removing the cable from the WAN port), the router switches to the 4G connection very quickly. Within 1 second, the Wireguard tunnel is also active again. When the primary connection is restored, all connections go through the primary WAN connection again, except for the Wireguard tunnel which remains active on the 4G WAN connection.
Is this a bug or do I need to adjust a configuration somewhere?
I am using the cloud controller.
Hi @GTiMy77
Thanks for posting in our business forum.
GTiMy77 wrote
Creating a Wireshark of the WAN interface is no problem. I will do so when I'm back on site tomorrow.
Could you please instruct me how to mirror the USB LTE modem within the router? I'm not able to select the WAN interfaces to be mirrored, and I don't know any other method to capture traffic on this interface.
Interesting question about the interface status. As soon as I reboot the router, both WAN and USB LTE interface are in the state connected. The USB LTE interface only goes to the disconnected state when I unplug the USB dongle. The WAN interface goes to disconnected as soon as I remove the network cable. Both quite obvious :-). Otherwise the reported state is always connected:
So set up a static routing like this.
Hi @GTiMy77
Thanks for posting in our business forum.
So, you enabled link backup, and the WAN switches back to the primary through Always Link Primary, yet the WG tunnel stays on the backup link. Does the tunnel still work then? I think not, because the backup WAN should be offline once the router switches back to the primary.
How do you verify that the WG is on the backup? Do you have a screenshot of it? I am curious about the verification process.
I have a continuous ping running from a LAN port of the router to an IP address on the other side of the VPN tunnel. This gives me an indication if the tunnel is working at any given time. I see the ping being briefly interrupted when switching from primary to secondary WAN. That looks good!
The WireGuard tunnel is established to a pfSense instance. pfSense can show the status of its WireGuard peers. This status includes the peer's IP address, which is the external IP address of the WAN connection currently in use. When switching from primary to secondary WAN, you see this IP address change immediately. This is not observed when switching back.
Through https://www.whatismyip.com/, I check which WAN port I am using to go out at any given time. It neatly switches from primary to secondary and back to primary upon restoration of that connection.
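The peer-status check can also be scripted instead of read off the status page. This is only a minimal sketch, assuming shell access on the remote side and that the `wg` command-line tool is available there: `wg show <interface> endpoints` prints one line per peer, with the public key and the current endpoint separated by a tab. The keys and addresses in the usage example are placeholders, not values from this setup.

```python
def parse_endpoints(output: str) -> dict[str, str]:
    """Map each peer public key to the endpoint IP from `wg show <if> endpoints` text."""
    peers = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        key, endpoint = parts
        if endpoint == "(none)":
            continue  # peer has no known endpoint yet
        host, _, _port = endpoint.rpartition(":")
        peers[key] = host.strip("[]")  # IPv6 endpoints are printed in brackets
    return peers


def changed_peers(before: dict[str, str], after: dict[str, str]) -> dict[str, tuple]:
    """Return peers whose endpoint IP differs between two snapshots."""
    return {k: (before.get(k), v) for k, v in after.items() if before.get(k) != v}


if __name__ == "__main__":
    # Placeholder snapshots taken before and after a WAN switch.
    snap1 = parse_endpoints("PEERKEY=\t198.51.100.10:51820\n")
    snap2 = parse_endpoints("PEERKEY=\t203.0.113.20:12142\n")
    print(changed_peers(snap1, snap2))
```

Taking one snapshot per test step and comparing them shows whether the endpoint followed the WAN switch in both directions.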
I hope to have provided some insight into my tests and setup.
Thanks in advance for the responses!
Hi @GTiMy77
Thanks for posting in our business forum.
I don't trust whatismyip at all, so please use something else. What is the tracert result from the PC to an address on the Omada router's peer?
What I am saying is, if it switches to the primary, the backup WAN will remain offline. What you said contradicts what I know about our router.
Many thanks for your reply. I replaced the whatmyip test with a traceroute test, but I observe the same behavior:
1. Router restarts
-> Wireguard connection via primary WAN
-> Traceroute via primary WAN
2. Primary WAN connection down; cable removed
-> Wireguard connection via secondary WAN
-> Traceroute via secondary WAN
3. Primary WAN restores
-> Wireguard connection via secondary WAN
-> Traceroute via primary WAN
Indeed, I would also expect the entire secondary/backup WAN to go offline. Except for the Wireguard connection, that does indeed happen. Could I have found a bug?
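The traceroute part of these three steps comes down to looking at the first hop beyond the router, which belongs either to the fiber side or to the LTE side. Here is a rough sketch of automating that, assuming Windows `tracert`-style output. The gateway addresses and labels in `GATEWAYS` are hypothetical placeholders and would have to be replaced with the real next hop of each WAN.

```python
import re

# Hypothetical next-hop addresses for each WAN; replace with real values.
GATEWAYS = {"192.0.2.1": "primary WAN", "198.51.100.1": "secondary WAN"}

# A hop line starts with the hop number and ends with an IPv4 address.
HOP_RE = re.compile(r"^\s*(\d+)\s+.*?(\d{1,3}(?:\.\d{1,3}){3})\s*$")


def hops(tracert_output: str) -> list[tuple[int, str]]:
    """Extract (hop number, address) pairs from tracert text."""
    result = []
    for line in tracert_output.splitlines():
        match = HOP_RE.match(line)
        if match:
            result.append((int(match.group(1)), match.group(2)))
    return result


def wan_used(tracert_output: str, gateways: dict[str, str] = GATEWAYS) -> str:
    """Return the label of the first hop that is a known WAN gateway."""
    for _, address in hops(tracert_output):
        if address in gateways:
            return gateways[address]
    return "unknown"
```

Note that this only tells which WAN ordinary LAN traffic takes. It says nothing about the path of the tunnel packets the router itself sends, which is the point in dispute here.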
Hi @GTiMy77
Thanks for posting in our business forum.
If WG really stayed on the secondary WAN, the tunnel would not get through. Your tracert goes via the primary WAN. So, are you tracing to a remote IP on the peer side? If so, it went through the primary WAN, and to me that means a good connection and normal behavior.
Can you show me a screenshot of the last test, naming the IP addresses and what they are?
Please mask (mosaic) your sensitive information. Here is a list of information considered sensitive:
1. Public IP address on your WAN if your WAN is.
2. Real MAC address of your device.
3. Your personal information including address, domain name, and credentials.
For troubleshooting purposes, when a WAN IP is needed, please leave some values visible for identification.
Many thanks for your reply and patience. I've repeated the tests and included the results:
Used IP addresses:
- 95.99.xx.yy --> WAN IP address of TP-link router (local site)
- 91.87.xx.yy --> LTE IP address of TP-link router (local site)
- 213.136.xx.yy --> WAN IP address of pfSense router (remote site)
- 192.168.xx.yy --> computer on LAN interface of TP-link router (local site) | source of traceroute
- 10.254.xx.yy --> LAN interface of pfSense router (remote site)
The WireGuard tunnel uses UDP port 9001.
Test 1:
Router restarted. WAN connected and online.
Expected behavior: all traffic goes out via WAN.
Traceroute to WAN IP address of pfSense router:
Tracing route to 213.136.xx.yy over a maximum of 30 hops
1 2 ms 1 ms 2 ms 192.168.xx.yy
2 5 ms 4 ms 5 ms 31.187.200.1
3 7 ms 7 ms 7 ms 10.227.133.252
4 7 ms 7 ms 7 ms 10.226.10.6
5 * 9 ms 9 ms 213.136.2.14
6 8 ms 8 ms 9 ms 80.249.208.35
7 8 ms 8 ms 8 ms 213.136.2.14
8 9 ms 8 ms 8 ms 213.136.xx.yy
Trace complete.
Packet capture on pfSense router of ping between computer on LAN and LAN interface of pfSense router -> via VPN:
20:37:32.687947 IP 95.99.xx.yy.51820 > 10.254.xx.yy.9001: UDP, length 96
20:37:32.688089 IP 10.254.xx.yy.9001 > 95.99.xx.yy.51820: UDP, length 96
20:37:33.706923 IP 95.99.xx.yy.51820 > 10.254.xx.yy.9001: UDP, length 96
20:37:33.707016 IP 10.254.xx.yy.9001 > 95.99.xx.yy.51820: UDP, length 96
20:37:34.738402 IP 95.99.xx.yy.51820 > 10.254.xx.yy.9001: UDP, length 96
20:37:34.738573 IP 10.254.xx.yy.9001 > 95.99.xx.yy.51820: UDP, length 96
20:37:35.757121 IP 95.99.xx.yy.51820 > 10.254.xx.yy.9001: UDP, length 96
20:37:35.757272 IP 10.254.xx.yy.9001 > 95.99.xx.yy.51820: UDP, length 96
Test result: OK
Test 2:
WAN connection disconnected by removing the cable.
Expected behavior: all traffic goes out via LTE.
Traceroute to WAN IP address of pfSense router:
Tracing route to 213.136.xx.yy over a maximum of 30 hops
1 1 ms 1 ms 1 ms 192.168.xx.yy
2 2 ms 2 ms 2 ms 192.168.254.1
3 105 ms 218 ms 197 ms 10.225.27.1
4 69 ms 76 ms 85 ms 10.128.110.13
5 66 ms 55 ms 60 ms 172.31.87.14
6 88 ms 80 ms 71 ms 172.31.87.13
7 * 82 ms 73 ms 172.16.188.33
8 111 ms 95 ms 77 ms 172.16.188.76
9 82 ms 76 ms 72 ms 81.52.186.121
10 105 ms 89 ms 80 ms 213.136.xx.yy
Trace complete.
Packet capture on pfSense router of ping between computer on LAN and LAN interface of pfSense router -> via VPN:
20:43:28.252683 IP 91.87.xx.yy.12142 > 10.254.xx.yy.9001: UDP, length 96
20:43:28.252780 IP 10.254.xx.yy.9001 > 91.87.xx.yy.12142: UDP, length 96
20:43:29.272751 IP 91.87.xx.yy.12142 > 10.254.xx.yy.9001: UDP, length 96
20:43:29.272897 IP 10.254.xx.yy.9001 > 91.87.xx.yy.12142: UDP, length 96
20:43:30.312687 IP 91.87.xx.yy.12142 > 10.254.xx.yy.9001: UDP, length 96
20:43:30.312769 IP 10.254.xx.yy.9001 > 91.87.xx.yy.12142: UDP, length 96
20:43:31.360757 IP 91.87.xx.yy.12142 > 10.254.xx.yy.9001: UDP, length 96
20:43:31.360880 IP 10.254.xx.yy.9001 > 91.87.xx.yy.12142: UDP, length 96
Test result: OK
Test 3:
WAN connection restored by reinserting the cable.
Expected behavior: all traffic goes out via WAN again.
Traceroute to WAN IP address of pfSense router:
Tracing route to 213.136.xx.yy over a maximum of 30 hops
1 1 ms 1 ms 1 ms 192.168.xx.yy
2 4 ms 4 ms 5 ms 31.187.200.1
3 7 ms 6 ms 7 ms 10.227.133.252
4 7 ms 6 ms 7 ms 10.226.10.6
5 49 ms * * 213.136.2.14
6 8 ms 8 ms 8 ms 80.249.208.35
7 8 ms 8 ms 8 ms 213.136.2.14
8 8 ms 8 ms 8 ms 213.136.xx.yy
Trace complete.
Packet capture on pfSense router of ping between computer on LAN and LAN interface of pfSense router -> via VPN:
20:49:59.224553 IP 91.87.xx.yy.12142 > 10.254.xx.yy.9001: UDP, length 96
20:49:59.224741 IP 10.254.xx.yy.9001 > 91.87.xx.yy.12142: UDP, length 96
20:50:00.245919 IP 91.87.xx.yy.12142 > 10.254.xx.yy.9001: UDP, length 96
20:50:00.246072 IP 10.254.xx.yy.9001 > 91.87.xx.yy.12142: UDP, length 96
20:50:01.263741 IP 91.87.xx.yy.12142 > 10.254.xx.yy.9001: UDP, length 96
20:50:01.263897 IP 10.254.xx.yy.9001 > 91.87.xx.yy.12142: UDP, length 96
20:50:02.283747 IP 91.87.xx.yy.12142 > 10.254.xx.yy.9001: UDP, length 96
20:50:02.283883 IP 10.254.xx.yy.9001 > 91.87.xx.yy.12142: UDP, length 96
Test result: not OK.
Traceroute to WAN IP address of pfSense router goes via WAN interface of TP-link router --> OK
Wireguard traffic arrives at pfSense router from LTE IP address of TP-link router --> this should have gone via WAN interface --> Not OK
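The comparison across the three captures reduces to one question: from which source address do packets to the WireGuard port arrive? Below is a small sketch that tallies this from tcpdump-style text lines like the ones shown above. It assumes IPv4 and the `IP src.port > dst.port: UDP` line layout. The function name and the default port argument are illustrative only.

```python
import re
from collections import Counter

# Matches lines such as:
# 20:49:59.224553 IP 91.87.xx.yy.12142 > 10.254.xx.yy.9001: UDP, length 96
LINE_RE = re.compile(
    r"^(?P<time>\S+) IP (?P<src>\S+)\.(?P<sport>\d+) > "
    r"(?P<dst>\S+)\.(?P<dport>\d+): UDP"
)


def inbound_sources(capture_text: str, wg_port: int = 9001) -> Counter:
    """Count packets per source address among those sent to the WireGuard port."""
    counts = Counter()
    for line in capture_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and int(match.group("dport")) == wg_port:
            counts[match.group("src")] += 1
    return counts
```

Fed the Test 3 capture above, this counts 91.87.xx.yy (the LTE address) as the only inbound source, which is the mismatch being reported.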
Hi @GTiMy77
Thanks for posting in our business forum.
I still don't believe it.
Can you capture with Wireshark on the ER605? Capture from its WAN, and try to mirror the LTE interface to see whether that is possible.
Mirror the WAN and capture it on your PC to see whether WG packets are flowing through the WAN.
I cannot believe that the router still sends traffic over LTE while LTE is offline. I think it might be a problem with your pfSense.
The routing table also clearly matches the traceroute over the WAN, so I cannot believe what is said here. Please monitor the WAN port directly. You won't see the WG payload because it is encrypted, but you can still see how the traffic flows.
How to capture packets using Wireshark on SMB router or switch
How to Use Port Mirror to Capture Packets in the Controller
In addition, there is a simple check. You ran tests 1 and 2, then test 3 showed the tunnel using LTE, and you believe the ER605 is at fault. So run a test 4 with the LTE dongle unplugged, and check pfSense and Wireshark again. Do you still see the same IP as in test 3? If yes, the problem is with your pfSense.
Thank you for the response. Test 4 is a good option, which I executed right away; the result is further down in this reply.
Port mirroring could give an even more detailed view, but I have looked into the controller (we use the Omada cloud controller) and I see no way to mirror either the USB LTE adapter or the WAN interface (I can see how to mirror other LAN ports, but the WAN ports are greyed out). If needed, I can alter the setup, place an external tap on the WAN interface, and capture that interface with Wireshark, but unfortunately I still won't be able to capture traffic passing through the USB LTE adapter.
Test 4:
When I run test 4, the remote IP address shown on the WireGuard status page of the pfSense router (the remote site) changes, very quickly even, to 95.99.xx.yy, which is the fiber connection on the WAN interface of the TP-Link.
After that I plugged the USB LTE dongle back in, and it stays idle. So that is working as expected and correct.
In my previous post, the packet captures were taken on the ingress of the WAN interface of the pfSense unit. There, the LTE IP address is seen as the remote IP. I cannot see any way in which the pfSense unit could influence the outgoing interface of the TP-Link router. Therefore I'm very sure it isn't an issue on the pfSense side, and I believe test 4 makes this statement even stronger.
Just to be sure, I added a screenshot of the load-balancing configuration of the WAN interfaces and the configuration of the USB Modem:
Kind regards
Hi @GTiMy77
Thanks for posting in our business forum.
Can you capture with Wireshark as stated above and check both the WAN and the USB modem WAN, under the condition of test 3 when the tunnel is stuck on the USB WAN? I need this to make the evidence concrete.
I need to see a result where WG stays on the USB WAN, so that Wireshark shows the WG traffic there while the primary WAN has come back online.
I also need a picture of the router status showing the primary WAN as up (online) and the USB WAN as offline.
Creating a Wireshark of the WAN interface is no problem. I will do so when I'm back on site tomorrow.
Could you please instruct me how to mirror the USB LTE modem within the router? I'm not able to select the WAN interfaces to be mirrored, and I don't know any other method to capture traffic on this interface.
Interesting question about the interface status. As soon as I reboot the router, both WAN and USB LTE interface are in the state connected. The USB LTE interface only goes to the disconnected state when I unplug the USB dongle. The WAN interface goes to disconnected as soon as I remove the network cable. Both quite obvious :-). Otherwise the reported state is always connected: