ER605 stops responding a few days after each power reset
The ER605 is in a small office with about 20 devices on the network. After it was installed in a switch cabinet in mid-March 2023, it worked flawlessly for 4 months. Since mid-July, it has been freezing every now and then. The LAN lights keep blinking even during a freeze, but the device does not respond to anything from the network: it does not serve DHCP and it does not open the web UI. The only way to get it working again is a power reset, i.e. disconnecting the power supply for 10 seconds. After this, it works again until the next freeze.
This device and the rest of the network devices are managed with the Omada software controller, currently version 5.9.31. The Omada log shows nothing except lines saying "gateway (the name of the device) was disconnected". The WiFi APs seem to be disconnected about 4 minutes before the ER605 is disconnected.
Is there any way to enable logging to find out why it freezes?
Hi @KTuulos
Thanks for posting in our business forum.
Has the router's firmware been updated recently?
Have any changes been made recently, to the config or the physical wiring?
The APs disconnected 4 minutes before the router shut down. Strange. Do you have a managed switch capable of blocking a loop with STP?
Is there any chance for you to test the router alone? I suspect you have run into a loop issue, because a loop can eventually cause a network shutdown. It's gradual. An Omada switch can record this in its log, but I'm not sure about your switch.
You can probably remove the switch and check whether the router works stably on its own for a day or two, or whether this "dead" issue never occurs when it's used without the switch. We have to pinpoint which device is at fault.
Can you also verify that DHCP is down? Use ipconfig and ping to check. Do you get a 169.254.X.X address? Does pinging the gateway work?
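For anyone who wants to script the 169.254.X.X check, here is a minimal Python sketch. It only classifies an address that you have already read from ipconfig. The function name `is_apipa` is made up for illustration and is not part of any TP-Link or Omada tool. A 169.254.X.X address is the self-assigned (link-local) range a client falls back to when no DHCP server answers.

```python
import ipaddress


def is_apipa(address: str) -> bool:
    """Return True if the address is in the IPv4 link-local range 169.254.0.0/16.

    A client that shows such an address usually did not get a DHCP lease.
    """
    ip = ipaddress.ip_address(address)
    return ip.version == 4 and ip.is_link_local


if __name__ == "__main__":
    # Hypothetical addresses, as they might be copied from ipconfig output.
    for addr in ("169.254.10.20", "192.168.0.100"):
        status = "no DHCP lease (self-assigned)" if is_apipa(addr) else "looks like a normal lease"
        print(addr, "->", status)
```

If the client shows a self-assigned address and the gateway does not answer ping, that points to the router's DHCP service rather than the client.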
You can probably export the running log and analyze it yourself. Running log shows the MAC address and interaction between devices.
@Clive_A Thank you for the thorough response.
The ER605's software version has not been updated since the initial installation.
The DHCP server was down. I checked that by plugging a "new" computer directly into the ER605's LAN port. It did not receive an IP address even when I tried to force a renewal manually with ipconfig. Also, when I configured the IP address manually, I could not connect to the office server, meaning the ER605 had stopped passing data through its built-in switch. When I connected the computer to the switch instead, I was able to reach the office server.
The "Export Running Logs" function produced very detailed logs. I increased the amount of logging on the office server (Debian 12, which also runs the virtual machine for the Omada controller), so when/if the ER605 stops next time, I can compare the server logs with the ER605 logs.
I'll get back to this the next time the ER605 stops. Until then, I won't make any changes to the network hardware.
Hi @KTuulos
Thanks for posting in our business forum.
Sure. It's kind of strange that the router seems to be completely "dead". Keep your firmware up to date. You don't have to update to the beta. Let me know if this happens again. I can probably escalate your case for further diagnostics if necessary.
Hi, this is exactly the issue I've been chasing for over a year across multiple support tickets. I actually think it is a problem with ALL TP-Link routers, but because the ER605 v1 has such limited memory, it shows up much more quickly on this device than on others.
The issue is that, even as recently as 1.3.0 (and a beta version beyond that), the memory usage of the ER605 increases linearly with time until, due to memory constraints, processes start to die. This is usually omadad, which causes the many, many 'Disconnect' threads on this forum (for those with controllers), but it can also affect dhcpd, which results in no new IPs being handed out; that is usually when the 'controllerless' people notice the issue. You may also notice periods of very slow responsiveness as the router's processor gets tied in knots trying to swap memory around to get things done.
The workaround is to schedule a weekly 3 AM reboot of the router, which keeps the memory consumption from getting critical in 99%+ of cases.
Second Router via Controller, also running 1.3.0
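If memory really grows linearly as described above, you can roughly estimate how long the router has before it reaches a critical level, and pick the reboot interval from that. The Python sketch below is only an illustration: the sample readings and the 90% limit are made-up values, not measurements from an ER605, and the function names are hypothetical.

```python
def fit_line(samples):
    """Least-squares line through (hours, percent_used) samples.

    Returns (slope, intercept), with slope in percent per hour.
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("need samples taken at two or more different times")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    slope = cov / var
    return slope, mean_m - slope * mean_t


def hours_until(samples, limit_percent):
    """Hours after the last sample until the fitted line reaches limit_percent.

    Returns None if memory usage is not growing.
    """
    slope, intercept = fit_line(samples)
    if slope <= 0:
        return None
    last_t = samples[-1][0]
    return (limit_percent - intercept) / slope - last_t


if __name__ == "__main__":
    # Hypothetical readings: (hours since reboot, memory used in percent).
    readings = [(0, 40.0), (24, 44.0), (48, 48.0), (72, 52.0)]
    print("hours left until 90%:", hours_until(readings, 90.0))
```

If the estimate comes out comfortably longer than a week, a weekly reboot leaves some margin; if it is shorter, the reboot schedule would need to be tighter.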
@d0ugmac1 Thanks for confirming my suspicions. I also enabled a scheduled reboot for the router, and since then there haven't been any freezes.
How are you monitoring the router's memory status? My network is otherwise monitored with Zabbix, but TP-Link has very limited support for that.
The graph is from LibreNMS, which is a free, open-source SNMP polling and logging tool. It provides much better granularity than the limited output from the Omada controller. I use it to monitor the Omada gear, Synology NAS, Pi Clone controller, etc.
I'll probably shut it down since it doesn't look like TP-Link is going to be able to 'fix' their memory leak anytime soon, and I'm just thrashing the disks in the NAS with all the polling and logging. I may keep it going a bit longer to see if the ER707 I'm swapping in has the same linear increase though...
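As a rough illustration of how an SNMP poller can turn raw counters into a memory percentage, here is a small Python sketch. It assumes the device exposes the standard HOST-RESOURCES-MIB storage table (hrStorageTable); whether the ER605 actually does so is not confirmed anywhere in this thread. The helper names and the sample numbers are made up, and the code does no SNMP polling itself, it only does the arithmetic on values you have already fetched.

```python
# Column OIDs of the standard HOST-RESOURCES-MIB hrStorageTable.
# Assumption: the polled device implements this table; not verified for the ER605.
HR_STORAGE_ALLOCATION_UNITS = "1.3.6.1.2.1.25.2.3.1.4"  # bytes per allocation unit
HR_STORAGE_SIZE = "1.3.6.1.2.1.25.2.3.1.5"              # total size, in units
HR_STORAGE_USED = "1.3.6.1.2.1.25.2.3.1.6"              # used amount, in units


def memory_used_percent(size_units: int, used_units: int) -> float:
    """Percentage of a storage entry in use, from hrStorageSize and hrStorageUsed."""
    if size_units <= 0:
        raise ValueError("hrStorageSize must be positive")
    return 100.0 * used_units / size_units


def memory_used_bytes(allocation_units: int, used_units: int) -> int:
    """Used amount in bytes: hrStorageUsed multiplied by hrStorageAllocationUnits."""
    return allocation_units * used_units


if __name__ == "__main__":
    # Hypothetical values for one row of the table (e.g. the RAM entry).
    units, size, used = 1024, 131072, 98304
    print(f"{memory_used_percent(size, used):.1f}% used,",
          memory_used_bytes(units, used), "bytes")
```

Logging this percentage at a fixed interval is enough to see whether usage climbs steadily between reboots.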
@d0ugmac1 Which SNMP objects are you monitoring to get the memory status? The default TP-Link Zabbix SNMP object set monitors only network-related aspects.
If possible, could you please list all SNMP objects you were able to monitor from the ER605?
Thanks.