ER7206 high latency, dropping packets - only from time to time
Hi!
I've been using Omada for a couple of months now. Last night I noticed my IoT devices losing internet access.
At first I thought it was the ISP, but after some debugging I found that the ISP connection is stable. Further debugging showed that the problem is with the ER7206 gateway.
The problem occurs in waves: the gateway works perfectly fine for a couple of minutes, then drops the majority of packets for a minute or two.
I tried restarting all devices in my setup. It kept working for 10-15 minutes, then the problem came back. A factory reset and re-adoption didn't fix the issue.
After another restart it seems stable, so it's very random. I haven't touched the infrastructure (cables, devices, etc.) for weeks.
Using one PC with separate network cards, I connected to both the gateway (directly to one of its ports) and the ISP's router. This gave me high confidence that the gateway is the problem.
Latency to the ISP's router is stable (less than 1 ms, 0 packets dropped).
And here is the latency for the ER7206:
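For anyone who wants to repeat this side-by-side comparison, here is a minimal sketch that pings two hosts and reports packet loss and average round-trip time. It assumes a Linux machine with the iputils `ping` (the `-c` flag and its summary format). The two addresses in the example are placeholders, not values confirmed in this thread.

```python
import re
import subprocess


def parse_ping_summary(output):
    """Extract (loss_percent, avg_rtt_ms) from Linux iputils ping output.

    avg_rtt_ms is None when no replies were received (no rtt line printed).
    """
    loss = re.search(r"([\d.]+)% packet loss", output)
    if loss is None:
        raise ValueError("no packet loss summary found in ping output")
    # The rtt line looks like: rtt min/avg/max/mdev = 0.321/0.402/0.512/0.071 ms
    rtt = re.search(r"= [\d.]+/([\d.]+)/", output)
    avg = float(rtt.group(1)) if rtt else None
    return float(loss.group(1)), avg


def ping_host(host, count=20):
    """Run ping against one host and return (loss_percent, avg_rtt_ms)."""
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True,
        text=True,
    )
    return parse_ping_summary(result.stdout)


if __name__ == "__main__":
    # Placeholder addresses: replace with your gateway and ISP router.
    for name, host in [("gateway", "192.168.0.1"), ("isp-router", "192.168.1.1")]:
        loss, avg = ping_host(host)
        print(f"{name} ({host}): loss={loss}% avg_rtt={avg} ms")
```

Running it in a loop during a bad period and a good period makes the difference between the two paths easy to see.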
This is quite interesting, and this could be it. I can't say what exactly caused such increasing activity.
As I get it - DEV1 spammed the network with something.
I'll post if I'll figure it out.
Thank you!
Have you tried disconnecting all other devices and using only a PC plugged directly into the router? There may be a loop or a broadcast storm in the network that is causing problems. I have an ER7206 with the same firmware and I get 1 ms when I ping the router.
You should also check whether you have a device with the same IP as the router. 192.168.0.1 is a very common default IP on many devices.
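One rough way to check for an IP conflict is to collect (IP, MAC) pairs over time, for example from repeated ARP table snapshots, and flag any IP that shows up with more than one MAC address. The sketch below only does the grouping step. How you collect the pairs is up to you, and the MAC addresses in the example are made up.

```python
from collections import defaultdict


def find_ip_conflicts(observations):
    """Return {ip: [macs]} for every IP seen with more than one MAC address.

    observations: iterable of (ip, mac) string pairs, e.g. gathered from
    several ARP table snapshots taken a few seconds apart.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        macs_by_ip[ip].add(mac.lower())  # normalise case before comparing
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


if __name__ == "__main__":
    # Made-up example data: two different MACs answer for 192.168.0.1.
    seen = [
        ("192.168.0.1", "AA:BB:CC:00:00:01"),
        ("192.168.0.20", "AA:BB:CC:00:00:20"),
        ("192.168.0.1", "AA:BB:CC:00:00:99"),
    ]
    print(find_ip_conflicts(seen))
```

An IP that alternates between two MACs is a strong hint that two devices are claiming the same address.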
Any messages in the logs? Does the CPU percentage change between working normally and super-slow mode?
Thank you all for your suggestions!
I've restarted every device in my whole setup (1 gateway, 3 switches, 8 EAPs). So far so good (a couple of hours).
I checked the CPU and memory usage this morning (for the last 24h), but it didn't catch my attention as it looked quite OK (based on the % values).
But it does look strange: memory keeps building up. Still, a 27% peak doesn't look bad.
7 days chart:
Last 24h (1min resolution):
The first outage was captured around 00:30, which aligns with the CPU peak, and the IoT devices kept restarting for 3-4 hours. It was super unstable.
No alerts/events in Omada in the 00:30 - 09:00 timeframe.
(I mean, I did see alerts, but only just after I factory reset the gateway and restarted the devices, so I expected them.)
I'll keep monitoring, and if the problem comes back I'll check CPU and memory usage.
Worth mentioning: I don't use VPN (it's configured, but has no users) or ACLs.
Thank you,
Jan
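To line up outages with the CPU and memory charts, it helps to turn a log of ping probes into explicit outage windows. This is a minimal sketch that assumes you already have a time-sorted list of (timestamp, ok) samples. The sample data in the example is invented for illustration.

```python
from datetime import datetime, timedelta


def outage_windows(samples):
    """Group consecutive failed probes into (start, end) windows.

    samples: iterable of (timestamp, ok) pairs sorted by timestamp, where
    ok is False when the probe got no reply.
    """
    windows = []
    start = last = None
    for ts, ok in samples:
        if not ok:
            if start is None:
                start = ts  # first failed probe of a new window
            last = ts
        elif start is not None:
            windows.append((start, last))  # a reply closes the window
            start = None
    if start is not None:
        windows.append((start, last))  # log ended during an outage
    return windows


if __name__ == "__main__":
    # Invented data: one probe per minute, failing at minutes 2 and 3.
    t0 = datetime(2024, 1, 1, 0, 30)
    pattern = [True, True, False, False, True]
    samples = [(t0 + timedelta(minutes=i), ok) for i, ok in enumerate(pattern)]
    for begin, end in outage_windows(samples):
        print(f"outage from {begin:%H:%M} to {end:%H:%M}")
```

The resulting windows can then be compared by eye against the peaks in the 1-minute resolution chart.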
Looks like a spanning tree issue between ports 1 and 3: a spiralling packet storm leading to the port shutting down, cooling off, and re-enabling.
There should be something in the Events pointing to this (unless you've disabled messaging for such events).
I see nothing interesting in Events, only that devices were connecting and disconnecting frequently.
The cut-off at ~3:45 (chart time) was a restart of the gateway, which I did by hand. This solved the issue for a while, as the problematic devices (clients) couldn't recover the connection. Then after ~9:30, when I started work, the traffic was back as the devices woke up.
I have something like a cluster setup which involves server-to-server communication (within the LAN). This must be it. It just went wild.
In between I restarted the client devices, which restarted all services in the OS.
Thank you all for the support!
@folfix The DEV1 link is up to something. Most of the time it's nothing malicious: broken equipment that behaves like an old hub, or broken cabling. Assume incompetence rather than malice.
Hi!
The issue is gone permanently.
I've verified as much as I could and I doubt it was a DDoS attack.
DEV1, DEV2, and DEV3 are connected together; they're part of the cluster. Once I restarted the cluster services, everything worked.
Thank you all!