ER605 Needs weekly Reboot Issue - does it have a memory leak?
So I have had my fleet of ER605s on a weekly reboot diet for over a year now, because if I don't, they eventually lose contact with the Controller (hardware or software) permanently. I have recently started experimenting with LibreNMS and found a few things, the most interesting being the continuous ramping of memory usage over time. I only have about 4 days of data since I set this system up (running in Docker on my Synology), but here's what I see. I am curious whether this is normal or whether the platform has a memory leak.
Left graph is last 24h, Right graph is last week.
You can also see the resetting impact of my 3AM reboot.
Conclusion: OpenVPN logging resulting in memory starvation. After spending some quality time online with the TP-Link Support team last night, it turned out that I had amassed ~30 MB of OpenVPN logs in just 3 weeks (a problem on a device with only 128 MB of RAM), despite not a single VPN connection during that time. Ironic if you think how many people would have liked to see what was in those logs, but regardless, it looks to be the culprit. Interestingly, only HTTP and the Controller interface went offline; SNMP and SSH were working fine. I have since deleted my OpenVPN settings from the router and rebooted. TP-Link should be issuing a beta firmware version with the verbosity dialed back and hopefully some kind of log rotation increase as well. I'll monitor for another few weeks just to be sure, but I think we can close the book on this 'colourful' episode.
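For a rough sense of scale, the figures quoted above can be turned into a growth rate and a share of total RAM. This is only back-of-the-envelope arithmetic on the numbers in the post; it assumes the logs sit in RAM-backed storage, which the post implies but does not state, and the function names are made up for illustration.

```python
# Figures quoted in the post; all three are approximate.
LOG_MB = 30.0    # ~30 MB of OpenVPN logs
DAYS = 21        # accumulated over "3 weeks"
RAM_MB = 128.0   # total RAM on the device


def log_growth_per_day(log_mb: float, days: int) -> float:
    """Average log growth in MB per day, assuming steady growth."""
    return log_mb / days


def share_of_ram(log_mb: float, ram_mb: float) -> float:
    """Fraction of total RAM the logs would occupy if held in memory."""
    return log_mb / ram_mb


rate = log_growth_per_day(LOG_MB, DAYS)
share = share_of_ram(LOG_MB, RAM_MB)
print(f"{rate:.2f} MB/day, {share:.0%} of RAM")
# prints "1.43 MB/day, 23% of RAM"
```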
Hello @d0ugmac1
To better assist you, I've created a support ticket via your registered email address and escalated it to our support engineer to look into the issue. The ticket ID is TKID230247563. Please check your inbox and make sure the support email has arrived. Thanks!
Nobody has reached out yet, but there is a clear pattern of increase. I now have a week's worth of data:
Hello @d0ugmac1
d0ugmac1 wrote
Nobody has reached out yet, but there is a clear pattern of increase. I now have a week's worth of data:
I already reminded the support engineer yesterday to handle your case as soon as possible. If you still haven't received a reply, please feel free to reply to the support email to remind the engineer to follow up. Thanks for your patience!
Hi, just for information: this memory ramping is mentioned in other threads as well. I actually only found one of them:
https://community.tp-link.com/en/business/forum/topic/584242?replyId=1137850 (post #10).
So this memory ramping does not seem to be an isolated issue.
Greetings
(support reached out yesterday evening)
Here's the breakout, all 4 show increase over time.
Keep in mind these are 5-minute samples... a lot of things can happen in that timeframe!
@d0ugmac1 Another related post just added today, ticket ID is TKID230312710
https://community.tp-link.com/en/business/forum/topic/601540
@d0ugmac1 2 WEEK CHECKIN - I have about 1 more week before it will die on me
Memory continues to increase relatively linearly:
But this is perhaps more worrying...is it running out of buffers, or just not using as much? This is on track to go to 0 about the time my router will become uncontrollable:
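The "on track to go to 0" estimate can be sketched as a least-squares line through the samples, extrapolated to where it crosses zero. The sample values below are hypothetical placeholders, not readings from the graphs in this thread, and the helper names are made up for illustration; real input would be the buffer percentages exported from LibreNMS.

```python
from typing import Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_zero(days: Sequence[float], values: Sequence[float]) -> Optional[float]:
    """Days after the last sample until the fitted line reaches zero.

    Returns None if the trend is flat or rising (no projected exhaustion).
    """
    slope, intercept = fit_line(days, values)
    if slope >= 0:
        return None
    return -intercept / slope - days[-1]


# Hypothetical samples: day since reboot, memory buffers in percent.
sample_days = [0, 2, 4, 6, 8, 10, 12, 14]
sample_buffers = [6.3, 5.7, 5.1, 4.5, 3.9, 3.3, 2.7, 2.1]

remaining = days_until_zero(sample_days, sample_buffers)
print(f"projected days until buffers reach zero: {remaining:.1f}")
```

A straight-line fit is only a first approximation; if the decline flattens or accelerates near the end, the projection will be off accordingly.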
@d0ugmac1 I think I'm getting pretty close to a failure on the A-end router. Here are the stats between the A-end and B-end (there is an L2TP tunnel between them, but no regular traffic). The differences between sites A and B are that B is pretty quiet from a traffic point of view and B is still getting reset weekly. A is running 1.2.2 and Linux Controller 5.9.9 and B is running 1.2.1 and OC200 5.7.6
Here is B's 'Stats', note the memory ~38% and CPU peaking ~8%
and the corresponding memory buffer state holding steady at ~5.8% for the last 10 days
Now here is A's stats, Memory ~58% and CPU peaking at 30% (I believe this is due to resource starvation, it's not doing a lot otherwise)
and its memory buffers that have dropped continuously and are now below 0.3%