Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate memory leak #13

Closed
Krinkle opened this issue Oct 29, 2013 · 42 comments
Closed

Investigate memory leak #13

Krinkle opened this issue Oct 29, 2013 · 42 comments

Comments

@Krinkle
Copy link
Member

Krinkle commented Oct 29, 2013

The bots are increasingly building up memory usage when running. Over a weeks time of running half a dozen bots, it accumulates to several gigabytes of memory being allocated.

@Krinkle
Copy link
Member Author

Krinkle commented Jul 12, 2017

Closing for now. Haven't noticed a leak in the monitoring for at least 6 months, even on the busiest bots for SWMT channels, en.wikipedia.org, and wikidata.org.

@Krinkle Krinkle closed this as completed Jul 12, 2017
@Krinkle Krinkle reopened this May 9, 2020
@Krinkle
Copy link
Member Author

Krinkle commented May 9, 2020

Meant to re-open this months years ago. Definitely still an issue.

And since about November or December, it seems to cause at least once a week that the bots come to a halt until we manually restart them.

See cvn.wmflabs.org/#monitoring-nagf

graphite-labs wikimedia org  2png
graphite-labs wikimedia org

@Krinkle
Copy link
Member Author

Krinkle commented Jun 7, 2020

To get started locally, clone this repo and build the code (either via Visual Studio, or from the command-line with msbuild and mono), then tell the bot via IRC to monitor Wikipedia or Wikidata (the most active wikis and thus triggers the leak the quickest).

By default the bot joins #cvn-sandbox on Freenode. To instruct the bot to monitor a wiki you need to be operator in that channel, so either ask someone there to +op you, or edit CVNBot.ini and change feedchannel to ##<yourname> which if you join will be auto-created on Freenode with you as operator. Once you're op in a channel with the bot alongside you, say:

YourCVNBot load en.wikipedia

@NickCraver
Copy link

Hey all, just seeing if I can be of any assistance here.

First checking constraints: is Mono the only available runtime, or is something more recent like .NET Core an option? I ask because there's a lot of string parsing involved here and that's where a lot of allocations can be removed and performance improved in .NET Core, if running it is an option.

I'm also curious about security implications here, there are some areas that allow SQL injection to the local database which would be good to fix. If .NET Core above is an option, is using additional libraries like Dapper an option? If so, we can clean those up in short order.

I'm taking a look through now to see if I readily see any leak locations, but a memory dump would be the most useful if you're able to take one. If you are, I can show you how to run a script on it to analyze what the big leak items are, or if there's no sensitive data and it's okay to upload somewhere, I'd happily do such analysis.

@Krinkle
Copy link
Member Author

Krinkle commented Jun 8, 2020

@NickCraver Hey, thanks for taking a look. Much appreciated.

This is running on a VM in Wikimedia Cloud with Debian 9 Stretch and the Mono 5.16 that comes with that (via Wikimedia's APT mirror). I work at Wikimedia Foundation on the LAMP stack for Wikipedia, but CVNBot and its operation is something I do in my volunteer capacity, and I'm very much a beginner here. I've inherited this and done my best to keep it running over the years to account for changes in the source feed and MediaWiki APIs it calls to, but beyond that I'm very much a beginner with C# and .NET/Mono in general.

I don't have much of a preference here, other than to work within the contraints of FLOSS (as required for Wikimedia Cloud), and ideally only dependencies that are in Debian stable (for operational simplicity/flexibility). Other than that, pretty much fine with anything and everything.

It does not surprise me that there are SQL vectors here. I've patched a few of those before this was moved to Git, but it's not a high area of concern given only trusted users are given access to "bot commands" and those same users are also the main consumer of the data so it would be a "footgun" situation. Having said that, obviously fixing them would be awesome, but less worrisome than the memory leak at this point (unless it's attackable by non-voiced users of the output feed, or through the input source).

Regarding memory dump, sure thing. I do need help with how to capture it though.

The only sensitive data is the bot's NickServ password, but I can simply rotate that to make things easier.

@NickCraver
Copy link

@Krinkle Got it - thanks! I'm honestly not sure what the process is to properly dump this on Debian, I'll ask for backup there - happy to analyze it though. From looking through the code, I'd currently be looking at unbounded project list growth. It seems to vary if a project is ever removed, and there are some races down that path with multiple threads.

It wouldn't be completely trivial work, but again .NET Core with async/await bits and new APIs after the .NET 4.5 level targeted here would help remove the races (we can "await" until a needed dependency completes). If .NET Core is an option, that's the way to go here for an easier time maintaining it, etc. Here are the install docs to see if running it if feasible (only the feed and SDK parts are needed here): https://docs.microsoft.com/en-us/dotnet/core/install/linux-package-manager-debian9#install-the-net-core-sdk The license is MIT, same for Dapper which would we could make the Sqlite bits quickly secure with.

That's long-term though, let's see if we can get some help memory dumping and confirm our culprit.

@KirillOsenkov
Copy link

Hey all, with Mono 5.16 you can switch to target net472 and start using async/await and all the latest goodness.

@Krinkle
Copy link
Member Author

Krinkle commented Jun 8, 2020

The project loader is rare used, only once when an instance is first set up, and then it stays like that. A restart every now and then after an upgrade, but for the most part processes start, live for weeks and get restarted without ever seeing a load, drop, or other command that modifies the projects list.

The Projects.xml file for CVNBot18/wikidata:
<projects>
<project><projectName>wikidata.wikipedia</projectName><interwikiLink /><rooturl>https://www.wikidata.org/</rooturl><speciallog>Special:.+?/(.+)</speciallog><namespaces>&lt;?xml version="1.0"?&gt;&lt;api batchcomplete=""&gt;&lt;query&gt;&lt;namespaces&gt;&lt;ns _idx="-2" id="-2" case="first-letter" canonical="Media" xml:space="preserve"&gt;Media&lt;/ns&gt;&lt;ns _idx="-1" id="-1" case="first-letter" canonical="Special" xml:space="preserve"&gt;Special&lt;/ns&gt;&lt;ns _idx="0" id="0" case="first-letter" content="" defaultcontentmodel="wikibase-item" xml:space="preserve" /&gt;&lt;ns _idx="1" id="1" case="first-letter" subpages="" canonical="Talk" xml:space="preserve"&gt;Talk&lt;/ns&gt;&lt;ns _idx="2" id="2" case="first-letter" subpages="" canonical="User" xml:space="preserve"&gt;User&lt;/ns&gt;&lt;ns _idx="3" id="3" case="first-letter" subpages="" canonical="User talk" xml:space="preserve"&gt;User talk&lt;/ns&gt;&lt;ns _idx="4" id="4" case="first-letter" subpages="" canonical="Project" xml:space="preserve"&gt;Wikidata&lt;/ns&gt;&lt;ns _idx="5" id="5" case="first-letter" subpages="" canonical="Project talk" xml:space="preserve"&gt;Wikidata talk&lt;/ns&gt;&lt;ns _idx="6" id="6" case="first-letter" canonical="File" xml:space="preserve"&gt;File&lt;/ns&gt;&lt;ns _idx="7" id="7" case="first-letter" subpages="" canonical="File talk" xml:space="preserve"&gt;File talk&lt;/ns&gt;&lt;ns _idx="8" id="8" case="first-letter" subpages="" canonical="MediaWiki" xml:space="preserve"&gt;MediaWiki&lt;/ns&gt;&lt;ns _idx="9" id="9" case="first-letter" subpages="" canonical="MediaWiki talk" xml:space="preserve"&gt;MediaWiki talk&lt;/ns&gt;&lt;ns _idx="10" id="10" case="first-letter" subpages="" canonical="Template" xml:space="preserve"&gt;Template&lt;/ns&gt;&lt;ns _idx="11" id="11" case="first-letter" subpages="" canonical="Template talk" xml:space="preserve"&gt;Template talk&lt;/ns&gt;&lt;ns _idx="12" id="12" case="first-letter" subpages="" canonical="Help" xml:space="preserve"&gt;Help&lt;/ns&gt;&lt;ns _idx="13" id="13" case="first-letter" subpages="" canonical="Help talk" xml:space="preserve"&gt;Help talk&lt;/ns&gt;&lt;ns _idx="14" id="14" case="first-letter" canonical="Category" xml:space="preserve"&gt;Category&lt;/ns&gt;&lt;ns _idx="15" id="15" case="first-letter" subpages="" canonical="Category talk" xml:space="preserve"&gt;Category talk&lt;/ns&gt;&lt;ns _idx="120" id="120" case="first-letter" canonical="Property" defaultcontentmodel="wikibase-property" xml:space="preserve"&gt;Property&lt;/ns&gt;&lt;ns _idx="121" id="121" case="first-letter" subpages="" canonical="Property talk" xml:space="preserve"&gt;Property talk&lt;/ns&gt;&lt;ns _idx="122" id="122" case="first-letter" canonical="Query" xml:space="preserve"&gt;Query&lt;/ns&gt;&lt;ns _idx="123" id="123" case="first-letter" canonical="Query talk" xml:space="preserve"&gt;Query talk&lt;/ns&gt;&lt;ns _idx="828" id="828" case="first-letter" subpages="" canonical="Module" xml:space="preserve"&gt;Module&lt;/ns&gt;&lt;ns _idx="829" id="829" case="first-letter" subpages="" canonical="Module talk" xml:space="preserve"&gt;Module talk&lt;/ns&gt;&lt;ns _idx="1198" id="1198" case="first-letter" subpages="" canonical="Translations" xml:space="preserve"&gt;Translations&lt;/ns&gt;&lt;ns _idx="1199" id="1199" case="first-letter" subpages="" canonical="Translations talk" xml:space="preserve"&gt;Translations talk&lt;/ns&gt;&lt;ns _idx="2300" id="2300" case="first-letter" canonical="Gadget" xml:space="preserve"&gt;Gadget&lt;/ns&gt;&lt;ns _idx="2301" id="2301" case="first-letter" canonical="Gadget talk" xml:space="preserve"&gt;Gadget talk&lt;/ns&gt;&lt;ns _idx="2302" id="2302" case="case-sensitive" canonical="Gadget definition" defaultcontentmodel="GadgetDefinition" xml:space="preserve"&gt;Gadget definition&lt;/ns&gt;&lt;ns _idx="2303" id="2303" case="case-sensitive" canonical="Gadget definition talk" xml:space="preserve"&gt;Gadget definition talk&lt;/ns&gt;&lt;ns _idx="2600" id="2600" case="first-letter" canonical="Topic" defaultcontentmodel="flow-board" xml:space="preserve"&gt;Topic&lt;/ns&gt;&lt;/namespaces&gt;&lt;/query&gt;&lt;/api&gt;</namespaces><restoreRegex>^restored "\[\[(?&lt;item1&gt;.+?)\]\]"(?:: (?&lt;comment&gt;.*?))?$</restoreRegex><deleteRegex>^deleted "\[\[(?&lt;item1&gt;.+?)\]\]"(?:: (?&lt;comment&gt;.*?))?$</deleteRegex><protectRegex>^protected "\[\[(?&lt;item1&gt;.+?)\]\]"(?:: (?&lt;comment&gt;.*?))?$</protectRegex><unprotectRegex>^removed protection from "\[\[(?&lt;item1&gt;.+?)\]\]"(?:: (?&lt;comment&gt;.*?))?$</unprotectRegex><modifyprotectRegex>^changed protection level for "\[\[(?&lt;item1&gt;.+?)\]\]"(?:: (?&lt;comment&gt;.*?))?$</modifyprotectRegex><uploadRegex>^uploaded "\[\[(?&lt;item1&gt;.+?)\]\]"(?:: (?&lt;comment&gt;.*?))?$</uploadRegex><moveRegex>^moved \[\[(?&lt;item1&gt;.+?)\]\] to \[\[(?&lt;item2&gt;.+?)\]\](?:: (?&lt;comment&gt;.*?))?$</moveRegex><moveredirRegex>^moved \[\[(?&lt;item1&gt;.+?)\]\] to \[\[(?&lt;item2&gt;.+?)\]\] over redirect(?:: (?&lt;comment&gt;.*?))?$</moveredirRegex><blockRegex>^blocked \[\[(?&lt;item1&gt;.+?)\]\] with an expiration time of (?&lt;item2&gt;.+?) \((?&lt;item3&gt;.+?)\)(?:: (?&lt;comment&gt;.*?))?$</blockRegex><unblockRegex>^unblocked (?&lt;item1&gt;.+?)(?:: (?&lt;comment&gt;.*?))?$</unblockRegex><reblockRegex>^changed block settings for \[\[(?&lt;item1&gt;.+?)\]\] with an expiration time of (?&lt;item2&gt;.+?) (?&lt;item3&gt;.+?)(?:: (?&lt;comment&gt;.*?))?$</reblockRegex><autosummBlank>^Blanked the page(?:: (?&lt;comment&gt;.*?))?$</autosummBlank><autosummReplace>^Replaced content with "(?&lt;item1&gt;.+?)"(?:: (?&lt;comment&gt;.*?))?$</autosummReplace></project>
</projects>

@KirillOsenkov
Copy link

Could it be a leak in the underlying IRC library? (Meebey.SmartIrc4net.dll)

@Krinkle
Copy link
Member Author

Krinkle commented Jun 8, 2020

@NickCraver The VMs currently run Mono 5.16 from mono-project.com's apt source (link to server provision config). I've added Microsoft's apt source as well on a similar instance and installed dotnet-sdk-3.1.

Is this an SDK you'd like to make us of within the CVNBot code, which Mono or MSBuild would pick up in the current workflow, or does this provide a binary I would use instead of Mono to run the CVNBot.exe process?

As for .NET version, I'd be fine with moving to a newer version – provided it can still be built with VS2019 on Mac, and build/run on Debian Linux.

@kashifkhan
Copy link
Contributor

@KirillOsenkov possible, that code hasn't been updated in a while either.

@Krinkle Krinkle pinned this issue Jun 8, 2020
@KirillOsenkov
Copy link

To run with .NET Core you would just type dotnet CvnBot.exe in terminal (and it might even work without converting to target .NET Core). But hard to say without trying.

First thing it should be safe to change this line to net472:
https://github.com/countervandalism/CVNBot/blob/90017b6e61bfe84541a7fb0ca767b3b1990089d8/src/CVNBot/CVNBot.csproj#L4

If you switch to net472 and you see build errors, we can help investigate those. When you build the .sln I'd use msbuild /bl src\CVNBot.sln - this way if you have errors in the build that you need to investigate you can use https://msbuildlog.com or share the binlog with us if you have questions. Careful though, the resulting msbuild.binlog will contain all environment variables.

@KirillOsenkov
Copy link

Here's the source for the SmartIRC library:
https://github.com/meebey/SmartIrc4net

It's on NuGet at:
https://www.nuget.org/packages/SmartIrc4net

The .dll version currently used is 0.4.5.0.

@Krinkle
Copy link
Member Author

Krinkle commented Jun 8, 2020

I'm seeing a few instances pop in and out of #cvn-sandbox on Freenode. (Totally fine!) In case that's one of you and you're wondering why it gets flood-kicked... the message buffering and flood prevention code have been tuned based on the higher thresholds our servers get from Freenode. That might make local reproducing of the memory leak harder and tweaking the buffer might cause a different/predictable leak to appear as it would likely not be able to keep up. Something to keep in mind – I don't have a good answer for that, other than that monitoring a smaller wiki might work for now (assuming that will still leak).

I've edited my previous comment to use en.wikipedia as example, which still quite busy but slightly less so. Other wiks to consider if those are still too busy: commons.wikimedia, or meta.wikimedia.

@KirillOsenkov
Copy link

One easy thing to do would be to update SmartIrc4net to latest in the hopes that the leak was there and it was fixed.

@KirillOsenkov
Copy link

For example, could it be this?
meebey/SmartIrc4net@13398fc

@KirillOsenkov
Copy link

cc @meebey have you had any known memory leaks in SmartIRC?

@kashifkhan
Copy link
Contributor

One easy thing to do would be to update SmartIrc4net to latest in the hopes that the leak was there and it was fixed.

Saw this check in on the bot, its already up to date with v 1.1
e2d92e7

@Krinkle
Copy link
Member Author

Krinkle commented Jun 8, 2020

First thing it should be safe to change this line to net472:

https://github.com/countervandalism/CVNBot/blob/90017b6e61bfe84541a7fb0ca767b3b1990089d8/src/CVNBot/CVNBot.csproj#L4

The code seems to compile without any warnings (both with msbuild and in VS2019), and run without issue from what I can tell.

I do note though, that there is also the following in the exe-config file:

<supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.5" />

I'm not sure how the two relate, or whether I made a mistake by having version and sku be different...

@KirillOsenkov
Copy link

<supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.5" /> - you can safely ignore, leave as is, it doesn't matter.

@alefranz
Copy link

alefranz commented Jun 8, 2020

@NickCraver The VMs currently run Mono 5.16 from mono-project.com's apt source (link to server provision config). I've added Microsoft's apt source as well on a similar instance and installed dotnet-sdk-3.1.

Is this an SDK you'd like to make us of within the CVNBot code, which Mono or MSBuild would pick up in the current workflow, or does this provide a binary I would use instead of Mono to run the CVNBot.exe process?

With .NET Core you can generate an exe so that it can be invoked directly. It can have the runtime self-contained so it is not even necessary to install the .NET Core runtime or SDK.
I imagine you are currently building the application on a build agent of sort before deploying it to the servers? So you would need to install the .NET Core SDK on that build agent but nothing needs to be installed on the servers if you go that self-contained route, which is probably preferable in your scenario so you don't have to install anything on the server and you only have to worry to keep up to date the build agent. Alternatively, you need to install the runtime, no need for the full SDK.

@Krinkle Krinkle removed the declined label Jun 8, 2020
@reedy
Copy link
Member

reedy commented Jun 8, 2020

Depending on how replicable this is, you've also got access to JetBrains ReSharper, and more importantly/relevantly dotTrace/dotMemory via our subscription

@Krinkle
Copy link
Member Author

Krinkle commented Jun 8, 2020

First thing it should be safe to change this line to net472:
https://github.com/countervandalism/CVNBot/blob/90017b6e61bfe84541a7fb0ca767b3b1990089d8/src/CVNBot/CVNBot.csproj#L4

The code seems to compile without any warnings (both with msbuild and in VS2019), and run without issue from what I can tell.

.. but support for .NET 4.7.2 was only added in Mono 5.18, whereas the servers still run Mono 5.16. However I see now that the mono-project apt source from which I installed Mono 5.16 are continuously updated even for older Debian distros. I didn't realise that they did that (I assumed they'd pin the version there for each Debian release). These apt sources noiw provide Mono 6.8 for Debian 9 Stretch, so I can do a rolling update for that first and then roll out the .NET 4.7.2 update after that.

Krinkle added a commit that referenced this issue Jun 8, 2020
Support for .NET 4.7.2 was added in Mono 5.18 per
<https://www.mono-project.com/docs/about-mono/releases/5.18.0/>.

The main CVN servers in Wikimedia Cloud run on Mono 5.16
from mono-project's apt sources, but those same sources have
been updated since we provisioned the server and now provide
latest Mono 6.8 even for Debian 9 Stretch.

This change is blocked on upgrading the servers first.

Ref #13.
@n3wjack
Copy link

n3wjack commented Jun 8, 2020

A memory dump from the process when it's having a lot of leaked memory might help in speeding up figuring out what's going on.
Using windbg you can find out what objects are in memory, and figure out what's causing the leak. Using DebugDiag you might even be able to find the leak without having to dig through it yourself with windbg : https://www.microsoft.com/en-us/download/details.aspx?id=49924

Krinkle added a commit to wikimedia/labs-countervandalism-cvn-infrastructure that referenced this issue Jun 8, 2020
Ref wikimedia/labs-countervandalism-CVNBot#13

The same apt source now provides Mono 6.8, I just ran `apt-get update` and `apt-get install mono-complete`.

> Unpacking mono-complete `6.8.0.123-0xamarin2+debian9b1`
> over `5.16.0.179-0xamarin1+debian9b1`.
Krinkle added a commit that referenced this issue Jun 8, 2020
Support for .NET 4.7.2 was added in Mono 5.18+.
For issue #13 we want to try moving from .NET 4.5 to 4.7.2.

I've upgraded the servers to Mono 6.8 and confirmed that
CVNBot v3.1 still compiles, builds and runs fine.

Ref #13.
Ref wikimedia/labs-countervandalism-cvn-infrastructure@b33a9739e7.
Krinkle added a commit that referenced this issue Jun 8, 2020
Support for .NET 4.7.2 was added in Mono 5.18 per
<https://www.mono-project.com/docs/about-mono/releases/5.18.0/>.

The servers were upgraded from Mono 5.16 to 6.8 earlier today.

Ref #13.
@kashifkhan
Copy link
Contributor

Itll be interesting to see if the updates to Mono 6.8 & going to .net 4.7 will alleviate the issue.

@kashifkhan
Copy link
Contributor

This bug that is closed in Mono 6.8 mono/mono#8689 looks interesting.

@Krinkle
Copy link
Member Author

Krinkle commented Jun 8, 2020

Updating the servers was complicated by an issue, but eventually completed. The mono-complete package refused to install on its own due to conflicts within two of its dependencies over how msbuild is organised (looks similar to mono/mono#15709).

$ apt-get install mono-complete
[…]

Preparing to unpack .../msbuild_1%3a16.5+xamarinxplat.2020.02.20.11.54-0xamarin2+debian9b1_all.deb ...
Unpacking msbuild (1:16.5+xamarinxplat.2020.02.20.11.54-0xamarin2+debian9b1) ...
dpkg: error processing archive /var/cache/apt/archives/msbuild_1%3a16.5+xamarinxplat.2020.02.20.11.54-0xamarin2+debian9b1_all.deb (--unpack):
 trying to overwrite '/usr/lib/mono/msbuild/15.0', which is also in package msbuild-sdkresolver 1:15.8+xamarinxplat.2018.07.31.22.43-0xamarin5+debian9b1

After manually uninstalling all traces of mono and msbuild, I tried again but this time install msbuild as its own package first before installing the rest of mono-complete. That worked, as described in mono/mono#15709.

I first redeployed CVNBot v3.1 (.NET 4.5) on Mono 6.8 by itself, which seemed to work fine. I left the fleet running long enough to make sure no obvious issues would emerge, but not looking at its memory footprint.

I've now deployed CVNBot v4.0 (.NET 4.7.2), let's see how that fares on Mono 6.8 for a few hours in terms of memory :)

@rxy
Copy link
Contributor

rxy commented Jun 9, 2020

if needed, We can use docker for isolate environment to using other solution ( avoid constraint for debian stretch package )

@meebey
Copy link

meebey commented Jun 10, 2020

cc @meebey have you had any known memory leaks in SmartIRC?

@KirillOsenkov if you are running version 1.1, then there are no known leaks. SmartIr4net is a mature library and is used by quite a few projects for long-running bots (e.g. LIOC), so I would be surprised to find a leak in it. But yes, in the past it had a few.

Any luck with a memory profiler to locate the leak?

@KirillOsenkov
Copy link

I'm not sure how to grab a memory dump on Linux and Mono. If this ran on .NET framework or .NET Core on Windows it would be very easy to investigate.

@KirillOsenkov
Copy link

I've managed to get it building and running on Windows. Just needed to add these PackageReferences:
#66

It's running now, I'll let it run overnight and see if I can reproduce a leak. If it leaks, it'll now be very easy to investigate.

@KirillOsenkov
Copy link

Repro instructions for Windows:

  1. git clone https://github.com/countervandalism/CVNBot
  2. apply my PR Switch to NuGet packages for references. #66
  3. cd CVNBot\src
  4. msbuild /r /bl CVNBot.sln

Running:

  1. go to https://webchat.freenode.net, enter a username and make a new channel
  2. edit CVNBot.ini to mention your channel name for feedchannel
  3. run the bot and wait for it to join your channel
  4. in the channel type YourCVNBot load en.wikipedia

@meebey
Copy link

meebey commented Jun 10, 2020

@KirillOsenkov this thread mentions the bot uses SQLite which caused a small dejavu. I ran into memory leaks with SQLite in https://github.com/meebey/smuxi in the past, check this commit meebey/smuxi@4e5f4c5#diff-256230ab72fa1386ce4aad3c8928e5ea

@kashifkhan
Copy link
Contributor

I updated my PR with a few more IDisposable objects wrapped in using. @meebey I think you are on to something here. @Krinkle there is dbcon in ListManager.cs that is a global and used throughout the file which could be the culprit, because its not wrapped in a using and possibly left open.

If that is refactored out from a global to just open as needed, I think it can alleviate at least that problem.

@Krinkle
Copy link
Member Author

Krinkle commented Jun 10, 2020

@kashifkhan The sqlite database is used continously by the bot in the execution of its primary function (which is reading events from RCReader, applying information from the database, and then deciding whether and how to output it in the feed channel). I have not confirmed it for many years, but suspect that closing/re-opening the database could cause it to no longer be able to keep up with the RC feeds in real-time.

I don't believe there are cases where the bot is working correctly and not need of the database connection being open.

However, it would certainly be good to make sure that when things reconnects/restart that we don't open duplicate connections etc.

Also, if there is potential for the open connection to leak memory if it is open for too long, perhaps we should periodically recycle it?

@KirillOsenkov
Copy link

So I ran CVNBot overnight on Windows, and I didn't notice any memory increase at all. It did however have a socket exception and a disconnect and I'm not sure what that was about. I restarted and it's running again now, keeps steady at 40MB.

@kashifkhan
Copy link
Contributor

@Krinkle yeah that is a good question for the overall group if they can answer with regards to Sqlite.

It usual rule of thumb is to keep a connection open for the unit of work that its supposed to do and then close it. In a chatty app like this one, where its doing stuff constantly a persistent connection might serve you better.

Im not sure how Sqlite would react to connections opening/closing constantly vs a long running persistent connection

Luckily the code in ListManager is structured as individual units already and would be quite easy to do a quick refactor to test, depending on current priorities.

Would love to hear from others in the group though on thoughts

@n3wjack
Copy link

n3wjack commented Jun 10, 2020

@kashifkhan I don't know if SqlLite connections are also pooled like the ones ADO.NET opens for e.g. an MS SQL DB, but because of this connection pooling, open & closing connections is fast (because they don't actually open & close, you're just getting an open connection from the pool).
When it comes to memory leaks I think an open SqlLite connection that is shared in the app won't be causing a leak. Creating new ones every time and forgetting to dispose them would.
But since the code sticks to a single object, it should be OK.
Unless there is a memory leak building up in the connection object itself over time of course, but that would be a problem inside the SqlLite code.

@DavidKarlas
Copy link
Contributor

@Krinkle am I reading graphs at https://tools.wmflabs.org/nagf/?project=cvn#h_cvn-app8_memory wrong or did bumping Mono to 6.8 fix leaking?

@Krinkle
Copy link
Member Author

Krinkle commented Jun 11, 2020

@DavidKarlas It would seem so, yes. I'm not yet ready to close this ticket however as there seem to be one or two bugs I'm observing that seem to be new and may be masking the memory isssue:

  1. The most active feeds are getting flood kicked even from the production instance. This results in an automatic restart. It's possible that the message buffering/back-off mechanism has stopped working correctly, or that newer Mono is faster and that it was already broken before but naturally spread out due to worse performance.
  2. The bot seems to occasionally loose track of the RCReader channel, whilst still being fine in the feed channel on Freenode and nothing in the logs to explain this. It is not resulting in an automatic restart since the bot is still responsive on the feed/Freenode side, but as it is no longer receiving anything from irc-wikimedia, it is therefore also not doing anything and thus likely not increasing memory. I've been restarting the one or two bigger ones affected by this manually every 20 hours or so in the past two days.

It's still miles better than before, though. This has been a huge improvement!

@KirillOsenkov
Copy link

I’ve seen a random SocketException where the bot has disconnected from the channel, not sure what that was about but possibly related. I left the bot running under debugger and then saw it in the morning.

@Krinkle
Copy link
Member Author

Krinkle commented Jun 27, 2020

graphite-labs wikimedia org
cvn-wmcloud-nagf-stats

Declaring victory on this one. I know this was quite a simple fix, but I couldn't have done it without y'all. Thanks! ❤️

@Krinkle Krinkle closed this as completed Jun 27, 2020
@Krinkle Krinkle unpinned this issue Sep 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

10 participants