Modify

Opened 2 months ago

Closed 2 months ago

Last modified 2 months ago

#55 closed defect (invalid)

9.172.6543.39951 memory leak on Win10

Reported by: anonymous Owned by: evacomaroski
Priority: major Component: Corrade Progressive
Keywords: memory-leak Cc:

Description

I'm running Win10 (fully patched). 9.170 seemed to have a memory leak. I checked for system updates (had some), then attempted to stitch to 9.172. Was unable to stitch so updated by hand. 9.172.6543.39951 seems to have the memory leak, see screenshot at https://snag.gy/hymg2u.jpg for info.

Attachments (2)

secondlife_scripted_agents_corrade_benchmarks_windows.png (4.1 KB) - added by office 2 months ago.
Corrade.log (114.2 KB) - added by aphroditeatlas@… 2 months ago.
corrade.log

Download all attachments as: .zip

Change History (12)

comment:1 Changed 2 months ago by dgothtly@…

I ran Stitch v2.1 on Windows 7 Ultimate and was able to update Corrade to version 9.172.6544.17952 (see https://gyazo.com/816b2680eb715c7798c8421e243edb75). It also exhibits the memory leak. See Task Manager screenshot at https://gyazo.com/d5c43e75840df921a6bac536708ccc8a

comment:2 Changed 2 months ago by dgothtly@…

Several hours later see Task Manager screenshot at https://gyazo.com/717fb7034aecadcf5947c4721db58afe

comment:3 Changed 2 months ago by evacomaroski

Owner: set to evacomaroski
Status: newaccepted

Hello and thanks for reporting! We will run the necessary checks and get back to you as soon as possible.

comment:4 Changed 2 months ago by office

Resolution: invalid
Status: acceptedclosed

To address this issue, we have performed a long run test of Corrade running the latest Stitch version on Ectogram Resident (the bot that manages the [Wizardry and Steamworks]:Support group. A bot that is highly active, at least one autopilot command per 5 seconds and additionally commands being sent in through Corrade's internal HTTP server for the demonstration page.

During a 24 hours run, every minute the memory usage of Corrade had been sampled and then plotted on the graph attached to this ticket:

.

As you can observe, there are fluctuations in memory usage: sometimes lower and sometimes higher than the mean average; which is to be expected from a garbage collector.

Perhaps the most interesting is how during certain time periods of the 24 hour run, Corrade seems to allocate and deallocate memory more than in other times of the day. Most likely, this is due to SecondLife? being vacant compared to other times of day. In any case, the results show an overall constant memory usage.

Furthermore, from you own results, Corrade's (instantaneous) memory usage is within the design parameters. In case you want to lower the overall memory usage, please see the scripting considerations page which should allow you to lower the memory usage considerably.

Thank you for reporting; this issue is mightily popular and we were slow in updating the benchmarks page - at some point, we will offer some captured data. We are closing down this ticket but if you think we're in error or if the issue resurfaces, please feel free to reopen.

Changed 2 months ago by aphroditeatlas@…

Attachment: Corrade.log added

corrade.log

comment:5 Changed 2 months ago by anonymous

See attached corrade.log. As of the moment I uploaded the log my bot had been off the grid but still running as a service for about 17 hours. Note the gradual increase in allocated RAM with every heartbeat report.

comment:6 Changed 2 months ago by anonymous

Resolution: invalid
Status: closedreopened

comment:7 Changed 2 months ago by office

Resolution: invalid
Status: reopenedclosed

Hello, and thanks for reporting. The following sequence:

09-12-2017 04:01:22 : Wooly Morgwain : Heartbeat : CPU: 0% RAM: 143MiB Uptime: 5d:5h:24m Commands: 1 Behaviours: 0
09-12-2017 05:19:20 : Wooly Morgwain : Heartbeat : CPU: 2% RAM: 596MiB Uptime: 4d:10h:35m Commands: 8 Behaviours: 0

makes no sense - how come the bot was online for 5 days and then, an hour later, it was online for only 4 days without disconnecting?

The logs show that some state files had been changed - given the many reloads. For instance:

14:17:18 : Wooly Morgwain : group soft bans file modified
14:17:18 : Wooly Morgwain : group feeds file modified
14:17:18 : Wooly Morgwain : notifications file modified
14:17:36 : Wooly Morgwain : notifications file modified
14:17:36 : Wooly Morgwain : group feeds file modified
14:17:36 : Wooly Morgwain : group soft bans file modified
14:17:50 : Wooly Morgwain : group feeds file modified
14:17:50 : Wooly Morgwain : notifications file modified
14:17:50 : Wooly Morgwain : group soft bans file modified
14:18:07 : Wooly Morgwain : group soft bans file modified
14:18:07 : Wooly Morgwain : group feeds file modified
14:18:07 : Wooly Morgwain : notifications file modified
14:18:23 : Wooly Morgwain : group feeds file modified
14:18:23 : Wooly Morgwain : notifications file modified
14:18:23 : Wooly Morgwain : group soft bans file modified
14:18:38 : Wooly Morgwain : group soft bans file modified
14:18:38 : Wooly Morgwain : group feeds file modified
14:18:38 : Wooly Morgwain : notifications file modified
14:18:54 : Wooly Morgwain : notifications file modified
14:18:54 : Wooly Morgwain : group feeds file modified
14:18:54 : Wooly Morgwain : group soft bans file modified
14:19:13 : Wooly Morgwain : group soft bans file modified
14:19:13 : Wooly Morgwain : notifications file modified
14:19:13 : Wooly Morgwain : group feeds file modified
14:19:57 : Wooly Morgwain : notifications file modified

Is there some process writing to those state files?

Another issue to check would be the network connection:

12-12-2017 08:30:26 : Wooly Morgwain : simulator disconnected : Jergans : NetworkTimeout

is an error that occurs multiple times in the log file indicating that the bot has lost connection to one (or more) simulators on the grid. This usually happens after a teleport but losing all connections to the simulator, as indicated many times by lines such as:

16-12-2017 10:42:27 : Wooly Morgwain : all simulators disconnected

can happen due to a bad network connection (or grid restarts, but that would not be the case in the same day). Perhaps there are network connection issues? Linden Lab recommends cable not wireless, not sure if that is the case. . .

There are very many instances of:

10-12-2017 13:34:11 : Wooly Morgwain : Heartbeat : CPU: 0% RAM: 266MiB Uptime: 1d:7h:48m Commands: 1 Behaviours: 0
10-12-2017 13:49:48 : Wooly Morgwain : unable to load group members state :

which would indicate corruption of the group members cache files. If this is the latest Corrade version, then the group members cache file is written progressively as Corrade runs.

Are you sure you are using the latest Corrade version? The latest is 9.172.6544.17952.

Furthermore:

14-12-2017 19:33:19 : Wooly Morgwain : error updating inventory :

Could indicate some massive resource starvation: whether network or other processes interfering - although network seems more plausible. Are there other processes using up CPU time and starving the machine?

Perhaps the following questions would help:

  • Are you running Corrade as a service? Console mode is only for debugging.
  • Are there any processes writing to the configuration or state files?
  • Is the bot placed on an extremely laggy simulator - why are there so many disconnects during the same day? Are you using cable instead of wireless?
  • Perhaps you could try disabling the Multiple Simulator Connections option and check for a change?
  • Lastly, please see the performance considerations page and disable what is not needed.

Unfortunately, not many conclusions can be drawn from the log files: it just looks like a bot that is under heavy abuse, performing invites, teleporting to other simulators, being starved of resources, disconnecting now and then for no reason that would appear in the log files, state files being modified externally a bunch of times (for no reason?), it's amazing that it still keeps going. . .

Given the command usage, perhaps the following idea may help:

  • use a single bot for performing group management and keep that Corrade out of the way on a simulator that does not have issues (preferably, a homestead).
  • create a different bot to perform other commands such as teleporting around, flying or querying primitives.

If you would like further support on this issue, please feel free to open a separate ticket since this could just be conjecture.

Last edited 2 months ago by office (previous) (diff)

comment:8 Changed 2 months ago by dgothly@…

Regardless of the above comments, my bot does still leak memory. I have changed the setting for Multiple Simulators (which is not documented anywhere on this site btw .. nor can Google find a reference for where to change it) and made sure it is disabled. One thing to note, at startup of the Bot (as a service and yes the latest version) there is a giant bunch of System Exception reports in the first part of the log as follows:

System.Exception: Could not deserialize to SerializableDictionary`2 ---> System.InvalidOperationException: There is an error in XML document (0, 0). ---> System.Xml.XmlException: Root element is missing.
   at System.Xml.XmlTextReaderImpl.ThrowWithoutLineInfo(String res)
   at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
   at System.Xml.XmlReader.MoveToContent()
   at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReaderSerializableDictionary2.Read1_Item()
   --- End of inner exception stack trace ---
   at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle, XmlDeserializationEvents events)
   at wasSharpNET.Serialization.XmlSerializerCache.Deserialize[T](String XmlData, Type[] ExtraTypes) in C:\Users\office\Desktop\corrade\wasSharpNET\Serialization\XmlSerializerCache.cs:line 104
   --- End of inner exception stack trace ---
   at wasSharpNET.Serialization.XmlSerializerCache.Deserialize[T](String XmlData, Type[] ExtraTypes) in C:\Users\office\Desktop\corrade\wasSharpNET\Serialization\XmlSerializerCache.cs:line 112
   at Corrade.Corrade.<>c.<.cctor>b__253_84() in C:\Users\office\Desktop\corrade\Corrade\Corrade.cs:line 1352
--------------------------------------------------------------------------------

It still remains to be seen over the long haul if this resolved the issue but the System Exception report has remained since day one of running the bot.

Initial snapshot of memory usage: https://gyazo.com/8e4900f3d6564ad80f3c752a4a39aee7

comment:9 Changed 2 months ago by office

To reiterate the previously mentioned points, hopefully more succinctly this time:

  • the memory usage in the provided screenshot is within design parameters,
  • instantaneous memory consumption as illustrated in the screenshot is entirely irrelevant and does not imply any leaks; only memory consumption over time (also see the previous point)
  • Multiple Simulator Connections can be disabled from the Client tab of the Configurator.exe program, sorry for that omission, it will be added to the documentation.
Last edited 2 months ago by office (previous) (diff)

comment:10 Changed 2 months ago by dgothly@…

Since disabling the Multiple Simulator Connections option (which requires Advanced configuration mode to see) the memory usage for my bot has stabilized within an acceptable range. Whereas before it would have climbed to over 200MB by now, it has remained in the (roughly) 87-130MB range all day long. I am content to say .. my issue is resolved. I would recommend setting that option off by default in future releases.

Latest memory snapshop = https://gyazo.com/ddc85044382809629cf39a6ad08a4bdb

Modify Ticket

Action
as closed The owner will remain evacomaroski.
The resolution will be deleted. Next status will be 'reopened'.

Add Comment


E-mail address and name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.