From scott.parkerson at gmail.com Fri Mar 5 09:34:17 2010
From: scott.parkerson at gmail.com (Scott Parkerson)
Date: Fri, 05 Mar 2010 09:34:17 -0500
Subject: Excessive memory consumption reading large gzipped changesets
Message-ID: <1267799657.16040.25.camel@xyzzy.ateb.com>

Folks,

It's been known for a long time that Conary can eat a lot of memory
when doing large updates to a system. By "a lot of memory", I mean more
than 300 MiB RSS to do a system update, with a peak RSS of roughly
500 MiB. I saw this on a "modest" update from
foresight.rpath.org=:2-devel with only 55 update jobs.

I spent some time tracking this down, and I believe the peak memory
usage comes from here:

http://bitbucket.org/rpathsync/conary/src/tip/conary/repository/changeset.py#cl-1802

This line of code reads the entire uncompressed frozen changeset into
memory so that it can be passed to StreamSet (in cstreams) to be parsed.

I've been thinking about a number of approaches, and I would like to
bounce one particular thought off of you: I'd like to enhance StreamSet
so that it can work on the file directly (gzipped or not). Then the
__init__ method of StreamSet (in streamset.c) could parse the file
without having to read the whole thing into RAM, potentially saving a
rather large memory allocation.

I know that this would complicate the interface to ChangeSet (right
now, you just pass it a blob of data to initialize), but I think it
might be more efficient. At the very least, I believe it would prevent
the initial memory spike.

Of course, I'd also like to see if we could scale back the "necessary"
size of the changeset; it's still hard for me to believe that the
merged changeset required for an update needs 200 MiB of RAM, but I've
not drilled into what exactly is being kept in RAM, nor whether there
are any leaks.

I do know that PackageKit, which has to run through its update twice,
will eat 2x that RAM.
Admittedly, that could be bad memory management on the part of the PK
Conary backend, which I am also looking into as I prepare a fix for
doniphon + friends.

What do you think?

--sgp
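
For illustration only, here is a rough sketch of the streaming idea from the message above: pulling records out of a gzipped stream one at a time instead of decompressing the whole thing into a single string first. It is not Conary code. The 4-byte length-prefixed record framing and the names iter_records and make_sample are made up for this example and do not reflect the real frozen changeset format or the StreamSet interface. It uses only the Python 3 standard library.

```python
import gzip
import io
import struct


def iter_records(fileobj):
    """Yield length-prefixed records one at a time from a file-like object.

    The 4-byte big-endian length prefix is a hypothetical framing chosen
    for this sketch; it is not Conary's frozen changeset format.
    """
    while True:
        header = fileobj.read(4)
        if not header:
            return  # clean end of stream
        if len(header) < 4:
            raise ValueError("truncated length prefix")
        (size,) = struct.unpack(">I", header)
        payload = fileobj.read(size)
        if len(payload) < size:
            raise ValueError("truncated record")
        yield payload


def make_sample(records):
    """Build an in-memory gzipped stream of length-prefixed records."""
    raw = b"".join(struct.pack(">I", len(r)) + r for r in records)
    return gzip.compress(raw)


if __name__ == "__main__":
    blob = make_sample([b"alpha", b"", b"x" * 1000])
    # GzipFile decompresses on demand as read() is called, so the caller
    # holds one record at a time rather than the full uncompressed data.
    with gzip.GzipFile(fileobj=io.BytesIO(blob)) as gz:
        for rec in iter_records(gz):
            print(len(rec))
```

With this shape, peak memory is bounded by the largest single record plus the decompressor's internal buffer, at the cost of the parser needing a file-like object rather than a blob, which is the interface complication mentioned above.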