Date: Mon, 22 Mar 2010 10:04:46 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Daniel Braniss <danny@cs.huji.ac.il>
Cc: bug-followup@FreeBSD.org, freebsd-fs@FreeBSD.org, Kai Kockro <kkockro@web.de>
Subject: Re: kern/144330: [nfs] mbuf leakage in nfsd with zfs
Message-ID: <Pine.GSO.4.63.1003220949490.11799@muncher.cs.uoguelph.ca>
In-Reply-To: <E1NtfW6-0008E7-9q@kabab.cs.huji.ac.il>
References: <201003171120.o2HBK3CV082081@freefall.freebsd.org> <20100317113953.GA14582@icarus.home.lan> <Pine.GSO.4.63.1003171844120.20254@muncher.cs.uoguelph.ca> <86tys9eqo6.fsf@kopusha.onet> <Pine.GSO.4.63.1003212018180.28991@muncher.cs.uoguelph.ca> <E1NtfW6-0008E7-9q@kabab.cs.huji.ac.il>
On Mon, 22 Mar 2010, Daniel Braniss wrote:

> well, it's much better!, but no cookies yet :-)
>
Well, that's good news. I'll try and get dfr to review it and then
commit it. Thanks Mikolaj, for finding this.

> from comparing graphs in
> ftp://ftp.cs.huji.ac.il/users/danny/freebsd/mbuf-leak/
> store-01-e.ps: a production server running newfsd - now up almost 20 days
>     notice that the average used mbuf is below 1000!
>
> store-02.ps: kernel without last patch, classic nfsd
>     the leak is huge.
>
> store-02++.ps: with latest patch
>     the leak is much smaller but I see 2 issues:
>     - the initial leap to over 2000, then a smaller leak.

The initial leap doesn't worry me. That's just a design constraint. A
slow leak after that is still a problem. (I might have seen the slow
leak in testing here. I'll poke at it and see if I can reproduce that.)

> could someone explain replay_prune() to me?
>
I just looked at it and I think it does the following:
- when it thinks the cache is too big (either too many entries or too
  much mbuf data) it loops around until:
  - no longer too much or can't free any more (when an entry is
    free'd, rc_size and rc_count are reduced)
  (the loop is from the end of the tailq, so it is freeing the least
  recently used entries)
- the test for rce_repmsg.rm_xid != 0 avoids freeing ones that are in
  progress, since rce_repmsg is all zeroed until the reply has been
  generated

I did notice that the call to replay_prune() from replay_setsize() does
not lock the mutex before calling it, so it doesn't look smp safe to me
for this case, but I doubt that would cause a slow leak. (I think this
is only called when the number of mbuf clusters in the kernel changes
and might cause a kernel crash if the tailq wasn't in a consistent
state as it rattled through the list in the loop.)

rick