Date: Mon, 22 Mar 2010 15:10:03 GMT
From: Daniel Braniss <danny@cs.huji.ac.il>
To: freebsd-fs@FreeBSD.org
Subject: Re: kern/144330: [nfs] mbuf leakage in nfsd with zfs
Message-ID: <201003221510.o2MFA3Ft024915@freefall.freebsd.org>
The following reply was made to PR kern/144330; it has been noted by GNATS.

From: Daniel Braniss <danny@cs.huji.ac.il>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: Mikolaj Golub <to.my.trociny@gmail.com>,
 Jeremy Chadwick <freebsd@jdc.parodius.com>,
 freebsd-fs@FreeBSD.org,
 Kai Kockro <kkockro@web.de>,
 bug-followup@FreeBSD.org,
 gerrit@pmp.uni-hannover.de
Subject: Re: kern/144330: [nfs] mbuf leakage in nfsd with zfs
Date: Mon, 22 Mar 2010 17:04:40 +0200

> On Mon, 22 Mar 2010, Daniel Braniss wrote:
>
> > well, it's much better!, but no cookies yet :-)
> >
> Well, that's good news. I'll try and get dfr to review it and then
> commit it. Thanks Mikolaj, for finding this.
>
> > from comparing graphs in
> > ftp://ftp.cs.huji.ac.il/users/danny/freebsd/mbuf-leak/
> > store-01-e.ps: a production server running new-nfsd - now up almost 20 days
> >   notice that the average used mbuf count is below 1000!
> >
> > store-02.ps: kernel without the last patch, classic nfsd
> >   the leak is huge.
> >
> > store-02++.ps: with the latest patch
> >   the leak is much smaller, but I see 2 issues:
> >   - the initial leap to over 2000, then a smaller leak.
> >
> The initial leap doesn't worry me. That's just a design constraint.

yes, but new-nfsd does it better.

> A slow leak after that is still a problem. (I might have seen the
> slow leak in testing here. I'll poke at it and see if I can reproduce
> that.)

all I do is mount over UDP on a client and start a write process.

> > could someone explain replay_prune() to me?
> >
> I just looked at it and I think it does the following:
> - when it thinks the cache is too big (either too many entries
>   or too much mbuf data) it loops around until:
>   - it is no longer too big, or it can't free any more
>     (when an entry is freed, rc_size and rc_count are reduced)
>   (the loop runs from the end of the tailq, so it frees
>   the least recently used entries)
> - the test for rce_repmsg.rm_xid != 0 avoids freeing entries
>   that are in progress, since rce_repmsg is all zeroed until
>   the reply has been generated

thanks for the information, it's what I thought, but the coding made it
look as though something else could happen - why else restart the search
of the queue after each match?

> I did notice that the call to replay_prune() from replay_setsize() does
> not lock the mutex before calling it, so it doesn't look SMP safe to me
> for this case, but I doubt that would cause a slow leak. (I think this
> is only called when the number of mbuf clusters in the kernel changes,
> and it might cause a kernel crash if the tailq wasn't in a consistent
> state as it rattled through the list in the loop.)

there seems to be an NFSLOCK involved before calling replay_setsize ...
well, the server is a 2-CPU quad Nehalem, so maybe I should try several
clients ...

> rick

btw, the new-nfsd has been running on a production server for almost 20
days and all seems fine.

anyways, things are looking better,
	danny
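
For readers of this thread: the pruning behaviour Rick describes amounts to
roughly the sketch below. It is reconstructed from his description, not
copied from sys/rpc/replay.c, so the REPLAY_MAX constant, the replay_free()
helper, and the struct/field layout are assumptions based on the thread.

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/queue.h>
#include <rpc/rpc.h>		/* struct rpc_msg, rm_xid */

#define	REPLAY_MAX	1024	/* assumed entry-count limit */

/* Assumed shapes, reconstructed from the discussion above: */
struct replay_cache_entry {
	struct rpc_msg	rce_repmsg;	/* all-zero until reply generated */
	TAILQ_ENTRY(replay_cache_entry) rce_alllink;
};

TAILQ_HEAD(replay_cache_list, replay_cache_entry);

struct replay_cache {
	struct replay_cache_list rc_all;	/* MRU at head, LRU at tail */
	struct mtx	rc_lock;
	int		rc_count;		/* entries in the cache */
	size_t		rc_size;		/* mbuf bytes held */
	size_t		rc_maxsize;		/* limit on rc_size */
};

/* Assumed helper: unlinks the entry and reduces rc_count/rc_size. */
static void replay_free(struct replay_cache *rc,
    struct replay_cache_entry *rce);

static void
replay_prune(struct replay_cache *rc)
{
	struct replay_cache_entry *rce;
	int freed_one;

	if (rc->rc_count < REPLAY_MAX && rc->rc_size <= rc->rc_maxsize)
		return;

	do {
		freed_one = 0;
		/*
		 * Scan from the tail, i.e. least recently used first.
		 * Skip in-progress entries: rce_repmsg stays all-zero
		 * until the reply has been generated, so rm_xid != 0
		 * marks a completed entry that is safe to free.
		 */
		TAILQ_FOREACH_REVERSE(rce, &rc->rc_all,
		    replay_cache_list, rce_alllink) {
			if (rce->rce_repmsg.rm_xid != 0) {
				replay_free(rc, rce);
				freed_one = 1;
				break;
			}
		}
	} while (freed_one &&
	    (rc->rc_count >= REPLAY_MAX || rc->rc_size > rc->rc_maxsize));
}

The break after each replay_free() is why the search restarts from the tail
of the queue after every match: removing an entry invalidates the TAILQ
iterator, so beginning the scan again is the simplest safe traversal. That
is the behaviour danny was asking about; nothing else happens, it just costs
a rescan per freed entry.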
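The SMP problem Rick points out in replay_setsize() would presumably be
fixed by taking the cache mutex around the resize path. A minimal sketch
under the same assumptions as above; the rc_lock field name and the exact
replay_setsize() signature are guesses, not the committed fix:

void
replay_setsize(struct replay_cache *rc, size_t newmaxsize)
{

	/*
	 * Take the cache mutex so that a resize (triggered when the
	 * kernel's mbuf cluster count changes) cannot walk the tailq
	 * in replay_prune() while an nfsd thread is mutating it.
	 */
	mtx_lock(&rc->rc_lock);
	rc->rc_maxsize = newmaxsize;
	replay_prune(rc);
	mtx_unlock(&rc->rc_lock);
}

Without the lock, the prune loop could rattle through a tailq that is not
in a consistent state, which matches Rick's crash concern even though it
would not by itself explain a slow leak.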