Date: Fri, 5 Mar 2010 17:01:59 -0500 (EST)
From: Rick Macklem
To: Daniel Braniss
Cc: stable@freebsd.org, freebsd-fs@freebsd.org, Willem Jan Withagen,
    Eirik Øverby, rwatson@freebsd.org, Jeremy Chadwick
Subject: Re: mbuf leakage with nfs/zfs?

On Fri, 5 Mar 2010, Daniel Braniss wrote:

>>
>> On Tue, 2 Mar 2010, Daniel Braniss wrote:
>>
>>>
>>> just keep sending insights/pointers and enjoy life
>>>
>>
>> You could try this patch for sys/rpc/replay.c. Completely untested and
>> just typed into email (so don't give it to "patch", just edit the file).
>>
>> - try adding these 2 lines just before the end of replay_setreply() in
>>   sys/rpc/replay.c:
>>
>> -	}
>> +	} else if (m)
>> +		m_freem(m);
>>  	mtx_unlock(&rc->rc_lock);
>> }
>>
>> It's the only place I can see in replay.c that might leak, rick
>>
> this is what I did:
> --- a/sys/rpc/replay.c	Mon Mar 01 18:29:54 2010 +0200
> +++ b/sys/rpc/replay.c	Fri Mar 05 09:24:17 2010 +0200
> @@ -243,6 +243,9 @@
>  		rce->rce_repbody = m;
>  		if (m)
>  			rc->rc_size += m_length(m, NULL);
> +	} else if (m) {
> +		printf("free m=%p ...\n", m);
> +		m_freem(m);
>  	}
>  	mtx_unlock(&rc->rc_lock);
>  }
>
> but it didn't help, it's not triggered
>
Hmm, well that's the only place I could see in replay.c that could leak
(and it's a pretty straightforward piece of code).

This is getting interesting. Just to confirm where we currently are...
- replay cache disabled --> no leak
- replay cache enabled (with or without the above patch) --> leak

I'll take another look, but I doubt the leak is in replay.c so...
maybe a reply from the cache is somehow handled incorrectly and that
causes the leak elsewhere? (Just a random hunch at this point.)

> Thanks for the explanation on the cache, things are beginning to make sense.
> If I understand, the reason for this cache is to prevent re-applying an
> already performed rpc, which could lead to data corruption
>
Yep, you've got it. It is basically a bandaid for the poor transport
semantics provided by UDP.

Having fun with this one. Thanks for the help, rick
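
[Illustration, not part of the original message.] The replay cache idea
and the buffer-ownership question behind the patch can be sketched in
userland C. This is only a rough model under stated assumptions: every
name below (cache_lookup, cache_setreply, buf_alloc, live_buffers, and so
on) is hypothetical, plain malloc'd strings stand in for mbufs, and none
of it is the actual code in sys/rpc/replay.c. It shows a retransmitted
request being answered from the cache instead of being re-executed, and
a reply buffer being freed when no in-progress entry is there to take it.

```c
/* Userland model of a duplicate-request (replay) cache.
 * All names are hypothetical; this is NOT sys/rpc/replay.c. */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_SLOTS 8

enum entry_state { ENTRY_FREE, ENTRY_INPROG, ENTRY_DONE };
enum lookup_result { LOOKUP_NEW, LOOKUP_DROP, LOOKUP_REPLY };

struct entry {
    enum entry_state state;
    unsigned xid;   /* RPC transaction id of the request */
    char *reply;    /* cached reply body, owned by the cache */
};

static struct entry cache[CACHE_SLOTS];
int live_buffers;   /* outstanding buffers; a stand-in for mbuf accounting */

/* Allocate a counted buffer holding a copy of text. */
char *buf_alloc(const char *text)
{
    char *b = malloc(strlen(text) + 1);
    assert(b != NULL);
    strcpy(b, text);
    live_buffers++;
    return b;
}

/* Free a counted buffer; NULL is ignored. */
static void buf_free(char *b)
{
    if (b != NULL) {
        free(b);
        live_buffers--;
    }
}

static struct entry *find_entry(unsigned xid)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].state != ENTRY_FREE && cache[i].xid == xid)
            return &cache[i];
    return NULL;
}

/* Called when a request arrives.
 * LOOKUP_NEW:   first sighting, caller should execute the request.
 * LOOKUP_DROP:  retransmit of a request that is still executing.
 * LOOKUP_REPLY: retransmit of a finished request; *reply is the cached
 *               answer, so the operation is not applied a second time. */
enum lookup_result cache_lookup(unsigned xid, const char **reply)
{
    struct entry *e = find_entry(xid);
    if (e != NULL) {
        if (e->state == ENTRY_DONE) {
            *reply = e->reply;
            return LOOKUP_REPLY;
        }
        return LOOKUP_DROP;
    }
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].state == ENTRY_FREE) {
            cache[i].state = ENTRY_INPROG;
            cache[i].xid = xid;
            cache[i].reply = NULL;
            return LOOKUP_NEW;
        }
    }
    return LOOKUP_NEW;  /* cache full: execute, but nothing is recorded */
}

/* Called when the reply is ready. Takes ownership of reply: it is either
 * stored in the matching in-progress entry or freed here. Without the
 * else branch, a reply with no matching entry would be leaked. */
void cache_setreply(unsigned xid, char *reply)
{
    struct entry *e = find_entry(xid);
    if (e != NULL && e->state == ENTRY_INPROG) {
        e->state = ENTRY_DONE;
        e->reply = reply;
    } else {
        buf_free(reply);
    }
}

/* Drop every entry and release the replies the cache owns. */
void cache_clear(void)
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        buf_free(cache[i].reply);
        cache[i].reply = NULL;
        cache[i].state = ENTRY_FREE;
    }
}
```

In this model a caller would invoke cache_lookup() for each incoming
request and cache_setreply() once the reply is built; watching
live_buffers before and after is a crude analogue of the printf
instrumentation in the quoted diff.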