From owner-freebsd-fs@FreeBSD.ORG Sun Mar 21 13:43:29 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 24F051065672; Sun, 21 Mar 2010 13:43:29 +0000 (UTC) (envelope-from to.my.trociny@gmail.com) Received: from ey-out-2122.google.com (ey-out-2122.google.com [74.125.78.24]) by mx1.freebsd.org (Postfix) with ESMTP id 7511D8FC15; Sun, 21 Mar 2010 13:43:28 +0000 (UTC) Received: by ey-out-2122.google.com with SMTP id d26so241473eyd.9 for ; Sun, 21 Mar 2010 06:43:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:to:cc:subject:references :organization:from:date:in-reply-to:message-id:user-agent :mime-version:content-type; bh=3GrOanLUokXyrqaW2uf/5/SaFG6t0x6kjWWWcbAeZKo=; b=V2y15MSgYhWfD7D5BvIfostj4wqIpO16IOwC6WwA1cW6YSvjtkzTgZjFBMU2D7Vu3z R+5DVcrtsEat0PJUUro7EL8l1+/qi6EsWp17p11uwvTHGDcPkq5rIpJdGQQZQrWxtsye TkxH8KnxjPzNTIrgaDFCtMAtIUKXABn1U/tQY= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=to:cc:subject:references:organization:from:date:in-reply-to :message-id:user-agent:mime-version:content-type; b=mj4kqetNsSZYnwAjBYjgIT/eiSsu3/JHYbo0jSLXWkzGm1BvFu+iQ8PLL1bso9vZcm XFtOPYGKMCPfsfgTaUUnReNbTSx3lQFVS7g/+JUdXUiugBV34ZS1pPccnDN4LeNozeQV OTPIjB3zgR9LMQ5xLC85yF5A/tGCv0e6Uyz8w= Received: by 10.213.37.14 with SMTP id v14mr846258ebd.28.1269179006986; Sun, 21 Mar 2010 06:43:26 -0700 (PDT) Received: from localhost ([95.69.160.238]) by mx.google.com with ESMTPS id 16sm1167804ewy.3.2010.03.21.06.43.24 (version=TLSv1/SSLv3 cipher=RC4-MD5); Sun, 21 Mar 2010 06:43:25 -0700 (PDT) To: Rick Macklem References: <201003171120.o2HBK3CV082081@freefall.freebsd.org> <20100317113953.GA14582@icarus.home.lan> Organization: TOA Ukraine From: Mikolaj Golub Date: Sun, 21 Mar 2010 15:43:21 +0200 In-Reply-To: (Rick Macklem's message of "Wed\, 17 Mar 2010 18\:46\:27 -0400 \(EDT\)") 
Message-ID: <86tys9eqo6.fsf@kopusha.onet> User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.3 (berkeley-unix) MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Cc: freebsd-fs@FreeBSD.org, Kai Kockro , bug-followup@FreeBSD.org Subject: Re: kern/144330: [nfs] mbuf leakage in nfsd with zfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 21 Mar 2010 13:43:29 -0000 --=-=-= Having Debian in VirtualBox, I had some problems reproducing this until I ran a script that, for short periods of time, dropped traffic from the NFS server to the client (using pf): block out on $if from any to while sleep 3; do sudo pfctl -t nfs -vT add 10.0.0.217; sleep 2; sudo pfctl -t nfs -vT show; sudo pfctl -t nfs -vT delete 10.0.0.217; done The idea was to drop NFS server responses so that the client resends its requests and the RPC replies come from the cache. With this in place, mbuf usage started to grow: 09:30: 2806/1589/4395 mbufs in use (current/cache/total) 10:00: 5397/1068/6465 mbufs in use (current/cache/total) 10:30: 7945/1760/9705 mbufs in use (current/cache/total) 11:00: 9560/1435/10995 mbufs in use (current/cache/total) 11:30: 10337/2113/12450 mbufs in use (current/cache/total) Although it might be a different issue than the one reported in this PR :-). Reviewing rpc/svc.c:svc_getreq(), it looks to me that in the RS_DONE case args are never freed. Shouldn't it be like in the attached patch? 
Running the above test on the patched kernel the growth has not been observed so far: 13:00: 1501/2219/3720 mbufs in use (current/cache/total) 13:30: 1514/2971/4485 mbufs in use (current/cache/total) 14:00: 1096/3389/4485 mbufs in use (current/cache/total) 14:30: 1107/3378/4485 mbufs in use (current/cache/total) 15:00: 1105/3380/4485 mbufs in use (current/cache/total) 15:30: 1105/3380/4485 mbufs in use (current/cache/total) -- Mikolaj Golub --=-=-= Content-Type: text/x-diff Content-Disposition: inline; filename=svc.c.svc_getreq.patch --- sys/rpc/svc.c.orig 2010-03-21 10:17:20.000000000 +0200 +++ sys/rpc/svc.c 2010-03-21 10:20:05.000000000 +0200 @@ -819,6 +819,7 @@ svc_getreq(SVCXPRT *xprt, struct svc_req free(r->rq_addr, M_SONAME); r->rq_addr = NULL; } + m_freem(args); goto call_done; default: --=-=-=-- 
From owner-freebsd-fs@FreeBSD.ORG Sun Mar 21 15:54:19 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D1F5D1065740 for ; Sun, 21 Mar 2010 15:54:19 +0000 (UTC) (envelope-from morganw@chemikals.org) Received: from warped.bluecherry.net (unknown [IPv6:2001:440:eeee:fffb::2]) by mx1.freebsd.org (Postfix) with ESMTP id 907068FC13 for ; Sun, 21 Mar 2010 15:54:18 +0000 (UTC) Received: from volatile.chemikals.org (adsl-67-118-119.shv.bellsouth.net [98.67.118.119]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by warped.bluecherry.net (Postfix) with ESMTPSA id D9B1084FE412; Sun, 21 Mar 2010 10:34:48 -0500 (CDT) Received: from localhost (morganw@localhost [127.0.0.1]) by volatile.chemikals.org (8.14.4/8.14.4) with ESMTP id o2LFYjAG052561; Sun, 21 Mar 2010 10:34:45 -0500 (CDT) (envelope-from morganw@chemikals.org) Date: Sun, 21 Mar 2010 10:34:44 -0500 (CDT) From: Wes Morgan X-X-Sender: morganw@volatile To: Baldur Gislason In-Reply-To: <20100317214234.GF63370@gremlin.foo.is> Message-ID: References: 
<20100317214234.GF63370@gremlin.foo.is> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: clamav-milter 0.95.3 at warped X-Virus-Status: Clean Cc: freebsd-fs@freebsd.org Subject: Re: Frustration: replace not doing what I expected. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 21 Mar 2010 15:54:19 -0000 On Wed, 17 Mar 2010, Baldur Gislason wrote: > A drive failed in a pool and I had to replace it. > I did zpool replace ad18 ad18, the pool resilvered for 5 hours > and finished but did not return from degraded mode. > I tried removing the cache file and reimporting the pool, no change, it > hasn't gotten rid of the old drive which does not exist anymore. Hmmm. I've successfully replaced a drive that way before, and I'm sure many other people have. Did you offline ad18 before doing both the physical drive replacement and the zpool replace? I can't recall if that is necessary or not. Can you send the relevant output from zpool history? The "old" device is part of the metadata on the drive labels, so there is no way to remove it like you're wanting without either zfs deciding to remove it or rewriting the labels by hand. > pool: zirconium > state: DEGRADED > status: One or more devices has experienced an error resulting in data > corruption. Applications may be affected. > action: Restore the file in question if possible. Otherwise restore the > entire pool from backup. 
> see: http://www.sun.com/msg/ZFS-8000-8A > scrub: none requested > config: > > NAME STATE READ WRITE CKSUM > zirconium DEGRADED 0 0 0 > raidz1 DEGRADED 0 0 0 > ad4 ONLINE 0 0 0 > ad6 ONLINE 0 0 0 > replacing DEGRADED 0 0 0 > 2614810928866691230 UNAVAIL 0 962 0 was /dev/ad18/old > ad18 ONLINE 0 0 0 > ad20 ONLINE 0 0 0 From owner-freebsd-fs@FreeBSD.ORG Sun Mar 21 16:10:53 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 18B59106564A for ; Sun, 21 Mar 2010 16:10:53 +0000 (UTC) (envelope-from baldur@foo.is) Received: from gremlin.foo.is (gremlin.foo.is [194.105.250.10]) by mx1.freebsd.org (Postfix) with ESMTP id D33638FC0A for ; Sun, 21 Mar 2010 16:10:52 +0000 (UTC) Received: by gremlin.foo.is (Postfix, from userid 1000) id 4734FDA855; Sun, 21 Mar 2010 16:10:51 +0000 (GMT) Date: Sun, 21 Mar 2010 16:10:51 +0000 From: Baldur Gislason To: freebsd-fs@freebsd.org Message-ID: <20100321161051.GM63370@gremlin.foo.is> References: <20100317214234.GF63370@gremlin.foo.is> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.18 (2008-05-17) Subject: Re: Frustration: replace not doing what I expected. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 21 Mar 2010 16:10:53 -0000 I got it working, what I had to do was to delete a file that the resilver process reported as being corrupted. Then run a scrub again and it would upgrade the pool status to healthy. Baldur On Sun, Mar 21, 2010 at 10:34:44AM -0500, Wes Morgan wrote: > On Wed, 17 Mar 2010, Baldur Gislason wrote: > > > A drive failed in a pool and I had to replace it. > > I did zpool replace ad18 ad18, the pool resilvered for 5 hours > > and finished but did not return from degraded mode. 
> > I tried removing the cache file and reimporting the pool, no change, it > > hasn't gotten rid of the old drive which does not exist anymore. > > Hmmm. I've successfully replaced a drive that way before, and I'm sure > many other people have. Did you offline ad18 before doing both the > physical drive replacement and the zpool replace? I can't recall if that > is necessary or not. Can you send the relevant output from zpool history? > > The "old" device is part of the metadata on the drive labels, so there is > no way to remove it like you're wanting without either zfs deciding to > remove it or rewriting the labels by hand. > > > > pool: zirconium > > state: DEGRADED > > status: One or more devices has experienced an error resulting in data > > corruption. Applications may be affected. > > action: Restore the file in question if possible. Otherwise restore the > > entire pool from backup. > > see: http://www.sun.com/msg/ZFS-8000-8A > > scrub: none requested > > config: > > > > NAME STATE READ WRITE CKSUM > > zirconium DEGRADED 0 0 0 > > raidz1 DEGRADED 0 0 0 > > ad4 ONLINE 0 0 0 > > ad6 ONLINE 0 0 0 > > replacing DEGRADED 0 0 0 > > 2614810928866691230 UNAVAIL 0 962 0 was /dev/ad18/old > > ad18 ONLINE 0 0 0 > > ad20 ONLINE 0 0 0 From owner-freebsd-fs@FreeBSD.ORG Sun Mar 21 20:08:42 2010 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7A71B1065670; Sun, 21 Mar 2010 20:08:42 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 513408FC15; Sun, 21 Mar 2010 20:08:42 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o2LK8gtc011280; Sun, 21 Mar 2010 20:08:42 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by 
freefall.freebsd.org (8.14.4/8.14.4/Submit) id o2LK8ggU011276; Sun, 21 Mar 2010 20:08:42 GMT (envelope-from linimon) Date: Sun, 21 Mar 2010 20:08:42 GMT Message-Id: <201003212008.o2LK8ggU011276@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/144929: [ufs] [lor] vfs_bio.c + ufs_dirhash.c X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 21 Mar 2010 20:08:42 -0000 Old Synopsis: [lor] vfs_bio.c + ufs_dirhash.c New Synopsis: [ufs] [lor] vfs_bio.c + ufs_dirhash.c Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Sun Mar 21 20:08:19 UTC 2010 Responsible-Changed-Why: Take a guess and turn this over to freebsd-fs@. http://www.freebsd.org/cgi/query-pr.cgi?pr=144929 From owner-freebsd-fs@FreeBSD.ORG Sun Mar 21 23:59:29 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 87BE71065677 for ; Sun, 21 Mar 2010 23:59:29 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 37FD68FC14 for ; Sun, 21 Mar 2010 23:59:28 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvsEACtPpkuDaFvG/2dsb2JhbACbOnO4bIR9BA X-IronPort-AV: E=Sophos;i="4.51,284,1267419600"; d="scan'208";a="69442471" Received: from amazon.cs.uoguelph.ca ([131.104.91.198]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 21 Mar 2010 19:59:28 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by amazon.cs.uoguelph.ca (Postfix) with ESMTP id 2FF71210167; Sun, 21 Mar 2010 19:59:28 -0400 (EDT) X-Virus-Scanned: amavisd-new at amazon.cs.uoguelph.ca 
Received: from amazon.cs.uoguelph.ca ([127.0.0.1]) by localhost (amazon.cs.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id kCuH0rivN19z; Sun, 21 Mar 2010 19:59:27 -0400 (EDT) Received: from muncher.cs.uoguelph.ca (muncher.cs.uoguelph.ca [131.104.91.102]) by amazon.cs.uoguelph.ca (Postfix) with ESMTP id 748BE210119; Sun, 21 Mar 2010 19:59:27 -0400 (EDT) Received: from localhost (rmacklem@localhost) by muncher.cs.uoguelph.ca (8.11.7p3+Sun/8.11.6) with ESMTP id o2M0CMd29736; Sun, 21 Mar 2010 20:12:22 -0400 (EDT) X-Authentication-Warning: muncher.cs.uoguelph.ca: rmacklem owned process doing -bs Date: Sun, 21 Mar 2010 20:12:22 -0400 (EDT) From: Rick Macklem X-X-Sender: rmacklem@muncher.cs.uoguelph.ca To: Mikolaj Golub In-Reply-To: <86tys9eqo6.fsf@kopusha.onet> Message-ID: References: <201003171120.o2HBK3CV082081@freefall.freebsd.org> <20100317113953.GA14582@icarus.home.lan> <86tys9eqo6.fsf@kopusha.onet> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@FreeBSD.org, Kai Kockro , bug-followup@FreeBSD.org Subject: Re: kern/144330: [nfs] mbuf leakage in nfsd with zfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 21 Mar 2010 23:59:29 -0000 On Sun, 21 Mar 2010, Mikolaj Golub wrote: [good stuff snipped] > > Although it might be a different issue than the one reported in this PR :-). > I think it's the same one, since disabling the replay cache made the leak go away. > Reviewing rpc/svc.c:svc_getreq(), it looks to me that in the RS_DONE case args > are never freed. Shouldn't it be like in the attached patch? > Good catch!! It certainly looks like what would have caused the leak to me. Since r_args has not been set to args for that case, svc_freereq() wouldn't free args, just as you observed. Hopefully Jeremy can test this, but I suspect you've found/fixed the culprit. 
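The ownership problem described above can be illustrated with a small userland sketch in C. This is not the kernel code: `struct buf`, `struct req`, `buf_alloc()`, `req_free()`, `handle_request()` and `leaked_after()` are hypothetical stand-ins for the mbuf chain, the request structure and the svc_getreq()/svc_freereq() pair, and the `patched` flag only models the one-line m_freem(args) change. It assumes, as stated above, that the cleanup routine frees only what has been attached to the request.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 256                   /* hypothetical fixed buffer pool */

/* Hypothetical stand-ins; none of these are the kernel's real types. */
struct buf { int in_use; };             /* plays the role of an mbuf chain */
struct req { struct buf *r_args; };     /* plays the role of the request */

enum replay_state { REPLAY_NEW, REPLAY_DONE };

static struct buf pool[POOL_SIZE];

/* Hands out the first free slot, or NULL once the pool is exhausted. */
static struct buf *buf_alloc(void)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = 1;
            return &pool[i];
        }
    }
    return NULL;
}

static void buf_free(struct buf *b)
{
    if (b != NULL) {
        assert(b->in_use);              /* catch double frees in the sketch */
        b->in_use = 0;
    }
}

/* Cleanup frees only what the request owns. */
static void req_free(struct req *r)
{
    buf_free(r->r_args);
    r->r_args = NULL;
}

static void handle_request(enum replay_state st, int patched)
{
    struct req r = { NULL };
    struct buf *args = buf_alloc();     /* incoming request body */

    if (st == REPLAY_DONE) {
        /* Reply is served from the cache; args was never attached to r. */
        if (patched)
            buf_free(args);             /* models the added m_freem(args) */
        req_free(&r);                   /* r_args is NULL, frees nothing */
        return;
    }

    r.r_args = args;                    /* ownership moves to the request */
    req_free(&r);
}

/* Runs n requests in the given state; returns how many slots stay in use. */
int leaked_after(int n, enum replay_state st, int patched)
{
    int live = 0;

    for (size_t i = 0; i < POOL_SIZE; i++)
        pool[i].in_use = 0;
    for (int i = 0; i < n; i++)
        handle_request(st, patched);
    for (size_t i = 0; i < POOL_SIZE; i++)
        live += pool[i].in_use;
    return live;
}
```

In this sketch, `leaked_after(100, REPLAY_DONE, 0)` leaves 100 slots in use while the patched variant leaves none, and an unpatched run longer than the pool ends with every slot taken, which is the toy equivalent of the steadily growing "mbufs in use" count.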
Sorry, I can't remember if you are a committer? (If not, I'll try and get dfr to review it and then get it committed.) Again, good job, rick ps: I was looking for a leak of the copy in the cache and didn't think of the request coming in. From owner-freebsd-fs@FreeBSD.ORG Mon Mar 22 00:08:49 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9B511106564A for ; Mon, 22 Mar 2010 00:08:49 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from lo.gmane.org (lo.gmane.org [80.91.229.12]) by mx1.freebsd.org (Postfix) with ESMTP id 589818FC0C for ; Mon, 22 Mar 2010 00:08:49 +0000 (UTC) Received: from list by lo.gmane.org with local (Exim 4.69) (envelope-from ) id 1NtVCI-0000Lc-Ox for freebsd-fs@freebsd.org; Mon, 22 Mar 2010 01:08:46 +0100 Received: from 78-1-190-173.adsl.net.t-com.hr ([78.1.190.173]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Mon, 22 Mar 2010 01:08:46 +0100 Received: from ivoras by 78-1-190-173.adsl.net.t-com.hr with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Mon, 22 Mar 2010 01:08:46 +0100 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-fs@freebsd.org From: Ivan Voras Date: Mon, 22 Mar 2010 01:08:31 +0100 Lines: 9 Message-ID: Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@dough.gmane.org X-Gmane-NNTP-Posting-Host: 78-1-190-173.adsl.net.t-com.hr User-Agent: Thunderbird 2.0.0.21 (X11/20090612) Subject: UFS files in a directory limit? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Mar 2010 00:08:49 -0000 hi, What is the limit on the number of files in a directory on UFS? 
I always thought it is 32,767 (or near it) but now I see several directories on a server with more than 36,000 files (yes it's inefficient, that's not the point). On a similar topic, I presume there are no unexpected problems with increasing vfs.ufs.dirhash_mem to ridiculous amounts like 100 MB? :) From owner-freebsd-fs@FreeBSD.ORG Mon Mar 22 00:10:09 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 877CD1065675; Mon, 22 Mar 2010 00:10:09 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 238118FC27; Mon, 22 Mar 2010 00:10:09 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvsEAK5SpkuDaFvH/2dsb2JhbACbOnO5AoR9BA X-IronPort-AV: E=Sophos;i="4.51,284,1267419600"; d="scan'208";a="69443151" Received: from danube.cs.uoguelph.ca ([131.104.91.199]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 21 Mar 2010 20:10:08 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by danube.cs.uoguelph.ca (Postfix) with ESMTP id 730A41084146; Sun, 21 Mar 2010 20:10:08 -0400 (EDT) X-Virus-Scanned: amavisd-new at danube.cs.uoguelph.ca Received: from danube.cs.uoguelph.ca ([127.0.0.1]) by localhost (danube.cs.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id D3YsOcAqO5ts; Sun, 21 Mar 2010 20:10:08 -0400 (EDT) Received: from muncher.cs.uoguelph.ca (muncher.cs.uoguelph.ca [131.104.91.102]) by danube.cs.uoguelph.ca (Postfix) with ESMTP id EAD231084138; Sun, 21 Mar 2010 20:10:07 -0400 (EDT) Received: from localhost (rmacklem@localhost) by muncher.cs.uoguelph.ca (8.11.7p3+Sun/8.11.6) with ESMTP id o2M0N3501101; Sun, 21 Mar 2010 20:23:03 -0400 (EDT) X-Authentication-Warning: muncher.cs.uoguelph.ca: rmacklem owned process doing -bs Date: Sun, 21 Mar 2010 20:23:02 -0400 (EDT) From: Rick Macklem 
X-X-Sender: rmacklem@muncher.cs.uoguelph.ca To: Mikolaj Golub In-Reply-To: <86tys9eqo6.fsf@kopusha.onet> Message-ID: References: <201003171120.o2HBK3CV082081@freefall.freebsd.org> <20100317113953.GA14582@icarus.home.lan> <86tys9eqo6.fsf@kopusha.onet> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: bug-followup@FreeBSD.org, freebsd-fs@FreeBSD.org, Kai Kockro , danny@cs.huji.ac.il Subject: Re: kern/144330: [nfs] mbuf leakage in nfsd with zfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Mar 2010 00:10:09 -0000 On Sun, 21 Mar 2010, Mikolaj Golub wrote: > > Reviewing rpc/svc.c:svc_getreq(), it looks to me that in the RS_DONE case args > are never freed. Shouldn't it be like in the attached patch? > Oops, I meant to ask Daniel Braniss (not Jeremy) w.r.t. testing the patch, since he can easily reproduce the problem. Of course, I'd appreciate it if anyone who can test it does so and lets us know how it goes. Daniel, here's the patch just in case you didn't see Mikolaj's email. 
rick Mikolaj's patch: --- sys/rpc/svc.c.orig 2010-03-21 10:17:20.000000000 +0200 +++ sys/rpc/svc.c 2010-03-21 10:20:05.000000000 +0200 @@ -819,6 +819,7 @@ svc_getreq(SVCXPRT *xprt, struct svc_req free(r->rq_addr, M_SONAME); r->rq_addr = NULL; } + m_freem(args); goto call_done; default: From owner-freebsd-fs@FreeBSD.ORG Mon Mar 22 00:12:38 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 17565106564A for ; Mon, 22 Mar 2010 00:12:38 +0000 (UTC) (envelope-from morganw@chemikals.org) Received: from warped.bluecherry.net (unknown [IPv6:2001:440:eeee:fffb::2]) by mx1.freebsd.org (Postfix) with ESMTP id 935A08FC2F for ; Mon, 22 Mar 2010 00:12:37 +0000 (UTC) Received: from volatile.chemikals.org (adsl-67-118-119.shv.bellsouth.net [98.67.118.119]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by warped.bluecherry.net (Postfix) with ESMTPSA id 029AF86D9B31; Sun, 21 Mar 2010 19:12:35 -0500 (CDT) Received: from localhost (morganw@localhost [127.0.0.1]) by volatile.chemikals.org (8.14.4/8.14.4) with ESMTP id o2M0CPeQ001025; Sun, 21 Mar 2010 19:12:25 -0500 (CDT) (envelope-from morganw@chemikals.org) Date: Sun, 21 Mar 2010 19:12:25 -0500 (CDT) From: Wes Morgan X-X-Sender: morganw@volatile To: Baldur Gislason In-Reply-To: <20100321161051.GM63370@gremlin.foo.is> Message-ID: References: <20100317214234.GF63370@gremlin.foo.is> <20100321161051.GM63370@gremlin.foo.is> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: clamav-milter 0.95.3 at warped X-Virus-Status: Clean Cc: freebsd-fs@freebsd.org Subject: Re: Frustration: replace not doing what I expected. 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Mar 2010 00:12:38 -0000 On Sun, 21 Mar 2010, Baldur Gislason wrote: > I got it working, what I had to do was to delete a file that the > resilver process reported as being corrupted. Then run a scrub again > and it would upgrade the pool status to healthy. > Good, I'm glad it worked out. Those errors can be extremely frustrating. One thing that I am curious about, though. A single device failure on a raidz1 shouldn't have resulted in corruption. Were you running degraded for a long period of time? You might want to check your other disks if it was a read failure. > Baldur > > On Sun, Mar 21, 2010 at 10:34:44AM -0500, Wes Morgan wrote: > > On Wed, 17 Mar 2010, Baldur Gislason wrote: > > > > > A drive failed in a pool and I had to replace it. > > > I did zpool replace ad18 ad18, the pool resilvered for 5 hours > > > and finished but did not return from degraded mode. > > > I tried removing the cache file and reimporting the pool, no change, it > > > hasn't gotten rid of the old drive which does not exist anymore. > > > > Hmmm. I've successfully replaced a drive that way before, and I'm sure > > many other people have. Did you offline ad18 before doing both the > > physical drive replacement and the zpool replace? I can't recall if that > > is necessary or not. Can you send the relevant output from zpool history? > > > > The "old" device is part of the metadata on the drive labels, so there is > > no way to remove it like you're wanting without either zfs deciding to > > remove it or rewriting the labels by hand. > > > > > > > pool: zirconium > > > state: DEGRADED > > > status: One or more devices has experienced an error resulting in data > > > corruption. Applications may be affected. > > > action: Restore the file in question if possible. 
Otherwise restore the > > > entire pool from backup. > > > see: http://www.sun.com/msg/ZFS-8000-8A > > > scrub: none requested > > > config: > > > > > > NAME STATE READ WRITE CKSUM > > > zirconium DEGRADED 0 0 0 > > > raidz1 DEGRADED 0 0 0 > > > ad4 ONLINE 0 0 0 > > > ad6 ONLINE 0 0 0 > > > replacing DEGRADED 0 0 0 > > > 2614810928866691230 UNAVAIL 0 962 0 was /dev/ad18/old > > > ad18 ONLINE 0 0 0 > > > ad20 ONLINE 0 0 0 > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > 
From owner-freebsd-fs@FreeBSD.ORG Mon Mar 22 04:55:59 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AE0CD1065670 for ; Mon, 22 Mar 2010 04:55:59 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (chez.mckusick.com [64.81.247.49]) by mx1.freebsd.org (Postfix) with ESMTP id 790938FC1A for ; Mon, 22 Mar 2010 04:55:59 +0000 (UTC) Received: from chez.mckusick.com (localhost [127.0.0.1]) by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id o2M4txXr082327; Sun, 21 Mar 2010 21:55:59 -0700 (PDT) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201003220455.o2M4txXr082327@chez.mckusick.com> To: Ivan Voras In-reply-to: Date: Sun, 21 Mar 2010 21:55:59 -0700 From: Kirk McKusick Cc: freebsd-fs@freebsd.org Subject: Re: UFS files in a directory limit? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Mar 2010 04:55:59 -0000

> To: freebsd-fs@freebsd.org
> From: Ivan Voras
> Date: Mon, 22 Mar 2010 01:08:31 +0100
> Subject: UFS files in a directory limit?
>
> hi,
>
> What is the limit on the number of files in a directory on UFS? I always
> thought it is 32,767 (or near it) but now I see several directories on a
> server with more than 36,000 files (yes it's inefficient, that's not the
> point).

The only limit on the size of a directory is the number of files that
you can have in the filesystem. There is a limit of 2^16 directories
within a directory due to the limit on the number of hard links.

> On a similar topic, I presume there are no unexpected problems with
> increasing vfs.ufs.dirhash_mem to ridiculous amounts like 100 MB? :)

The only issue with making vfs.ufs.dirhash_mem very large is that you
may exhaust the address space available to your kernel.

Kirk McKusick From owner-freebsd-fs@FreeBSD.ORG Mon Mar 22 06:24:11 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EBCB6106564A for ; Mon, 22 Mar 2010 06:24:11 +0000 (UTC) (envelope-from to.my.trociny@gmail.com) Received: from mail-bw0-f228.google.com (mail-bw0-f228.google.com [209.85.218.228]) by mx1.freebsd.org (Postfix) with ESMTP id 73EFF8FC13 for ; Mon, 22 Mar 2010 06:24:11 +0000 (UTC) Received: by bwz28 with SMTP id 28so4544324bwz.14 for ; Sun, 21 Mar 2010 23:24:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:from:to:cc:subject :organization:references:date:in-reply-to:message-id:user-agent :mime-version:content-type; bh=tR0HX/I+Z5p40uZD6knekhsv6Ipgi/YWXyz3O9uEjiU=; b=n5A2lLGufIh7IkfT0Jbp+//cpj6ZoUj83pBvHq+2RHb7HVSxdaDUJfBWSGODCoJzC5 HGKPrjl5h3fRKFOrdvMFu9DAaEhasIOpcIWfFcrAk1dpmMR5vVpPDl4pfgGOV8m1nEQ2 hXiQb5bjxt0fpcpDAanrVCxWsu/YdvVrcsWQY= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=from:to:cc:subject:organization:references:date:in-reply-to :message-id:user-agent:mime-version:content-type; b=e9ZAbQ8XK0lDYucFKPhJFMUzSVzHpbyp2o0bmvPs5sfCevIkykhlVyxSpqaq1KGU2S /TvTGfpHZphe8eoXEh+mCtI/3cCMCU2WP/PQD/NByKXSMFtTLuMcDtZfEDsRErWF84Jl D156tb0ntNGXgm5SxPrt3ZreqaEl3aDbsXUmY= Received: by 10.204.145.23 with SMTP id b23mr1345229bkv.17.1269239050138; Sun, 21 Mar 2010 23:24:10 -0700 (PDT) Received: from localhost (ms.singlescrowd.net [80.85.90.67]) by mx.google.com with ESMTPS id a11sm18174196bkc.21.2010.03.21.23.24.08 (version=TLSv1/SSLv3 cipher=RC4-MD5); Sun, 21 Mar 2010 23:24:09 -0700 (PDT) From: Mikolaj Golub To: Rick Macklem Organization: TOA Ukraine References: <201003220030.o2M0U91J034682@freefall.freebsd.org> Date: Mon, 22 Mar 2010 08:24:07 +0200 In-Reply-To: <201003220030.o2M0U91J034682@freefall.freebsd.org> (Rick Macklem's message of 
"Mon, 22 Mar 2010 00:30:09 GMT") Message-ID: <86y6hkri0o.fsf@zhuzha.ua1> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: freebsd-fs@FreeBSD.org Subject: Re: kern/144330: [nfs] mbuf leakage in nfsd with zfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Mar 2010 06:24:12 -0000 On Mon, 22 Mar 2010 00:30:09 GMT Rick Macklem wrote: RM> Sorry, I can't remember if you are a committer? (If not, I'll try and RM> get dfr to review it and then get it committed.) I am not a committer. So please do this :-) -- Mikolaj Golub From owner-freebsd-fs@FreeBSD.ORG Mon Mar 22 06:30:06 2010 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4FFCC1065675 for ; Mon, 22 Mar 2010 06:30:06 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 3ECAB8FC14 for ; Mon, 22 Mar 2010 06:30:06 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o2M6U58V044762 for ; Mon, 22 Mar 2010 06:30:05 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o2M6U5c0044757; Mon, 22 Mar 2010 06:30:05 GMT (envelope-from gnats) Date: Mon, 22 Mar 2010 06:30:05 GMT Message-Id: <201003220630.o2M6U5c0044757@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: Kai Kockro Cc: Subject: Re: kern/144330: [nfs] mbuf leakage in nfsd with zfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Kai Kockro List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Mon, 22 Mar 2010 06:30:06 -0000

The following reply was made to PR kern/144330; it has been noted by GNATS.

From: Kai Kockro
To: Rick Macklem
Cc: Mikolaj Golub , Jeremy Chadwick , freebsd-fs@freebsd.org, bug-followup@freebsd.org, gerrit@pmp.uni-hannover.de, danny@cs.huji.ac.il
Subject: Re: kern/144330: [nfs] mbuf leakage in nfsd with zfs
Date: Mon, 22 Mar 2010 07:26:05 +0100

I will test it tonight on our ZFS Storages.

kai

On Monday, 22 March 2010 01:23:02, Rick Macklem wrote:
> On Sun, 21 Mar 2010, Mikolaj Golub wrote:
> > Reviewing rpc/svc.c:svc_getreq() it looks to me that for the RS_DONE case
> > args are never freed. Shouldn't it be like in the attached patch?
>
> Oops, I meant to ask Daniel Braniss (not Jeremy) w.r.t. testing the patch,
> since he can easily reproduce the problem. Of course, I'd appreciate
> anyone who can test it to do so and let us know how it goes.
>
> Daniel, here's the patch just in case you didn't see Mikolaj's email.
>
> rick
> Mikolaj's patch:
> --- sys/rpc/svc.c.orig 2010-03-21 10:17:20.000000000 +0200
> +++ sys/rpc/svc.c 2010-03-21 10:20:05.000000000 +0200
> @@ -819,6 +819,7 @@ svc_getreq(SVCXPRT *xprt, struct svc_req
>                                  free(r->rq_addr, M_SONAME);
>                                  r->rq_addr = NULL;
>                          }
> +                        m_freem(args);
>                          goto call_done;
>
>                  default:
>

From owner-freebsd-fs@FreeBSD.ORG Mon Mar 22 06:52:45 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A07D6106566C for ; Mon, 22 Mar 2010 06:52:45 +0000 (UTC) (envelope-from kkockro@web.de) Received: from mail.myphotobook.de (mail.myphotobook.de [85.237.87.140]) by mx1.freebsd.org (Postfix) with ESMTP id DEB398FC16 for ; Mon, 22 Mar 2010 06:52:44 +0000 (UTC) Received: (qmail 66393 invoked by uid 89); 22 Mar 2010 06:25:42 -0000 Received: from unknown (HELO ) (k.kockro@myphotobook.de@87.234.224.68) by mail.myphotobook.de with AES256-SHA encrypted SMTP; 22 Mar 2010 06:25:42 -0000 From: Kai Kockro To: Rick Macklem Date: Mon, 22 Mar 2010 07:26:05 +0100 User-Agent: KMail/1.12.4 (FreeBSD/8.0-STABLE; KDE/4.3.5; amd64; ; ) References: <201003171120.o2HBK3CV082081@freefall.freebsd.org> <86tys9eqo6.fsf@kopusha.onet> In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Message-Id: <201003220726.05291.kkockro@web.de> Cc: bug-followup@freebsd.org, freebsd-fs@freebsd.org, danny@cs.huji.ac.il Subject: Re: kern/144330: [nfs] mbuf leakage in nfsd with zfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Mar 2010 06:52:45 -0000

I will test it tonight on our ZFS Storages.

kai

On Monday, 22 March 2010 01:23:02, Rick Macklem wrote:
> On Sun, 21 Mar 2010, Mikolaj Golub wrote:
> > Reviewing rpc/svc.c:svc_getreq() it looks to me that for the RS_DONE case
> > args are never freed. Shouldn't it be like in the attached patch?
>
> Oops, I meant to ask Daniel Braniss (not Jeremy) w.r.t. testing the patch,
> since he can easily reproduce the problem. Of course, I'd appreciate
> anyone who can test it to do so and let us know how it goes.
>
> Daniel, here's the patch just in case you didn't see Mikolaj's email.
>
> rick
> Mikolaj's patch:
> --- sys/rpc/svc.c.orig 2010-03-21 10:17:20.000000000 +0200
> +++ sys/rpc/svc.c 2010-03-21 10:20:05.000000000 +0200
> @@ -819,6 +819,7 @@ svc_getreq(SVCXPRT *xprt, struct svc_req
>                                  free(r->rq_addr, M_SONAME);
>                                  r->rq_addr = NULL;
>                          }
> +                        m_freem(args);
>                          goto call_done;
>
>                  default:
>

From owner-freebsd-fs@FreeBSD.ORG Mon Mar 22 11:07:01 2010 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CED0D106564A for ; Mon, 22 Mar 2010 11:07:01 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id A34B78FC22 for ; Mon, 22 Mar 2010 11:07:01 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o2MB71w2015019 for ; Mon, 22 Mar 2010 11:07:01 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o2MB70Cu015017 for freebsd-fs@FreeBSD.org; Mon, 22 Mar 2010 11:07:00 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 22 Mar 2010 11:07:00 GMT Message-Id: <201003221107.o2MB70Cu015017@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster
To: freebsd-fs@FreeBSD.org Cc: Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Mar 2010 11:07:02 -0000 Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. Description -------------------------------------------------------------------------------- o kern/144929 fs [ufs] [lor] vfs_bio.c + ufs_dirhash.c o kern/144458 fs [nfs] [patch] nfsd fails as a kld o kern/144416 fs [panic] Kernel panic on online filesystem optimization s kern/144415 fs [zfs] [panic] kernel panics on boot after zfs crash o kern/144330 fs [nfs] mbuf leakage in nfsd with zfs o kern/144234 fs [zfs] Cannot boot machine with recent gptzfsboot code o bin/144214 fs zfsboot fails on gang block after upgrade to zfs v14 o kern/143825 fs [nfs] [panic] Kernel panic on NFS client o kern/143345 fs [ext2fs] [patch] extfs minor header cleanups to better o kern/143212 fs [nfs] NFSv4 client strange work ... 
o kern/143184 fs [zfs] [lor] zfs/bufwait LOR o kern/142924 fs [ext2fs] [patch] Small cleanup for the inode struct in o kern/142914 fs [zfs] ZFS performance degradation over time o kern/142878 fs [zfs] [vfs] lock order reversal o kern/142597 fs [ext2fs] ext2fs does not work on filesystems with real o kern/142489 fs [zfs] [lor] allproc/zfs LOR o kern/142466 fs Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re o kern/142401 fs [ntfs] [patch] Minor updates to NTFS from NetBSD o kern/142306 fs [zfs] [panic] ZFS drive (from OSX Leopard) causes two o kern/142068 fs [ufs] BSD labels are got deleted spontaneously o kern/141950 fs [unionfs] [lor] ufs/unionfs/ufs Lock order reversal o kern/141897 fs [msdosfs] [panic] Kernel panic. msdofs: file name leng o kern/141463 fs [nfs] [panic] Frequent kernel panics after upgrade fro o kern/141305 fs [zfs] FreeBSD ZFS+sendfile severe performance issues ( o kern/141091 fs [patch] [nullfs] fix panics with DIAGNOSTIC enabled o kern/141086 fs [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS o kern/141010 fs [zfs] "zfs scrub" fails when backed by files in UFS2 o kern/140888 fs [zfs] boot fail from zfs root while the pool resilveri o kern/140661 fs [zfs] [patch] /boot/loader fails to work on a GPT/ZFS- o kern/140640 fs [zfs] snapshot crash o kern/140433 fs [zfs] [panic] panic while replaying ZIL after crash o kern/140134 fs [msdosfs] write and fsck destroy filesystem integrity o kern/140068 fs [smbfs] [patch] smbfs does not allow semicolon in file o kern/139725 fs [zfs] zdb(1) dumps core on i386 when examining zpool c o kern/139715 fs [zfs] vfs.numvnodes leak on busy zfs o bin/139651 fs [nfs] mount(8): read-only remount of NFS volume does n o kern/139597 fs [patch] [tmpfs] tmpfs initializes va_gen but doesn't u o kern/139564 fs [zfs] [panic] 8.0-RC1 - Fatal trap 12 at end of shutdo o kern/139407 fs [smbfs] [panic] smb mount causes system crash if remot o kern/139363 fs [nfs] diskless root nfs mount from non FreeBSD server o 
kern/138790 fs [zfs] ZFS ceases caching when mem demand is high o kern/138524 fs [msdosfs] disks and usb flashes/cards with Russian lab o kern/138421 fs [ufs] [patch] remove UFS label limitations o kern/138202 fs mount_msdosfs(1) see only 2Gb f kern/137037 fs [zfs] [hang] zfs rollback on root causes FreeBSD to fr o kern/136968 fs [ufs] [lor] ufs/bufwait/ufs (open) o kern/136945 fs [ufs] [lor] filedesc structure/ufs (poll) o kern/136944 fs [ffs] [lor] bufwait/snaplk (fsync) o kern/136873 fs [ntfs] Missing directories/files on NTFS volume o kern/136865 fs [nfs] [patch] NFS exports atomic and on-the-fly atomic o kern/136470 fs [nfs] Cannot mount / in read-only, over NFS o kern/135594 fs [zfs] Single dataset unresponsive with Samba o kern/135546 fs [zfs] zfs.ko module doesn't ignore zpool.cache filenam o kern/135469 fs [ufs] [panic] kernel crash on md operation in ufs_dirb o kern/135050 fs [zfs] ZFS clears/hides disk errors on reboot o kern/134491 fs [zfs] Hot spares are rather cold... o kern/133676 fs [smbfs] [panic] umount -f'ing a vnode-based memory dis o kern/133614 fs [panic] panic: ffs_truncate: read-only filesystem o kern/133174 fs [msdosfs] [patch] msdosfs must support utf-encoded int f kern/133150 fs [zfs] Page fault with ZFS on 7.1-RELEASE/amd64 while w o kern/132960 fs [ufs] [panic] panic:ffs_blkfree: freeing free frag o kern/132397 fs reboot causes filesystem corruption (failure to sync b o kern/132331 fs [ufs] [lor] LOR ufs and syncer o kern/132237 fs [msdosfs] msdosfs has problems to read MSDOS Floppy o kern/132145 fs [panic] File System Hard Crashes o kern/131441 fs [unionfs] [nullfs] unionfs and/or nullfs not combineab o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail o bin/131341 fs makefs: error "Bad file descriptor" on the mount poin o kern/130979 fs [smbfs] [panic] boot/kernel/smbfs.ko o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file o 
kern/130229 fs [iconv] usermount fails on fs that need iconv o kern/130210 fs [nullfs] Error by check nullfs o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l o kern/129488 fs [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8) o kern/129059 fs [zfs] [patch] ZFS bootloader whitelistable via WITHOUT f kern/128829 fs smbd(8) causes periodic panic on 7-RELEASE o kern/127420 fs [gjournal] [panic] Journal overflow on gmirrored gjour o kern/127029 fs [panic] mount(8): trying to mount a write protected zi o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file s kern/125738 fs [zfs] [request] SHA256 acceleration in ZFS p kern/124621 fs [ext3] [patch] Cannot mount ext2fs partition f bin/124424 fs [zfs] zfs(8): zfs list -r shows strange snapshots' siz o kern/123939 fs [msdosfs] corrupts new files o kern/122380 fs [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o bin/121898 fs [nullfs] pwd(1)/getcwd(2) fails with Permission denied o bin/121779 fs [ufs] snapinfo(8) (and related tools?) 
only work for t o bin/121366 fs [zfs] [patch] Automatic disk scrubbing from periodic(8 o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha f kern/120991 fs [panic] [fs] [snapshot] System crashes when manipulati o kern/120483 fs [ntfs] [patch] NTFS filesystem locking changes o kern/120482 fs [ntfs] [patch] Sync style changes between NetBSD and F f kern/119735 fs [zfs] geli + ZFS + samba starting on boot panics 7.0-B o kern/118912 fs [2tb] disk sizing/geometry problem with large array o kern/118713 fs [minidump] [patch] Display media size required for a k o bin/118249 fs mv(1): moving a directory changes its mtime o kern/118107 fs [ntfs] [panic] Kernel panic when accessing a file at N o bin/117315 fs [smbfs] mount_smbfs(8) and related options can't mount o kern/117314 fs [ntfs] Long-filename only NTFS fs'es cause kernel pani o kern/117158 fs [zfs] zpool scrub causes panic if geli vdevs detach on o bin/116980 fs [msdosfs] [patch] mount_msdosfs(8) resets some flags f o kern/116913 fs [ffs] [panic] ffs_blkfree: freeing free block p kern/116608 fs [msdosfs] [patch] msdosfs fails to check mount options o kern/116583 fs [ffs] [hang] System freezes for short time when using o kern/116170 fs [panic] Kernel panic when mounting /tmp o kern/115645 fs [snapshots] [panic] lockmgr: thread 0xc4c00d80, not ex o bin/115361 fs [zfs] mount(8) gets into a state where it won't set/un o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o kern/113852 fs [smbfs] smbfs does not properly implement DFS referral o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/111843 fs 
[msdosfs] Long Names of files are incorrectly created o kern/111782 fs [ufs] dump(8) fails horribly for large filesystems s bin/111146 fs [2tb] fsck(8) fails on 6T filesystem o kern/109024 fs [msdosfs] mount_msdosfs: msdosfs_iconv: Operation not o kern/109010 fs [msdosfs] can't mv directory within fat32 file system o bin/107829 fs [2TB] fdisk(8): invalid boundary checking in fdisk / w o kern/106030 fs [ufs] [panic] panic in ufs from geom when a dead disk o kern/104406 fs [ufs] Processes get stuck in "ufs" state under persist o kern/104133 fs [ext2fs] EXT2FS module corrupts EXT2/3 filesystems o kern/103035 fs [ntfs] Directories in NTFS mounted disc images appear o kern/101324 fs [smbfs] smbfs sometimes not case sensitive when it's s o kern/99290 fs [ntfs] mount_ntfs ignorant of cluster sizes o kern/97377 fs [ntfs] [patch] syntax cleanup for ntfs_ihash.c o kern/95222 fs [iso9660] File sections on ISO9660 level 3 CDs ignored o kern/94849 fs [ufs] rename on UFS filesystem is not atomic o kern/94769 fs [ufs] Multiple file deletions on multi-snapshotted fil o kern/94733 fs [smbfs] smbfs may cause double unlock o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/92272 fs [ffs] [hang] Filling a filesystem while creating a sna f kern/91568 fs [ufs] [panic] writing to UFS/softupdates DVD media in o kern/91134 fs [smbfs] [patch] Preserve access and modification time a kern/90815 fs [smbfs] [patch] SMBFS with character conversions somet o kern/88657 fs [smbfs] windows client hang when browsing a samba shar o kern/88266 fs [smbfs] smbfs does not implement UIO_NOCOPY and sendfi o kern/87859 fs [smbfs] System reboot while umount smbfs. 
o kern/86587 fs [msdosfs] rm -r /PATH fails with lots of small files o kern/85326 fs [smbfs] [panic] saving a file via samba to an overquot o kern/84589 fs [2TB] 5.4-STABLE unresponsive during background fsck 2 o kern/80088 fs [smbfs] Incorrect file time setting on NTFS mounted vi o kern/73484 fs [ntfs] Kernel panic when doing `ls` from the client si o bin/73019 fs [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino o kern/71774 fs [ntfs] NTFS cannot "see" files on a WinXP filesystem o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po o kern/65920 fs [nwfs] Mounted Netware filesystem behaves strange o kern/65901 fs [smbfs] [patch] smbfs fails fsx write/truncate-down/tr o kern/61503 fs [smbfs] mount_smbfs does not work as non-root o kern/55617 fs [smbfs] Accessing an nsmb-mounted drive via a smb expo o kern/51685 fs [hang] Unbounded inode allocation causes kernel to loc o kern/51583 fs [nullfs] [patch] allow to work with devices and socket o kern/36566 fs [smbfs] System reboot with dead smb mount and umount o kern/33464 fs [ufs] soft update inconsistencies after system crash o kern/18874 fs [2TB] 32bit NFS servers export wrong negative values t 160 problems total. 
From owner-freebsd-fs@FreeBSD.ORG Mon Mar 22 11:09:58 2010 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 16CF51065675; Mon, 22 Mar 2010 11:09:58 +0000 (UTC) (envelope-from danny@cs.huji.ac.il) Received: from kabab.cs.huji.ac.il (kabab.cs.huji.ac.il [132.65.16.84]) by mx1.freebsd.org (Postfix) with ESMTP id B62328FC2C; Mon, 22 Mar 2010 11:09:57 +0000 (UTC) Received: from pampa.cs.huji.ac.il ([132.65.80.32]) by kabab.cs.huji.ac.il with esmtp id 1NtfW6-0008E7-9q; Mon, 22 Mar 2010 13:09:54 +0200 X-Mailer: exmh version 2.7.2 01/07/2005 with nmh-1.2 To: Rick Macklem In-reply-to: References: <201003171120.o2HBK3CV082081@freefall.freebsd.org> <20100317113953.GA14582@icarus.home.lan> <86tys9eqo6.fsf@kopusha.onet> Comments: In-reply-to Rick Macklem message dated "Sun, 21 Mar 2010 20:23:02 -0400." Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Mon, 22 Mar 2010 13:09:53 +0200 From: Daniel Braniss Message-ID: Cc: bug-followup@FreeBSD.org, freebsd-fs@FreeBSD.org, Kai Kockro Subject: Re: kern/144330: [nfs] mbuf leakage in nfsd with zfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Mar 2010 11:09:58 -0000 > > > On Sun, 21 Mar 2010, Mikolaj Golub wrote: > > > > > Reviewing rpc/svc.c:svc_getreq() it looks for me that for RS_DONE case args > > are nevere freed. Shouldn't it be like in the attached patch? > > > Oops, I meant to ask Daniel Braniss (not Jeremy) w.r.t testing the patch, > since he can easily reproduce the problem. Of course, I'd appreciate > anyone who can test it to do so and let us know how it goes. > > Daniel, here's the patch just in case you didn't see Mikolaj's email. 
> > rick > Mikolaj's patch: > --- sys/rpc/svc.c.orig 2010-03-21 10:17:20.000000000 +0200 > +++ sys/rpc/svc.c 2010-03-21 10:20:05.000000000 +0200 > @@ -819,6 +819,7 @@ svc_getreq(SVCXPRT *xprt, struct svc_req > free(r->rq_addr, M_SONAME); > r->rq_addr = NULL; > } > + m_freem(args); > goto call_done; > > default: well, it's much better!, but no cookies yet :-) from comparing graphs in ftp://ftp.cs.huji.ac.il/users/danny/freebsd/mbuf-leak/ store-01-e.ps: a production server running newfsd - now up almost 20 days notice that the average used mbuf is below 1000! store-02.ps: kernel without last patch, classic nfsd the leak is huge. store-02++.ps: with latest patch the leak is much smaller but I see 2 issues: - the initial leap to over 2000, then a smaller leak. could someone explain replay_prune() to me? cheers, danny From owner-freebsd-fs@FreeBSD.ORG Mon Mar 22 11:20:03 2010 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 708CE1065672 for ; Mon, 22 Mar 2010 11:20:03 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 59F4B8FC12 for ; Mon, 22 Mar 2010 11:20:03 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o2MBK3Wb027084 for ; Mon, 22 Mar 2010 11:20:03 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o2MBK3iK027083; Mon, 22 Mar 2010 11:20:03 GMT (envelope-from gnats) Date: Mon, 22 Mar 2010 11:20:03 GMT Message-Id: <201003221120.o2MBK3iK027083@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: Daniel Braniss Cc: Subject: Re: kern/144330: [nfs] mbuf leakage in nfsd with zfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Daniel 
Braniss List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Mar 2010 11:20:03 -0000

The following reply was made to PR kern/144330; it has been noted by GNATS.

From: Daniel Braniss
To: Rick Macklem
Cc: Mikolaj Golub , Jeremy Chadwick , freebsd-fs@FreeBSD.org, Kai Kockro , bug-followup@FreeBSD.org, gerrit@pmp.uni-hannover.de
Subject: Re: kern/144330: [nfs] mbuf leakage in nfsd with zfs
Date: Mon, 22 Mar 2010 13:09:53 +0200

> On Sun, 21 Mar 2010, Mikolaj Golub wrote:
>
> >
> > Reviewing rpc/svc.c:svc_getreq() it looks to me that for the RS_DONE case args
> > are never freed. Shouldn't it be like in the attached patch?
> >
> Oops, I meant to ask Daniel Braniss (not Jeremy) w.r.t. testing the patch,
> since he can easily reproduce the problem. Of course, I'd appreciate
> anyone who can test it to do so and let us know how it goes.
>
> Daniel, here's the patch just in case you didn't see Mikolaj's email.
>
> rick
> Mikolaj's patch:
> --- sys/rpc/svc.c.orig 2010-03-21 10:17:20.000000000 +0200
> +++ sys/rpc/svc.c 2010-03-21 10:20:05.000000000 +0200
> @@ -819,6 +819,7 @@ svc_getreq(SVCXPRT *xprt, struct svc_req
>                                  free(r->rq_addr, M_SONAME);
>                                  r->rq_addr = NULL;
>                          }
> +                        m_freem(args);
>                          goto call_done;
>
>                  default:

well, it's much better!, but no cookies yet :-)

from comparing graphs in ftp://ftp.cs.huji.ac.il/users/danny/freebsd/mbuf-leak/
store-01-e.ps: a production server running newfsd - now up almost 20 days
        notice that the average used mbuf is below 1000!

store-02.ps: kernel without last patch, classic nfsd
        the leak is huge.

store-02++.ps: with latest patch
        the leak is much smaller but I see 2 issues:
        - the initial leap to over 2000, then a smaller leak.

could someone explain replay_prune() to me?

cheers, danny From owner-freebsd-fs@FreeBSD.ORG Mon Mar 22 13:51:53 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 724F5106567A; Mon, 22 Mar 2010 13:51:53 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 05A1E8FC14; Mon, 22 Mar 2010 13:51:52 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvsEAN4Sp0uDaFvK/2dsb2JhbACbLHO6QoR9BI5T X-IronPort-AV: E=Sophos;i="4.51,287,1267419600"; d="scan'208";a="69493788" Received: from fraser.cs.uoguelph.ca ([131.104.91.202]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 22 Mar 2010 09:51:51 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by fraser.cs.uoguelph.ca (Postfix) with ESMTP id 32FA7109C2DC; Mon, 22 Mar 2010 09:51:51 -0400 (EDT) X-Virus-Scanned: amavisd-new at fraser.cs.uoguelph.ca Received: from fraser.cs.uoguelph.ca ([127.0.0.1]) by localhost (fraser.cs.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 4DunYG72kxcn; Mon, 22 Mar 2010 09:51:50 -0400 (EDT) Received: from muncher.cs.uoguelph.ca (muncher.cs.uoguelph.ca [131.104.91.102]) by fraser.cs.uoguelph.ca (Postfix) with ESMTP id 59877109C27F; Mon, 22 Mar 2010 09:51:50 -0400 (EDT) Received: from localhost (rmacklem@localhost) by muncher.cs.uoguelph.ca (8.11.7p3+Sun/8.11.6) with ESMTP id o2ME4kt14007; Mon, 22 Mar 2010 10:04:46 -0400 (EDT) X-Authentication-Warning: muncher.cs.uoguelph.ca: rmacklem owned process doing -bs Date: Mon, 22 Mar 2010 10:04:46 -0400 (EDT) From: Rick Macklem X-X-Sender: rmacklem@muncher.cs.uoguelph.ca To: Daniel Braniss In-Reply-To: Message-ID: References: <201003171120.o2HBK3CV082081@freefall.freebsd.org> <20100317113953.GA14582@icarus.home.lan> <86tys9eqo6.fsf@kopusha.onet> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; 
format=flowed Cc: bug-followup@FreeBSD.org, freebsd-fs@FreeBSD.org, Kai Kockro Subject: Re: kern/144330: [nfs] mbuf leakage in nfsd with zfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Mar 2010 13:51:53 -0000

On Mon, 22 Mar 2010, Daniel Braniss wrote:

>
> well, it's much better!, but no cookies yet :-)
>
Well, that's good news. I'll try and get dfr to review it and then
commit it. Thanks Mikolaj, for finding this.

> from comparing graphs in
> ftp://ftp.cs.huji.ac.il/users/danny/freebsd/mbuf-leak/
> store-01-e.ps: a production server running newfsd - now up almost 20 days
>         notice that the average used mbuf is below 1000!
>
> store-02.ps: kernel without last patch, classic nfsd
>         the leak is huge.
>
> store-02++.ps: with latest patch
>         the leak is much smaller but I see 2 issues:
>         - the initial leap to over 2000, then a smaller leak.

The initial leap doesn't worry me. That's just a design constraint.
A slow leak after that is still a problem. (I might have seen the
slow leak in testing here. I'll poke at it and see if I can reproduce
that.)

>
> could someone explain replay_prune() to me?
>
I just looked at it and I think it does the following:
- when it thinks the cache is too big (either too many entries or too much
  mbuf data) it loops around until:
  - no longer too much or can't free any more
    (when an entry is free'd, rc_size and rc_count are reduced)
  (the loop is from the end of the tailq, so it is freeing the least
   recently used entries)
- the test for rce_repmsg.rm_xid != 0 avoids freeing ones that are in
  progress, since rce_repmsg is all zeroed until the reply has been
  generated

I did notice that the call to replay_prune() from replay_setsize() does
not lock the mutex before calling it, so it doesn't look smp safe to me
for this case, but I doubt that would cause a slow leak.
(I think this is only called when the number of mbuf clusters in the kernel changes and might cause a kernel crash if the tailq wasn't in a consistent state as it rattled through the list in the loop.) rick From owner-freebsd-fs@FreeBSD.ORG Mon Mar 22 14:00:13 2010 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 56A3E106566B for ; Mon, 22 Mar 2010 14:00:13 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 2C13A8FC13 for ; Mon, 22 Mar 2010 14:00:13 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o2ME0CsR065997 for ; Mon, 22 Mar 2010 14:00:12 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o2ME0CsC065996; Mon, 22 Mar 2010 14:00:12 GMT (envelope-from gnats) Date: Mon, 22 Mar 2010 14:00:12 GMT Message-Id: <201003221400.o2ME0CsC065996@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: Rick Macklem Cc: Subject: Re: kern/144330: [nfs] mbuf leakage in nfsd with zfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Rick Macklem List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Mar 2010 14:00:13 -0000 The following reply was made to PR kern/144330; it has been noted by GNATS. From: Rick Macklem To: Daniel Braniss Cc: Mikolaj Golub , Jeremy Chadwick , freebsd-fs@FreeBSD.org, Kai Kockro , bug-followup@FreeBSD.org, gerrit@pmp.uni-hannover.de Subject: Re: kern/144330: [nfs] mbuf leakage in nfsd with zfs Date: Mon, 22 Mar 2010 10:04:46 -0400 (EDT) On Mon, 22 Mar 2010, Daniel Braniss wrote: > > well, it's much better!, but no cookies yet :-) > Well, that's good news. 
I'll try and get dfr to review it and then commit it. Thanks Mikolaj, for finding this. > from comparing graphs in > ftp://ftp.cs.huji.ac.il/users/danny/freebsd/mbuf-leak/ > store-01-e.ps: a production server running newfsd - now up almost 20 days > notice that the average used mbuf is below 1000! > > store-02.ps: kernel without last patch, classic nfsd > the leak is huge. > > store-02++.ps: with latest patch > the leak is much smaller but I see 2 issues: > - the initial leap to over 2000, then a smaller leak. The initial leap doesn't worry me. That's just a design constraint. A slow leak after that is still a problem. (I might have seen the slow leak in testing here. I'll poke at it and see if I can reproduce that.) > > could someone explain replay_prune() to me? > I just looked at it and I think it does the following: - when it thinks the cache is too big (either too many entries or too much mbuf data) it loops around until: - no longer too much or can't free any more (when an entry is free'd, rc_size and rc_count are reduced) (the loop is from the end of the tailq, so it is freeing the least recently used entries) - the test for rce_repmsg.rm_xid != 0 avoids freeing ones that are in progress, since rce_repmsg is all zeroed until the reply has been generated I did notice that the call to replay_prune() from replay_setsize() does not lock the mutex before calling it, so it doesn't look smp safe to me for this case, but I doubt that would cause a slow leak. (I think this is only called when the number of mbuf clusters in the kernel changes and might cause a kernel crash if the tailq wasn't in a consistent state as it rattled through the list in the loop.) 
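The pruning behaviour described above can be sketched in userland C. This is only an illustration under stated assumptions, not the kernel's replay cache code: the names `struct cache`, `struct entry`, `cache_add()` and `cache_prune()` are hypothetical, the list is a plain `<sys/queue.h>` tailq with the most recently used entry at the head, and locking is left out entirely (the caller would be expected to hold the cache mutex). It also makes a single pass from the tail rather than restarting the scan after each free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-in for one cached RPC reply. */
struct entry {
	unsigned xid;			/* 0 while the reply is still in progress */
	size_t size;			/* bytes of cached reply data */
	TAILQ_ENTRY(entry) link;
};
TAILQ_HEAD(entry_list, entry);

/* Hypothetical stand-in for the cache: head = newest, tail = oldest. */
struct cache {
	struct entry_list all;
	size_t count, size;		/* current totals */
	size_t max_count, max_size;	/* limits that trigger pruning */
};

static void
cache_init(struct cache *c, size_t max_count, size_t max_size)
{
	TAILQ_INIT(&c->all);
	c->count = c->size = 0;
	c->max_count = max_count;
	c->max_size = max_size;
}

/* Insert a new entry at the head (most recently used). Returns 0 or -1. */
static int
cache_add(struct cache *c, unsigned xid, size_t size)
{
	struct entry *e = malloc(sizeof(*e));

	if (e == NULL)
		return (-1);
	e->xid = xid;
	e->size = size;
	TAILQ_INSERT_HEAD(&c->all, e, link);
	c->count++;
	c->size += size;
	return (0);
}

/*
 * Walk from the tail (least recently used) and free completed entries
 * until the cache is within both limits or nothing more can be freed.
 * Entries with xid == 0 are skipped because their reply is in progress.
 * Returns the number of entries freed.
 */
static size_t
cache_prune(struct cache *c)
{
	size_t freed = 0;
	struct entry *e = TAILQ_LAST(&c->all, entry_list);

	while (e != NULL &&
	    (c->count > c->max_count || c->size > c->max_size)) {
		struct entry *prev = TAILQ_PREV(e, entry_list, link);

		if (e->xid != 0) {
			assert(c->count > 0 && c->size >= e->size);
			TAILQ_REMOVE(&c->all, e, link);
			c->count--;
			c->size -= e->size;
			free(e);
			freed++;
		}
		e = prev;
	}
	return (freed);
}
```

With a limit of two entries and four entries cached, one of them still in progress, a prune frees the two oldest completed entries and leaves the in-progress one alone.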
rick From owner-freebsd-fs@FreeBSD.ORG Mon Mar 22 14:51:07 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A3C861065678; Mon, 22 Mar 2010 14:51:07 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 722188FC08; Mon, 22 Mar 2010 14:51:07 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 218FE46B7E; Mon, 22 Mar 2010 10:51:07 -0400 (EDT) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPA id 40B748A021; Mon, 22 Mar 2010 10:51:06 -0400 (EDT) From: John Baldwin To: Rick Macklem Date: Mon, 22 Mar 2010 09:46:57 -0400 User-Agent: KMail/1.12.1 (FreeBSD/7.3-CBSD-20100217; KDE/4.3.1; amd64; ; ) References: <4BA3613F.4070606@comcast.net> <4BA432C8.4040707@comcast.net> In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201003220946.57087.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Mon, 22 Mar 2010 10:51:06 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-1.7 required=4.2 tests=AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx Cc: freebsd-fs@freebsd.org, bseklecki@noc.cfi.pgh.pa.us, User Questions Subject: Re: FreeBSD NFS client goes into infinite retry loop X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Mar 2010 14:51:07 -0000 On Friday 19 March 2010 11:27:13 pm Rick Macklem 
wrote: > > On Fri, 19 Mar 2010, Steve Polyack wrote: > > [good stuff snipped] > > > > This makes sense. According to wireshark, the server is indeed transmitting > > "Status: NFS3ERR_IO (5)". Perhaps this should be STALE instead; it sounds > > more correct than marking it a general IO error. Also, the NFS server is > > serving its share off of a ZFS filesystem, if it makes any difference. I > > suppose ZFS could be talking to the NFS server threads with some mismatched > > language, but I doubt it. > > > Ok, now I think we're making progress. If VFS_FHTOVP() doesn't return > ESTALE when the file no longer exists, the NFS server returns whatever > error it has returned. > > So, either VFS_FHTOVP() succeeds after the file has been deleted, which > would be a problem that needs to be fixed within ZFS > OR > ZFS returns an error other than ESTALE when it doesn't exist. > > Try the following patch on the server (which just makes any error > returned by VFS_FHTOVP() into ESTALE) and see if that helps. > > --- nfsserver/nfs_srvsubs.c.sav 2010-03-19 22:06:43.000000000 -0400 > +++ nfsserver/nfs_srvsubs.c 2010-03-19 22:07:22.000000000 -0400 > @@ -1127,6 +1127,8 @@ > } > } > error = VFS_FHTOVP(mp, &fhp->fh_fid, vpp); > + if (error != 0) > + error = ESTALE; > vfs_unbusy(mp); > if (error) > goto out; > > Please let me know if the patch helps, rick I can confirm that ZFS's FHTOVP() method never returns ESTALE. Perhaps this patch would fix it? It changes zfs_fhtovp() to return ESTALE if the generation count doesn't match. If this doesn't help, you can try changing some of the other return cases in this function to ESTALE (many use EINVAL) until you find the one that matches this condition. 
Index: zfs_vfsops.c
===================================================================
--- zfs_vfsops.c	(revision 205334)
+++ zfs_vfsops.c	(working copy)
@@ -1256,7 +1256,7 @@
 		dprintf("znode gen (%u) != fid gen (%u)\n", zp_gen, fid_gen);
 		VN_RELE(ZTOV(zp));
 		ZFS_EXIT(zfsvfs);
-		return (EINVAL);
+		return (ESTALE);
 	}
 
 	*vpp = ZTOV(zp);

-- 
John Baldwin

From owner-freebsd-fs@FreeBSD.ORG Mon Mar 22 14:52:46 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 909D71065670 for ; Mon, 22 Mar 2010 14:52:46 +0000 (UTC) (envelope-from korvus@comcast.net) Received: from qmta08.westchester.pa.mail.comcast.net (qmta08.westchester.pa.mail.comcast.net [76.96.62.80]) by mx1.freebsd.org (Postfix) with ESMTP id 39F548FC1B for ; Mon, 22 Mar 2010 14:52:45 +0000 (UTC) Received: from omta14.westchester.pa.mail.comcast.net ([76.96.62.60]) by qmta08.westchester.pa.mail.comcast.net with comcast id wAvu1d0061HzFnQ58Esle3; Mon, 22 Mar 2010 14:52:45 +0000 Received: from [10.0.0.51] ([71.199.122.142]) by omta14.westchester.pa.mail.comcast.net with comcast id wEsk1d00D34Sj4f3aEskB9; Mon, 22 Mar 2010 14:52:45 +0000 Message-ID: <4BA78444.4040707@comcast.net> Date: Mon, 22 Mar 2010 10:52:52 -0400 From: Steve Polyack User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:1.9.1.8) Gecko/20100227 Lightning/1.0b1 Thunderbird/3.0.3 MIME-Version: 1.0 To: Rick Macklem References: <4BA3613F.4070606@comcast.net> <201003190831.00950.jhb@freebsd.org> <4BA37AE9.4060806@comcast.net> <4BA392B1.4050107@comcast.net> <4BA3DEBC.2000608@comcast.net> <4BA432C8.4040707@comcast.net> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, bseklecki@noc.cfi.pgh.pa.us, User Questions Subject: Re: FreeBSD NFS client goes into infinite retry loop X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list
List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Mar 2010 14:52:46 -0000 On 3/19/2010 11:27 PM, Rick Macklem wrote: > > > On Fri, 19 Mar 2010, Steve Polyack wrote: > > [good stuff snipped] >> >> This makes sense. According to wireshark, the server is indeed >> transmitting "Status: NFS3ERR_IO (5)". Perhaps this should be STALE >> instead; it sounds more correct than marking it a general IO error. >> Also, the NFS server is serving its share off of a ZFS filesystem, if >> it makes any difference. I suppose ZFS could be talking to the NFS >> server threads with some mismatched language, but I doubt it. >> > Ok, now I think we're making progress. If VFS_FHTOVP() doesn't return > ESTALE when the file no longer exists, the NFS server returns whatever > error it has returned. > > So, either VFS_FHTOVP() succeeds after the file has been deleted, which > would be a problem that needs to be fixed within ZFS > OR > ZFS returns an error other than ESTALE when it doesn't exist. > > Try the following patch on the server (which just makes any error > returned by VFS_FHTOVP() into ESTALE) and see if that helps. > > --- nfsserver/nfs_srvsubs.c.sav 2010-03-19 22:06:43.000000000 -0400 > +++ nfsserver/nfs_srvsubs.c 2010-03-19 22:07:22.000000000 -0400 > @@ -1127,6 +1127,8 @@ > } > } > error = VFS_FHTOVP(mp, &fhp->fh_fid, vpp); > + if (error != 0) > + error = ESTALE; > vfs_unbusy(mp); > if (error) > goto out; > > Please let me know if the patch helps, rick > > The patch seems to fix the bad behavior. Running with the patch, I see the following output from my patch (return code of nfs_doio from within nfsiod): nfssvc_iod: iod 0 nfs_doio returned errno: 70 Furthermore, when inspecting the transaction with Wireshark, after deleting the file on the NFS server it looks like there is only a single error. 
This time it is a reply to a V3 Lookup call that contains a status of "NFS3ERR_NOENT (2)" coming from the NFS server. The client also does not repeatedly try to complete the failed request.

Any suggestions on the next step here? Based on what you said, it looks like ZFS is falsely reporting an IO error to VFS instead of ESTALE / NOENT. I tried looking around zfs_fhtovp() and only saw returns of EINVAL, but I'm not even sure I'm looking in the right place.

From owner-freebsd-fs@FreeBSD.ORG Mon Mar 22 15:04:44 2010 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F0D8F1065670; Mon, 22 Mar 2010 15:04:43 +0000 (UTC) (envelope-from danny@cs.huji.ac.il) Received: from kabab.cs.huji.ac.il (kabab.cs.huji.ac.il [132.65.16.84]) by mx1.freebsd.org (Postfix) with ESMTP id A115F8FC26; Mon, 22 Mar 2010 15:04:43 +0000 (UTC) Received: from pampa.cs.huji.ac.il ([132.65.80.32]) by kabab.cs.huji.ac.il with esmtp id 1NtjBJ-000AyL-B5; Mon, 22 Mar 2010 17:04:41 +0200 X-Mailer: exmh version 2.7.2 01/07/2005 with nmh-1.2 To: Rick Macklem In-reply-to: References: <201003171120.o2HBK3CV082081@freefall.freebsd.org> <20100317113953.GA14582@icarus.home.lan> <86tys9eqo6.fsf@kopusha.onet> Comments: In-reply-to Rick Macklem message dated "Mon, 22 Mar 2010 10:04:46 -0400."
Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Mon, 22 Mar 2010 17:04:40 +0200 From: Daniel Braniss Message-ID: Cc: bug-followup@FreeBSD.org, freebsd-fs@FreeBSD.org, Kai Kockro Subject: Re: kern/144330: [nfs] mbuf leakage in nfsd with zfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Mar 2010 15:04:44 -0000

>
>
> On Mon, 22 Mar 2010, Daniel Braniss wrote:
>
> >
> > well, it's much better!, but no cookies yet :-)
> >
>
> Well, that's good news. I'll try and get dfr to review it and then
> commit it. Thanks Mikolaj, for finding this.
>
> > from comparing graphs in
> > 	ftp://ftp.cs.huji.ac.il/users/danny/freebsd/mbuf-leak/
> > store-01-e.ps: a production server running newfsd - now up almost 20 days
> > 	notice that the average used mbuf is below 1000!
> >
> > store-02.ps: kernel without last patch, classic nfsd
> > 	the leak is huge.
> >
> > store-02++.ps: with latest patch
> > 	the leak is much smaller but I see 2 issues:
> > 	- the initial leap to over 2000, then a smaller leak.
>
> The initial leap doesn't worry me. That's just a design constraint.

yes, but new-nfsd does it better.

> A slow leak after that is still a problem. (I might have seen the
> slow leak in testing here. I'll poke at it and see if I can reproduce
> that.)

all I do is mount udp on a client and start a write process.

>
> >
> > could someone explain replay_prune() to me?
> >
> I just looked at it and I think it does the following:
> - when it thinks the cache is too big (either too many entries
>   or too much mbuf data) it loops around until:
>   - no longer too much or can't free any more
>     (when an entry is free'd, rc_size and rc_count are
>     reduced)
>     (the loop is from the end of the tailq, so it is freeing
>     the least recently used entries)
>   - the test for rce_repmsg.rm_xid != 0 avoids freeing ones
>     that are in progress, since rce_repmsg is all zeroed until
>     the reply has been generated

thanks for the information, it's what I thought, but the code made it look as if something else could happen - why else start the search of the queue again after each match?

>
> I did notice that the call to replay_prune() from replay_setsize() does
> not lock the mutex before calling it, so it doesn't look smp safe to me
> for this case, but I doubt that would cause a slow leak. (I think this is
> only called when the number of mbuf clusters in the kernel changes and
> might cause a kernel crash if the tailq wasn't in a consistent state as
> it rattled through the list in the loop.)
>

there seems to be an NFSLOCK involved before calling replay_setsize ...

well, the server is a 2 cpu quad nehalem, so maybe I should try several clients ...

> rick
>

btw, the new-nfsd has been running on a production server for almost 20 days and all seems fine.
anyways, things are looking better,
cheers,
	danny

From owner-freebsd-fs@FreeBSD.ORG Mon Mar 22 15:47:46 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 370151065675 for ; Mon, 22 Mar 2010 15:47:46 +0000 (UTC) (envelope-from korvus@comcast.net) Received: from mx04.pub.collaborativefusion.com (mx04.pub.collaborativefusion.com [206.210.72.84]) by mx1.freebsd.org (Postfix) with ESMTP id F01B48FC0A for ; Mon, 22 Mar 2010 15:47:45 +0000 (UTC) Received: from [192.168.2.164] ([206.210.89.202]) by mx04.pub.collaborativefusion.com (StrongMail Enterprise 4.1.1.4(4.1.1.4-47689)); Mon, 22 Mar 2010 12:02:43 -0400 X-VirtualServerGroup: Default X-MailingID: 00000::00000::00000::00000::::1476 X-SMHeaderMap: mid="X-MailingID" X-Destination-ID: freebsd-fs@freebsd.org X-SMFBL: ZnJlZWJzZC1mc0BmcmVlYnNkLm9yZw== Message-ID: <4BA7911F.5060905@comcast.net> Date: Mon, 22 Mar 2010 11:47:43 -0400 From: Steve Polyack User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.1.7) Gecko/20100311 Thunderbird/3.0.1 MIME-Version: 1.0 To: John Baldwin References: <4BA3613F.4070606@comcast.net> <201003190831.00950.jhb@freebsd.org> <4BA37AE9.4060806@comcast.net> <4BA392B1.4050107@comcast.net> <4BA3DEBC.2000608@comcast.net> <4BA432C8.4040707@comcast.net> <4BA78444.4040707@comcast.net> In-Reply-To: <4BA78444.4040707@comcast.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, User Questions , bseklecki@noc.cfi.pgh.pa.us Subject: Re: FreeBSD NFS client goes into infinite retry loop X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5
Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Mar 2010 15:47:46 -0000 On 03/22/10 10:52, Steve Polyack wrote: > On 3/19/2010 11:27 PM, Rick Macklem wrote: >> On Fri, 19 Mar 2010, Steve Polyack wrote: >> >> [good stuff snipped] >>> >>> This makes sense. According to wireshark, the server is indeed >>> transmitting "Status: NFS3ERR_IO (5)". Perhaps this should be STALE >>> instead; it sounds more correct than marking it a general IO error. >>> Also, the NFS server is serving its share off of a ZFS filesystem, >>> if it makes any difference. I suppose ZFS could be talking to the >>> NFS server threads with some mismatched language, but I doubt it. >>> >> Ok, now I think we're making progress. If VFS_FHTOVP() doesn't return >> ESTALE when the file no longer exists, the NFS server returns whatever >> error it has returned. >> >> So, either VFS_FHTOVP() succeeds after the file has been deleted, which >> would be a problem that needs to be fixed within ZFS >> OR >> ZFS returns an error other than ESTALE when it doesn't exist. >> >> Try the following patch on the server (which just makes any error >> returned by VFS_FHTOVP() into ESTALE) and see if that helps. >> >> --- nfsserver/nfs_srvsubs.c.sav 2010-03-19 22:06:43.000000000 -0400 >> +++ nfsserver/nfs_srvsubs.c 2010-03-19 22:07:22.000000000 -0400 >> @@ -1127,6 +1127,8 @@ >> } >> } >> error = VFS_FHTOVP(mp, &fhp->fh_fid, vpp); >> + if (error != 0) >> + error = ESTALE; >> vfs_unbusy(mp); >> if (error) >> goto out; >> >> Please let me know if the patch helps, rick >> >> > The patch seems to fix the bad behavior. Running with the patch, I > see the following output from my patch (return code of nfs_doio from > within nfsiod): > nfssvc_iod: iod 0 nfs_doio returned errno: 70 > > Furthermore, when inspecting the transaction with Wireshark, after > deleting the file on the NFS server it looks like there is only a > single error. 
This time there it is a reply to a V3 Lookup call that > contains a status of "NFS3ERR_NOENT (2)" coming from the NFS server. > The client also does not repeatedly try to complete the failed request. > > Any suggestions on the next step here? Based on what you said it > looks like ZFS is falsely reporting an IO error to VFS instead of > ESTALE / NOENT. I tried looking around zfs_fhtovp() and only saw > returns of EINVAL, but I'm not even sure I'm looking in the right place. Further on down the rabbit hole... here's the piece in zfs_fhtovp() where it's kicking out EINVAL instead of ESTALE - the following patch corrects the behavior, but of course also suggests further digging within the zfs_zget() function to ensure that _it_ is returning the correct thing and whether or not it needs to be handled there or within zfs_fhtovp(). --- src-orig/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c 2010-03-22 11:41:21.000000000 -0400 +++ src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c 2010-03-22 16:25:21.000000000 -0400 @@ -1246,7 +1246,7 @@ dprintf("getting %llu [%u mask %llx]\n", object, fid_gen, gen_mask); if (err = zfs_zget(zfsvfs, object, &zp)) { ZFS_EXIT(zfsvfs); - return (err); + return (ESTALE); } zp_gen = zp->z_phys->zp_gen & gen_mask; if (zp_gen == 0) From owner-freebsd-fs@FreeBSD.ORG Mon Mar 22 16:02:45 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2F53F106566C; Mon, 22 Mar 2010 16:02:45 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id DDE258FC08; Mon, 22 Mar 2010 16:02:44 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 76A6D46B2D; Mon, 22 Mar 2010 12:02:44 -0400 (EDT) Received: from jhbbsd.localnet 
(smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPA id AE9ED8A021; Mon, 22 Mar 2010 12:02:43 -0400 (EDT) From: John Baldwin To: Steve Polyack Date: Mon, 22 Mar 2010 12:00:41 -0400 User-Agent: KMail/1.12.1 (FreeBSD/7.3-CBSD-20100217; KDE/4.3.1; amd64; ; ) References: <4BA3613F.4070606@comcast.net> <4BA78444.4040707@comcast.net> <4BA7911F.5060905@comcast.net> In-Reply-To: <4BA7911F.5060905@comcast.net> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201003221200.41607.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Mon, 22 Mar 2010 12:02:43 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-1.7 required=4.2 tests=AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx Cc: freebsd-fs@freebsd.org, User Questions , bseklecki@noc.cfi.pgh.pa.us Subject: Re: FreeBSD NFS client goes into infinite retry loop X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Mar 2010 16:02:45 -0000 On Monday 22 March 2010 11:47:43 am Steve Polyack wrote: > On 03/22/10 10:52, Steve Polyack wrote: > > On 3/19/2010 11:27 PM, Rick Macklem wrote: > >> On Fri, 19 Mar 2010, Steve Polyack wrote: > >> > >> [good stuff snipped] > >>> > >>> This makes sense. According to wireshark, the server is indeed > >>> transmitting "Status: NFS3ERR_IO (5)". Perhaps this should be STALE > >>> instead; it sounds more correct than marking it a general IO error. > >>> Also, the NFS server is serving its share off of a ZFS filesystem, > >>> if it makes any difference. I suppose ZFS could be talking to the > >>> NFS server threads with some mismatched language, but I doubt it. 
> >>> > >> Ok, now I think we're making progress. If VFS_FHTOVP() doesn't return > >> ESTALE when the file no longer exists, the NFS server returns whatever > >> error it has returned. > >> > >> So, either VFS_FHTOVP() succeeds after the file has been deleted, which > >> would be a problem that needs to be fixed within ZFS > >> OR > >> ZFS returns an error other than ESTALE when it doesn't exist. > >> > >> Try the following patch on the server (which just makes any error > >> returned by VFS_FHTOVP() into ESTALE) and see if that helps. > >> > >> --- nfsserver/nfs_srvsubs.c.sav 2010-03-19 22:06:43.000000000 -0400 > >> +++ nfsserver/nfs_srvsubs.c 2010-03-19 22:07:22.000000000 -0400 > >> @@ -1127,6 +1127,8 @@ > >> } > >> } > >> error = VFS_FHTOVP(mp, &fhp->fh_fid, vpp); > >> + if (error != 0) > >> + error = ESTALE; > >> vfs_unbusy(mp); > >> if (error) > >> goto out; > >> > >> Please let me know if the patch helps, rick > >> > >> > > The patch seems to fix the bad behavior. Running with the patch, I > > see the following output from my patch (return code of nfs_doio from > > within nfsiod): > > nfssvc_iod: iod 0 nfs_doio returned errno: 70 > > > > Furthermore, when inspecting the transaction with Wireshark, after > > deleting the file on the NFS server it looks like there is only a > > single error. This time there it is a reply to a V3 Lookup call that > > contains a status of "NFS3ERR_NOENT (2)" coming from the NFS server. > > The client also does not repeatedly try to complete the failed request. > > > > Any suggestions on the next step here? Based on what you said it > > looks like ZFS is falsely reporting an IO error to VFS instead of > > ESTALE / NOENT. I tried looking around zfs_fhtovp() and only saw > > returns of EINVAL, but I'm not even sure I'm looking in the right place. > > Further on down the rabbit hole... 
here's the piece in zfs_fhtovp()
> where it's kicking out EINVAL instead of ESTALE - the following patch
> corrects the behavior, but of course also suggests further digging
> within the zfs_zget() function to ensure that _it_ is returning the
> correct thing and whether or not it needs to be handled there or within
> zfs_fhtovp().
>
> --- src-orig/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c	2010-03-22 11:41:21.000000000 -0400
> +++ src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c	2010-03-22 16:25:21.000000000 -0400
> @@ -1246,7 +1246,7 @@
>  	dprintf("getting %llu [%u mask %llx]\n", object, fid_gen, gen_mask);
>  	if (err = zfs_zget(zfsvfs, object, &zp)) {
>  		ZFS_EXIT(zfsvfs);
> -		return (err);
> +		return (ESTALE);
>  	}
>  	zp_gen = zp->z_phys->zp_gen & gen_mask;
>  	if (zp_gen == 0)

So the odd thing here is that ffs_fhtovp() doesn't return ESTALE if VFS_VGET() (which calls ffs_vget()) fails; it only returns ESTALE if the generation count doesn't match.

-- 
John Baldwin

From owner-freebsd-fs@FreeBSD.ORG Mon Mar 22 16:44:06 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7A7131065676 for ; Mon, 22 Mar 2010 16:44:06 +0000 (UTC) (envelope-from korvus@comcast.net) Received: from mx04.pub.collaborativefusion.com (mx04.pub.collaborativefusion.com [206.210.72.84]) by mx1.freebsd.org (Postfix) with ESMTP id 419F98FC12 for ; Mon, 22 Mar 2010 16:44:06 +0000 (UTC) Received: from [192.168.2.164] ([206.210.89.202]) by mx04.pub.collaborativefusion.com (StrongMail Enterprise 4.1.1.4(4.1.1.4-47689)); Mon, 22 Mar 2010 12:59:04 -0400 X-VirtualServerGroup: Default X-MailingID: 00000::00000::00000::00000::::1403 X-SMHeaderMap: mid="X-MailingID" X-Destination-ID: freebsd-fs@freebsd.org X-SMFBL: ZnJlZWJzZC1mc0BmcmVlYnNkLm9yZw== Message-ID: <4BA79E54.5030504@comcast.net> Date: Mon, 22 Mar 2010 12:44:04 -0400 From: Steve Polyack
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.1.7) Gecko/20100311 Thunderbird/3.0.1 MIME-Version: 1.0 To: John Baldwin References: <4BA3613F.4070606@comcast.net> <4BA78444.4040707@comcast.net> <4BA7911F.5060905@comcast.net> <201003221200.41607.jhb@freebsd.org> In-Reply-To: <201003221200.41607.jhb@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, User Questions , bseklecki@noc.cfi.pgh.pa.us Subject: Re: FreeBSD NFS client goes into infinite retry loop X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Mar 2010 16:44:06 -0000 On 03/22/10 12:00, John Baldwin wrote: > On Monday 22 March 2010 11:47:43 am Steve Polyack wrote: > >> On 03/22/10 10:52, Steve Polyack wrote: >> >>> On 3/19/2010 11:27 PM, Rick Macklem wrote: >>> >>>> On Fri, 19 Mar 2010, Steve Polyack wrote: >>>> >>>> [good stuff snipped] >>>> >>>>> This makes sense. According to wireshark, the server is indeed >>>>> transmitting "Status: NFS3ERR_IO (5)". Perhaps this should be STALE >>>>> instead; it sounds more correct than marking it a general IO error. >>>>> Also, the NFS server is serving its share off of a ZFS filesystem, >>>>> if it makes any difference. I suppose ZFS could be talking to the >>>>> NFS server threads with some mismatched language, but I doubt it. >>>>> >>>>> >>>> Ok, now I think we're making progress. If VFS_FHTOVP() doesn't return >>>> ESTALE when the file no longer exists, the NFS server returns whatever >>>> error it has returned. >>>> >>>> So, either VFS_FHTOVP() succeeds after the file has been deleted, which >>>> would be a problem that needs to be fixed within ZFS >>>> OR >>>> ZFS returns an error other than ESTALE when it doesn't exist. 
>>>> >>>> Try the following patch on the server (which just makes any error >>>> returned by VFS_FHTOVP() into ESTALE) and see if that helps. >>>> >>>> --- nfsserver/nfs_srvsubs.c.sav 2010-03-19 22:06:43.000000000 -0400 >>>> +++ nfsserver/nfs_srvsubs.c 2010-03-19 22:07:22.000000000 -0400 >>>> @@ -1127,6 +1127,8 @@ >>>> } >>>> } >>>> error = VFS_FHTOVP(mp,&fhp->fh_fid, vpp); >>>> + if (error != 0) >>>> + error = ESTALE; >>>> vfs_unbusy(mp); >>>> if (error) >>>> goto out; >>>> >>>> Please let me know if the patch helps, rick >>>> >>>> >>>> >>> The patch seems to fix the bad behavior. Running with the patch, I >>> see the following output from my patch (return code of nfs_doio from >>> within nfsiod): >>> nfssvc_iod: iod 0 nfs_doio returned errno: 70 >>> >>> Furthermore, when inspecting the transaction with Wireshark, after >>> deleting the file on the NFS server it looks like there is only a >>> single error. This time there it is a reply to a V3 Lookup call that >>> contains a status of "NFS3ERR_NOENT (2)" coming from the NFS server. >>> The client also does not repeatedly try to complete the failed request. >>> >>> Any suggestions on the next step here? Based on what you said it >>> looks like ZFS is falsely reporting an IO error to VFS instead of >>> ESTALE / NOENT. I tried looking around zfs_fhtovp() and only saw >>> returns of EINVAL, but I'm not even sure I'm looking in the right place. >>> >> Further on down the rabbit hole... here's the piece in zfs_fhtovp() >> where it's kicking out EINVAL instead of ESTALE - the following patch >> corrects the behavior, but of course also suggests further digging >> within the zfs_zget() function to ensure that _it_ is returning the >> correct thing and whether or not it needs to be handled there or within >> zfs_fhtovp(). 
>> >> --- >> src-orig/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c >> 2010-03-22 11:41:21.000000000 -0400 >> +++ src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c >> 2010-03-22 16:25:21.000000000 -0400 >> @@ -1246,7 +1246,7 @@ >> dprintf("getting %llu [%u mask %llx]\n", object, fid_gen, gen_mask); >> if (err = zfs_zget(zfsvfs, object,&zp)) { >> ZFS_EXIT(zfsvfs); >> - return (err); >> + return (ESTALE); >> } >> zp_gen = zp->z_phys->zp_gen& gen_mask; >> if (zp_gen == 0) >> > So the odd thing here is that ffs_fhtovp() doesn't return ESTALE if VFS_VGET() > (which calls ffs_vget()) fails, it only returns ESTALE if the generation count > doesn't matter. > > It looks like it also returns ESTALE when the inode is invalid (< ROOTINO || > max inodes?) - would an unlinked file in FFS referenced at a later time report an invalid inode? But back to your point, zfs_zget() seems to be failing and returning the EINVAL before zfs_fhtovp() even has a chance to set and check zp_gen. I'm trying to get some more details through the use of gratuitous dprintf()'s, but they don't seem to be making it to any logs or the console even with vfs.zfs.debug=1 set. Any pointers on how to get these dprintf() calls working? Thanks again. 
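
For readers following the ESTALE question, here is a minimal userspace sketch of the two checks discussed above: an inode number outside the valid range, and a generation count that no longer matches the file handle. It is not the ffs_fhtovp() or zfs_fhtovp() source. The names toy_inode, toy_fid, toy_fhtovp, TOY_ROOTINO and TOY_NINODES are hypothetical, and treating an unlinked slot as stale is an assumption of this toy model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ROOTINO 2u	/* hypothetical lowest valid inode number */
#define TOY_NINODES 8u	/* hypothetical size of the toy inode table */

/* One slot of a toy inode table (not a real FFS or ZFS structure). */
struct toy_inode {
	int      in_use;	/* 0 once the file has been unlinked */
	uint32_t gen;		/* changes every time the slot is reused */
};

/* The part of a file handle that the toy server hands to clients. */
struct toy_fid {
	uint32_t ino;
	uint32_t gen;
};

/*
 * Translate a file handle into a table slot.  Every "this handle can
 * never become valid again" case is reported as ESTALE, so the caller
 * does not have to guess whether a retry might help.
 */
int
toy_fhtovp(const struct toy_inode *table, const struct toy_fid *fid,
    const struct toy_inode **out)
{
	const struct toy_inode *ip;

	assert(table != NULL && fid != NULL && out != NULL);
	*out = NULL;

	/* Inode number out of range: the handle cannot be valid. */
	if (fid->ino < TOY_ROOTINO || fid->ino >= TOY_NINODES)
		return (ESTALE);

	ip = &table[fid->ino];

	/* Slot freed, or reused since the handle was issued. */
	if (!ip->in_use || ip->gen != fid->gen)
		return (ESTALE);

	*out = ip;
	return (0);
}
```

With slot 3 in use at generation 7, a handle {3, 7} resolves, while {3, 6}, {1, 7} and {100, 0} all come back as ESTALE rather than some other errno the client might retry on.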
From owner-freebsd-fs@FreeBSD.ORG Mon Mar 22 17:41:03 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E17D51065679; Mon, 22 Mar 2010 17:41:03 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id AF52A8FC14; Mon, 22 Mar 2010 17:41:03 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 5D6A646B2D; Mon, 22 Mar 2010 13:41:03 -0400 (EDT) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPA id 477F58A027; Mon, 22 Mar 2010 13:41:02 -0400 (EDT) From: John Baldwin To: Steve Polyack Date: Mon, 22 Mar 2010 13:39:37 -0400 User-Agent: KMail/1.12.1 (FreeBSD/7.3-CBSD-20100217; KDE/4.3.1; amd64; ; ) References: <4BA3613F.4070606@comcast.net> <201003221200.41607.jhb@freebsd.org> <4BA79E54.5030504@comcast.net> In-Reply-To: <4BA79E54.5030504@comcast.net> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201003221339.37169.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Mon, 22 Mar 2010 13:41:02 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-1.7 required=4.2 tests=AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx Cc: freebsd-fs@freebsd.org, User Questions , bseklecki@noc.cfi.pgh.pa.us Subject: Re: FreeBSD NFS client goes into infinite retry loop X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Mar 2010 
17:41:04 -0000 On Monday 22 March 2010 12:44:04 pm Steve Polyack wrote: > On 03/22/10 12:00, John Baldwin wrote: > > On Monday 22 March 2010 11:47:43 am Steve Polyack wrote: > > > >> On 03/22/10 10:52, Steve Polyack wrote: > >> > >>> On 3/19/2010 11:27 PM, Rick Macklem wrote: > >>> > >>>> On Fri, 19 Mar 2010, Steve Polyack wrote: > >>>> > >>>> [good stuff snipped] > >>>> > >>>>> This makes sense. According to wireshark, the server is indeed > >>>>> transmitting "Status: NFS3ERR_IO (5)". Perhaps this should be STALE > >>>>> instead; it sounds more correct than marking it a general IO error. > >>>>> Also, the NFS server is serving its share off of a ZFS filesystem, > >>>>> if it makes any difference. I suppose ZFS could be talking to the > >>>>> NFS server threads with some mismatched language, but I doubt it. > >>>>> > >>>>> > >>>> Ok, now I think we're making progress. If VFS_FHTOVP() doesn't return > >>>> ESTALE when the file no longer exists, the NFS server returns whatever > >>>> error it has returned. > >>>> > >>>> So, either VFS_FHTOVP() succeeds after the file has been deleted, which > >>>> would be a problem that needs to be fixed within ZFS > >>>> OR > >>>> ZFS returns an error other than ESTALE when it doesn't exist. > >>>> > >>>> Try the following patch on the server (which just makes any error > >>>> returned by VFS_FHTOVP() into ESTALE) and see if that helps. > >>>> > >>>> --- nfsserver/nfs_srvsubs.c.sav 2010-03-19 22:06:43.000000000 -0400 > >>>> +++ nfsserver/nfs_srvsubs.c 2010-03-19 22:07:22.000000000 -0400 > >>>> @@ -1127,6 +1127,8 @@ > >>>> } > >>>> } > >>>> error = VFS_FHTOVP(mp,&fhp->fh_fid, vpp); > >>>> + if (error != 0) > >>>> + error = ESTALE; > >>>> vfs_unbusy(mp); > >>>> if (error) > >>>> goto out; > >>>> > >>>> Please let me know if the patch helps, rick > >>>> > >>>> > >>>> > >>> The patch seems to fix the bad behavior. 
Running with the patch, I > >>> see the following output from my patch (return code of nfs_doio from > >>> within nfsiod): > >>> nfssvc_iod: iod 0 nfs_doio returned errno: 70 > >>> > >>> Furthermore, when inspecting the transaction with Wireshark, after > >>> deleting the file on the NFS server it looks like there is only a > >>> single error. This time there it is a reply to a V3 Lookup call that > >>> contains a status of "NFS3ERR_NOENT (2)" coming from the NFS server. > >>> The client also does not repeatedly try to complete the failed request. > >>> > >>> Any suggestions on the next step here? Based on what you said it > >>> looks like ZFS is falsely reporting an IO error to VFS instead of > >>> ESTALE / NOENT. I tried looking around zfs_fhtovp() and only saw > >>> returns of EINVAL, but I'm not even sure I'm looking in the right place. > >>> > >> Further on down the rabbit hole... here's the piece in zfs_fhtovp() > >> where it's kicking out EINVAL instead of ESTALE - the following patch > >> corrects the behavior, but of course also suggests further digging > >> within the zfs_zget() function to ensure that _it_ is returning the > >> correct thing and whether or not it needs to be handled there or within > >> zfs_fhtovp(). > >> > >> --- > >> src-orig/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c > >> 2010-03-22 11:41:21.000000000 -0400 > >> +++ src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c > >> 2010-03-22 16:25:21.000000000 -0400 > >> @@ -1246,7 +1246,7 @@ > >> dprintf("getting %llu [%u mask %llx]\n", object, fid_gen, gen_mask); > >> if (err = zfs_zget(zfsvfs, object,&zp)) { > >> ZFS_EXIT(zfsvfs); > >> - return (err); > >> + return (ESTALE); > >> } > >> zp_gen = zp->z_phys->zp_gen& gen_mask; > >> if (zp_gen == 0) > >> > > So the odd thing here is that ffs_fhtovp() doesn't return ESTALE if VFS_VGET() > > (which calls ffs_vget()) fails, it only returns ESTALE if the generation count > > doesn't matter. 
> > > > > It looks like it also returns ESTALE when the inode is invalid (< > ROOTINO || > max inodes?) - would an unlinked file in FFS referenced at > a later time report an invalid inode? > > But back to your point, zfs_zget() seems to be failing and returning the > EINVAL before zfs_fhtovp() even has a chance to set and check zp_gen. > I'm trying to get some more details through the use of gratuitous > dprintf()'s, but they don't seem to be making it to any logs or the > console even with vfs.zfs.debug=1 set. Any pointers on how to get these > dprintf() calls working? That I have no idea on. Maybe Rick can chime in? I'm actually not sure why we would want to treat a FHTOVP failure as anything but an ESTALE error in the NFS server to be honest. -- John Baldwin From owner-freebsd-fs@FreeBSD.ORG Mon Mar 22 18:25:58 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E8D5C1065670; Mon, 22 Mar 2010 18:25:58 +0000 (UTC) (envelope-from js@saltmine.radix.net) Received: from saltmine.radix.net (saltmine.radix.net [207.192.128.40]) by mx1.freebsd.org (Postfix) with ESMTP id 53EDF8FC13; Mon, 22 Mar 2010 18:25:58 +0000 (UTC) Received: from saltmine.radix.net (localhost [127.0.0.1]) by saltmine.radix.net (8.12.2/8.12.2) with ESMTP id o2MI9JV4019812; Mon, 22 Mar 2010 14:09:19 -0400 (EDT) Received: (from root@localhost) by saltmine.radix.net (8.12.2/8.12.2/Submit) id o2MI9I6R019808; Mon, 22 Mar 2010 14:09:18 -0400 (EDT) Received: from mail1.radix.net (mail1.radix.net [207.192.128.31]) by saltmine.radix.net (8.12.2/8.12.2) with ESMTP id o2MG3rV4002056 for ; Mon, 22 Mar 2010 12:03:53 -0400 (EDT) Received: from mx2.freebsd.org (mx2.freebsd.org [69.147.83.53]) by mail1.radix.net (8.13.4/8.13.4) with ESMTP id o2MG3qgf018626 for ; Mon, 22 Mar 2010 12:03:53 -0400 (EDT) Received: from hub.freebsd.org (hub.freebsd.org [IPv6:2001:4f8:fff6::36]) by mx2.freebsd.org 
(Postfix) with ESMTP id 49A7015FCB7; Mon, 22 Mar 2010 16:03:07 +0000 (UTC) Received: from hub.freebsd.org (localhost [127.0.0.1]) by hub.freebsd.org (Postfix) with ESMTP id 1314510656AD; Mon, 22 Mar 2010 16:03:07 +0000 (UTC) (envelope-from owner-freebsd-questions@freebsd.org) Delivered-To: freebsd-questions@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2F53F106566C; Mon, 22 Mar 2010 16:02:45 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id DDE258FC08; Mon, 22 Mar 2010 16:02:44 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 76A6D46B2D; Mon, 22 Mar 2010 12:02:44 -0400 (EDT) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPA id AE9ED8A021; Mon, 22 Mar 2010 12:02:43 -0400 (EDT) From: John Baldwin To: Steve Polyack Date: Mon, 22 Mar 2010 12:00:41 -0400 User-Agent: KMail/1.12.1 (FreeBSD/7.3-CBSD-20100217; KDE/4.3.1; amd64; ; ) References: <4BA3613F.4070606@comcast.net> <4BA78444.4040707@comcast.net> <4BA7911F.5060905@comcast.net> In-Reply-To: <4BA7911F.5060905@comcast.net> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201003221200.41607.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Mon, 22 Mar 2010 12:02:43 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-1.7 required=4.2 tests=AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx X-BeenThere: freebsd-questions@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Sender: owner-freebsd-questions@freebsd.org Errors-To: 
owner-freebsd-questions@freebsd.org Status: O X-Status: X-Keywords: X-UID: 19 Cc: freebsd-fs@freebsd.org, User Questions , bseklecki@noc.cfi.pgh.pa.us Subject: Re: FreeBSD NFS client goes into infinite retry loop X-BeenThere: freebsd-fs@freebsd.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Mar 2010 18:25:59 -0000 On Monday 22 March 2010 11:47:43 am Steve Polyack wrote: > On 03/22/10 10:52, Steve Polyack wrote: > > On 3/19/2010 11:27 PM, Rick Macklem wrote: > >> On Fri, 19 Mar 2010, Steve Polyack wrote: > >> > >> [good stuff snipped] > >>> > >>> This makes sense. According to wireshark, the server is indeed > >>> transmitting "Status: NFS3ERR_IO (5)". Perhaps this should be STALE > >>> instead; it sounds more correct than marking it a general IO error. > >>> Also, the NFS server is serving its share off of a ZFS filesystem, > >>> if it makes any difference. I suppose ZFS could be talking to the > >>> NFS server threads with some mismatched language, but I doubt it. > >>> > >> Ok, now I think we're making progress. If VFS_FHTOVP() doesn't return > >> ESTALE when the file no longer exists, the NFS server returns whatever > >> error it has returned. > >> > >> So, either VFS_FHTOVP() succeeds after the file has been deleted, which > >> would be a problem that needs to be fixed within ZFS > >> OR > >> ZFS returns an error other than ESTALE when it doesn't exist. > >> > >> Try the following patch on the server (which just makes any error > >> returned by VFS_FHTOVP() into ESTALE) and see if that helps. 
> >> > >> --- nfsserver/nfs_srvsubs.c.sav 2010-03-19 22:06:43.000000000 -0400 > >> +++ nfsserver/nfs_srvsubs.c 2010-03-19 22:07:22.000000000 -0400 > >> @@ -1127,6 +1127,8 @@ > >> } > >> } > >> error = VFS_FHTOVP(mp, &fhp->fh_fid, vpp); > >> + if (error != 0) > >> + error = ESTALE; > >> vfs_unbusy(mp); > >> if (error) > >> goto out; > >> > >> Please let me know if the patch helps, rick > >> > >> > > The patch seems to fix the bad behavior. Running with the patch, I > > see the following output from my patch (return code of nfs_doio from > > within nfsiod): > > nfssvc_iod: iod 0 nfs_doio returned errno: 70 > > > > Furthermore, when inspecting the transaction with Wireshark, after > > deleting the file on the NFS server it looks like there is only a > > single error. This time there it is a reply to a V3 Lookup call that > > contains a status of "NFS3ERR_NOENT (2)" coming from the NFS server. > > The client also does not repeatedly try to complete the failed request. > > > > Any suggestions on the next step here? Based on what you said it > > looks like ZFS is falsely reporting an IO error to VFS instead of > > ESTALE / NOENT. I tried looking around zfs_fhtovp() and only saw > > returns of EINVAL, but I'm not even sure I'm looking in the right place. > > Further on down the rabbit hole... here's the piece in zfs_fhtovp() > where it's kicking out EINVAL instead of ESTALE - the following patch > corrects the behavior, but of course also suggests further digging > within the zfs_zget() function to ensure that _it_ is returning the > correct thing and whether or not it needs to be handled there or within > zfs_fhtovp(). 
> > --- > src-orig/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c > 2010-03-22 11:41:21.000000000 -0400 > +++ src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c > 2010-03-22 16:25:21.000000000 -0400 > @@ -1246,7 +1246,7 @@ > dprintf("getting %llu [%u mask %llx]\n", object, fid_gen, gen_mask); > if (err = zfs_zget(zfsvfs, object, &zp)) { > ZFS_EXIT(zfsvfs); > - return (err); > + return (ESTALE); > } > zp_gen = zp->z_phys->zp_gen & gen_mask; > if (zp_gen == 0) So the odd thing here is that ffs_fhtovp() doesn't return ESTALE if VFS_VGET() (which calls ffs_vget()) fails, it only returns ESTALE if the generation count doesn't matter. -- John Baldwin _______________________________________________ freebsd-questions@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-questions To unsubscribe, send any mail to "freebsd-questions-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Mon Mar 22 18:41:42 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D18B8106564A; Mon, 22 Mar 2010 18:41:42 +0000 (UTC) (envelope-from avg@freebsd.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 7B7858FC18; Mon, 22 Mar 2010 18:41:41 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id UAA00928; Mon, 22 Mar 2010 20:41:39 +0200 (EET) (envelope-from avg@freebsd.org) Message-ID: <4BA7B9E3.3080003@freebsd.org> Date: Mon, 22 Mar 2010 20:41:39 +0200 From: Andriy Gapon User-Agent: Thunderbird 2.0.0.23 (X11/20100211) MIME-Version: 1.0 To: freebsd-fs@freebsd.org, freebsd-geom@freebsd.org X-Enigmail-Version: 0.95.7 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: Pawel Jakub Dawidek Subject: geom_vfs: disallow multiple readonly 
mounts of a device X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Mar 2010 18:41:42 -0000

Currently FreeBSD allows multiple readonly mounts of the same device. This feature seems to have been introduced along with GEOM. Before that, only one mount of a device was allowed, whether readonly or not. Other (major) BSDs still work that way.

Unfortunately, the feature has never really worked correctly because of some architectural/design issues in our buffer cache layer/interface. Multiple shared mounts are allowed to happen, but later on, during filesystem access or at unmount time, the system would crash.

Because of this, I propose to disable the shared mounting feature for the time being. In my opinion it is very unlikely that anybody depends on this feature now, but nullfs can be used as its replacement. In fact, it could be even more efficient if the same files are accessed via different mount points.

The proposed patch is a greatly reduced version of a patch by Pawel Jakub Dawidek: http://people.freebsd.org/~pjd/patches/mro_mount.patch

Pawel's patch, in fact, fixes the shared mounting feature. Unfortunately, at least one filesystem (FFS) is quite intrusive in what it does with a device vnode (or rather its bufobj) when it attempts mounting. So, in some edge cases (synthetic or accidental) a system may still crash even with Pawel's patch. So, I think that it should not be committed until after all filesystems in the tree are made to play nice. Otherwise, I like it.

As an afterthought, perhaps it is Pawel's patch that we should actually use with one change: add a sysctl that enables/disables shared mounts and make them disabled by default.

Thank you very much for the feedback.
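
As a reading aid for the patch that follows, here is a toy userspace model of the ownership check it appears to introduce: the device's bo_private points back at the vnode while the device is unclaimed, an open swaps it for the consumer, and a second open sees the mismatch and fails with EBUSY. This is only a sketch under that reading of the diff. The struct and function names (toy_vnode, toy_consumer, toy_vfs_open, toy_vfs_close) are hypothetical and none of the real GEOM or bufobj machinery is modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for the GEOM consumer; only the field the patch adds. */
struct toy_consumer {
	void *private;		/* remembers the vnode while the device is claimed */
};

/* Stand-in for the device vnode together with its bufobj. */
struct toy_vnode {
	void *bo_private;	/* points at the vnode itself when unclaimed */
};

/* An unclaimed device has bo_private pointing back at its own vnode. */
void
toy_vnode_init(struct toy_vnode *vp)
{
	assert(vp != NULL);
	vp->bo_private = vp;
}

/* Claim the device for one mount; a second claim gets EBUSY. */
int
toy_vfs_open(struct toy_vnode *vp, struct toy_consumer *cp)
{
	assert(vp != NULL && cp != NULL);

	if (vp->bo_private != vp)
		return (EBUSY);		/* somebody already owns the bufobj */

	cp->private = vp;		/* saved so that close can restore it */
	vp->bo_private = cp;
	return (0);
}

/* Release the claim and put bo_private back to its unclaimed value. */
void
toy_vfs_close(struct toy_vnode *vp, struct toy_consumer *cp)
{
	assert(vp != NULL && cp != NULL && vp->bo_private == cp);
	vp->bo_private = cp->private;
}
```

In this model a first toy_vfs_open() succeeds, a second one on the same vnode returns EBUSY, and after toy_vfs_close() the device can be opened again.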

--- a/sys/geom/geom_vfs.c
+++ b/sys/geom/geom_vfs.c
@@ -161,6 +161,10 @@ g_vfs_open
 	g_topology_assert();
 
 	*cpp = NULL;
+	bo = &vp->v_bufobj;
+	if (bo->bo_private != vp)
+		return (EBUSY);
+
 	pp = g_dev_getprovider(vp->v_rdev);
 	if (pp == NULL)
 		return (ENOENT);
@@ -176,6 +180,6 @@ g_vfs_open
 	vnode_create_vobject(vp, pp->mediasize, curthread);
 	VFS_UNLOCK_GIANT(vfslocked);
 	*cpp = cp;
-	bo = &vp->v_bufobj;
+	cp->private = vp;
 	bo->bo_ops = g_vfs_bufops;
 	bo->bo_private = cp;
@@ -196,5 +200,6 @@ g_vfs_close
 	gp = cp->geom;
 	bo = gp->softc;
 	bufobj_invalbuf(bo, V_SAVE, 0, 0);
+	bo->bo_private = cp->private;
 	g_wither_geom_close(gp, ENXIO);
 }

-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG Mon Mar 22 21:38:19 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CFC86106566B for ; Mon, 22 Mar 2010 21:38:19 +0000 (UTC) (envelope-from korvus@comcast.net) Received: from mx04.pub.collaborativefusion.com (mx04.pub.collaborativefusion.com [206.210.72.84]) by mx1.freebsd.org (Postfix) with ESMTP id 994DF8FC0C for ; Mon, 22 Mar 2010 21:38:19 +0000 (UTC) Received: from [192.168.2.164] ([206.210.89.202]) by mx04.pub.collaborativefusion.com (StrongMail Enterprise 4.1.1.4(4.1.1.4-47689)); Mon, 22 Mar 2010 17:53:14 -0400 X-VirtualServerGroup: Default X-MailingID: 00000::00000::00000::00000::::1864 X-SMHeaderMap: mid="X-MailingID" X-Destination-ID: freebsd-fs@freebsd.org X-SMFBL: ZnJlZWJzZC1mc0BmcmVlYnNkLm9yZw== Message-ID: <4BA7E349.3080804@comcast.net> Date: Mon, 22 Mar 2010 17:38:17 -0400 From: Steve Polyack User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.1.7) Gecko/20100311 Thunderbird/3.0.1 MIME-Version: 1.0 To: John Baldwin References: <4BA3613F.4070606@comcast.net> <201003221200.41607.jhb@freebsd.org> <4BA79E54.5030504@comcast.net> <201003221339.37169.jhb@freebsd.org> In-Reply-To: <201003221339.37169.jhb@freebsd.org> Content-Type: text/plain;
charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, User Questions , bseklecki@noc.cfi.pgh.pa.us Subject: Re: FreeBSD NFS client goes into infinite retry loop X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Mar 2010 21:38:19 -0000 On 03/22/10 13:39, John Baldwin wrote: > On Monday 22 March 2010 12:44:04 pm Steve Polyack wrote: > >> On 03/22/10 12:00, John Baldwin wrote: >> >>> On Monday 22 March 2010 11:47:43 am Steve Polyack wrote: >>> >>> >>>> On 03/22/10 10:52, Steve Polyack wrote: >>>> >>>> >>>>> On 3/19/2010 11:27 PM, Rick Macklem wrote: >>>>> >>>>> >>>>>> On Fri, 19 Mar 2010, Steve Polyack wrote: >>>>>> >>>>>> [good stuff snipped] >>>>>> >>>>>> >>>>>>> This makes sense. According to wireshark, the server is indeed >>>>>>> transmitting "Status: NFS3ERR_IO (5)". Perhaps this should be STALE >>>>>>> instead; it sounds more correct than marking it a general IO error. >>>>>>> Also, the NFS server is serving its share off of a ZFS filesystem, >>>>>>> if it makes any difference. I suppose ZFS could be talking to the >>>>>>> NFS server threads with some mismatched language, but I doubt it. >>>>>>> >>>>>>> >>>>>>> >>>>>> Ok, now I think we're making progress. If VFS_FHTOVP() doesn't return >>>>>> ESTALE when the file no longer exists, the NFS server returns whatever >>>>>> error it has returned. >>>>>> >>>>>> So, either VFS_FHTOVP() succeeds after the file has been deleted, which >>>>>> would be a problem that needs to be fixed within ZFS >>>>>> OR >>>>>> ZFS returns an error other than ESTALE when it doesn't exist. >>>>>> >>>>>> Try the following patch on the server (which just makes any error >>>>>> returned by VFS_FHTOVP() into ESTALE) and see if that helps. 
>>>>>> >>>>>> --- nfsserver/nfs_srvsubs.c.sav 2010-03-19 22:06:43.000000000 -0400 >>>>>> +++ nfsserver/nfs_srvsubs.c 2010-03-19 22:07:22.000000000 -0400 >>>>>> @@ -1127,6 +1127,8 @@ >>>>>> } >>>>>> } >>>>>> error = VFS_FHTOVP(mp,&fhp->fh_fid, vpp); >>>>>> + if (error != 0) >>>>>> + error = ESTALE; >>>>>> vfs_unbusy(mp); >>>>>> if (error) >>>>>> goto out; >>>>>> >>>>>> Please let me know if the patch helps, rick >>>>>> >>>>>> >>>>>> >>>>>> >>>>> The patch seems to fix the bad behavior. Running with the patch, I >>>>> see the following output from my patch (return code of nfs_doio from >>>>> within nfsiod): >>>>> nfssvc_iod: iod 0 nfs_doio returned errno: 70 >>>>> >>>>> Furthermore, when inspecting the transaction with Wireshark, after >>>>> deleting the file on the NFS server it looks like there is only a >>>>> single error. This time there it is a reply to a V3 Lookup call that >>>>> contains a status of "NFS3ERR_NOENT (2)" coming from the NFS server. >>>>> The client also does not repeatedly try to complete the failed request. >>>>> >>>>> Any suggestions on the next step here? Based on what you said it >>>>> looks like ZFS is falsely reporting an IO error to VFS instead of >>>>> ESTALE / NOENT. I tried looking around zfs_fhtovp() and only saw >>>>> returns of EINVAL, but I'm not even sure I'm looking in the right place. >>>>> >>>>> >>>> Further on down the rabbit hole... here's the piece in zfs_fhtovp() >>>> where it's kicking out EINVAL instead of ESTALE - the following patch >>>> corrects the behavior, but of course also suggests further digging >>>> within the zfs_zget() function to ensure that _it_ is returning the >>>> correct thing and whether or not it needs to be handled there or within >>>> zfs_fhtovp(). 
>>>> >>>> --- >>>> src-orig/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c >>>> 2010-03-22 11:41:21.000000000 -0400 >>>> +++ src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c >>>> 2010-03-22 16:25:21.000000000 -0400 >>>> @@ -1246,7 +1246,7 @@ >>>> dprintf("getting %llu [%u mask %llx]\n", object, fid_gen, >>>> > gen_mask); > >>>> if (err = zfs_zget(zfsvfs, object,&zp)) { >>>> ZFS_EXIT(zfsvfs); >>>> - return (err); >>>> + return (ESTALE); >>>> } >>>> zp_gen = zp->z_phys->zp_gen& gen_mask; >>>> if (zp_gen == 0) >>>> >>>> >>> So the odd thing here is that ffs_fhtovp() doesn't return ESTALE if >>> > VFS_VGET() > >>> (which calls ffs_vget()) fails, it only returns ESTALE if the generation >>> > count > >>> doesn't matter. >>> >>> >>> >> It looks like it also returns ESTALE when the inode is invalid (< >> ROOTINO ||> max inodes?) - would an unlinked file in FFS referenced at >> a later time report an invalid inode? >> >> But back to your point, zfs_zget() seems to be failing and returning the >> EINVAL before zfs_fhtovp() even has a chance to set and check zp_gen. >> I'm trying to get some more details through the use of gratuitous >> dprintf()'s, but they don't seem to be making it to any logs or the >> console even with vfs.zfs.debug=1 set. Any pointers on how to get these >> dprintf() calls working? >> > That I have no idea on. Maybe Rick can chime in? I'm actually not sure why > we would want to treat a FHTOVP failure as anything but an ESTALE error in the > NFS server to be honest. > > I resorted to changing dprintf()s to printf()s. The failure in zfs_fhtovp() is indeed from zfs_zget(), which fails right at the top where it calls dmu_bonus_hold(): Mar 22 16:55:44 zfs-dev kernel: zfs_zget(): dmu_bonus_hold() failed, returning err: 17 Mar 22 16:55:44 zfs-dev kernel: zfs_fhtovp(): zfs_zget() failed, bailing out with err: 17 errno 17 seems to map to EEXIST. 

in zfs_zget():

	err = dmu_bonus_hold(zfsvfs->z_os, obj_num, NULL, &db);
	if (err) {
		ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num);
		printf("zfs_zget(): dmu_bonus_hold() failed, returning err: %d\n", err);
		return (err);
	}

dmu_bonus_hold() calls dnode_hold_impl(), which seems to be what is returning EEXIST (17). It's probably not kosher to modify such returns, so I suspect fixing this within the NFS server may be the only option.

Regardless, is it safe to just treat any other error from VFS_FHTOVP() as ESTALE within the NFS code? This is what Rick's testing patch does, but it leaves me curious as to how it would act when other real errors are returned from an _fhtovp() call. Perhaps zfs_fhtovp() is the place to handle it, translating the error appropriately for the VFS_FHTOVP() caller within the NFS server code. This could avoid any nasty side effects for NFS operations on other filesystems where things are already working "fine".

This is the point where I'm lost as to where to go next. I'll try the experimental NFS server switches soon just to see if it handles it any better, but based on the findings so far I would think that it would act the same.

From owner-freebsd-fs@FreeBSD.ORG Mon Mar 22 23:14:26 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E2ED81065672 for ; Mon, 22 Mar 2010 23:14:26 +0000 (UTC) (envelope-from oscaruser@programmer.net) Received: from imr-db03.mx.aol.com (imr-db03.mx.aol.com [205.188.91.97]) by mx1.freebsd.org (Postfix) with ESMTP id A326F8FC15 for ; Mon, 22 Mar 2010 23:14:26 +0000 (UTC) Received: from imo-ma02.mx.aol.com (imo-ma02.mx.aol.com [64.12.78.137]) by imr-db03.mx.aol.com (8.14.1/8.14.1) with ESMTP id o2MNEP7k030269 for ; Mon, 22 Mar 2010 19:14:25 -0400 Received: from oscaruser@programmer.net by imo-ma02.mx.aol.com (mail_out_v42.9.)
id n.d63.55a715a6 (37583) for ; Mon, 22 Mar 2010 19:14:21 -0400 (EDT) Received: from smtprly-dc02.mx.aol.com (smtprly-dc02.mx.aol.com [205.188.170.2]) by cia-mb05.mx.aol.com (v127_r1.2) with ESMTP id MAILCIAMB058-d3874ba7f9c66d; Mon, 22 Mar 2010 19:14:21 -0400 Received: from web-mmc-d07 (web-mmc-d07.sim.aol.com [205.188.103.97]) by smtprly-dc02.mx.aol.com (v127.7) with ESMTP id MAILSMTPRLYDC027-d3874ba7f9c66d; Mon, 22 Mar 2010 19:14:14 -0400 To: freebsd-fs@freebsd.org Content-Transfer-Encoding: quoted-printable Date: Mon, 22 Mar 2010 19:14:14 -0400 X-MB-Message-Source: WebUI X-AOL-IP: 72.29.180.81 X-MB-Message-Type: User MIME-Version: 1.0 From: oscaruser@programmer.net Content-Type: text/plain; charset="us-ascii"; format=flowed X-Mailer: Mail.com Webmail 31144-STANDARD Received: from 72.29.180.81 by web-mmc-d07.sysops.aol.com (205.188.103.97) with HTTP (WebMailUI); Mon, 22 Mar 2010 19:14:14 -0400 Message-Id: <8CC982C85C86524-A6C-25C1@web-mmc-d07.sysops.aol.com> X-Spam-Flag: NO X-AOL-SENDER: oscaruser@programmer.net X-Mailman-Approved-At: Mon, 22 Mar 2010 23:27:31 +0000 Subject: NFS Read Only Mount & NFS-failover X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Mar 2010 23:14:27 -0000

Folks,

Apparently Solaris has a mechanism for NFS failover, which is described at the URL referenced below. The key is that the NFS mount is read only and the file systems have identical files, so that the NFS client can switch over seamlessly. We have a similar need, but does FBSD support this via NFS at all? If not, is there an alternative (AFS or something) that achieves this end goal? Google searches suggested that no, this is not available yet.

Thanks, OSC http://searchnetworking.techtarget.com/tip/0,289483,sid7_gci903711,00.html From owner-freebsd-fs@FreeBSD.ORG Mon Mar 22 23:40:27 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 24D751065672; Mon, 22 Mar 2010 23:40:27 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 9DE0E8FC12; Mon, 22 Mar 2010 23:40:26 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvsEAK6cp0uDaFvK/2dsb2JhbACbKXO9G4R9BA X-IronPort-AV: E=Sophos;i="4.51,290,1267419600"; d="scan'208";a="69919885" Received: from fraser.cs.uoguelph.ca ([131.104.91.202]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 22 Mar 2010 19:40:25 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by fraser.cs.uoguelph.ca (Postfix) with ESMTP id AB40E109C2EF; Mon, 22 Mar 2010 19:40:25 -0400 (EDT) X-Virus-Scanned: amavisd-new at fraser.cs.uoguelph.ca Received: from fraser.cs.uoguelph.ca ([127.0.0.1]) by localhost (fraser.cs.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 8txDjRB+Ab4p; Mon, 22 Mar 2010 19:40:25 -0400 (EDT) Received: from muncher.cs.uoguelph.ca (muncher.cs.uoguelph.ca [131.104.91.102]) by fraser.cs.uoguelph.ca (Postfix) with ESMTP id 3C9CF109C2F4; Mon, 22 Mar 2010 19:40:25 -0400 (EDT) Received: from localhost (rmacklem@localhost) by muncher.cs.uoguelph.ca (8.11.7p3+Sun/8.11.6) with ESMTP id o2MNrNa22445; Mon, 22 Mar 2010 19:53:23 -0400 (EDT) X-Authentication-Warning: muncher.cs.uoguelph.ca: rmacklem owned process doing -bs Date: Mon, 22 Mar 2010 19:53:23 -0400 (EDT) From: Rick Macklem X-X-Sender: rmacklem@muncher.cs.uoguelph.ca To: John Baldwin In-Reply-To: <201003221339.37169.jhb@freebsd.org> Message-ID: References: <4BA3613F.4070606@comcast.net> <201003221200.41607.jhb@freebsd.org> 
<4BA79E54.5030504@comcast.net> <201003221339.37169.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@freebsd.org, bseklecki@noc.cfi.pgh.pa.us, User Questions Subject: Re: FreeBSD NFS client goes into infinite retry loop X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Mar 2010 23:40:27 -0000

On Mon, 22 Mar 2010, John Baldwin wrote:

>> It looks like it also returns ESTALE when the inode is invalid (<
>> ROOTINO || > max inodes?) - would an unlinked file in FFS referenced at
>> a later time report an invalid inode?
>>

I'm no ufs guy, but the only way I can think of is if the file system on the server was newfs'd with fewer i-nodes? (Unlikely, but...) (Basically, it is safe to return ESTALE for anything that is not a transient failure that could recover on a retry.)

>> But back to your point, zfs_zget() seems to be failing and returning the
>> EINVAL before zfs_fhtovp() even has a chance to set and check zp_gen.
>> I'm trying to get some more details through the use of gratuitous
>> dprintf()'s, but they don't seem to be making it to any logs or the
>> console even with vfs.zfs.debug=1 set. Any pointers on how to get these
>> dprintf() calls working?

I know diddly (as in absolutely nothing) about zfs.

>
> That I have no idea on. Maybe Rick can chime in? I'm actually not sure why
> we would want to treat a FHTOVP failure as anything but an ESTALE error in the
> NFS server to be honest.
>

As far as I know, only if the underlying file system somehow has a situation where the file handle can't be translated at that point in time, but could be able to later. I have no idea if any file system is like that and I don't think such a file system would be an appropriate choice for an NFS server, even if such a beast exists.
(Even then, although FreeBSD's client assumes EIO might recover on a
retry, that isn't specified in any RFC, as far as I know.)

That's why I proposed a patch that simply translates all VFS_FHTOVP()
errors to ESTALE in the NFS server. (It seems simpler than chasing down
cases in all the underlying file systems?)

rick, chiming in:-)

From owner-freebsd-fs@FreeBSD.ORG Tue Mar 23 00:35:54 2010
Date: Mon, 22 Mar 2010 20:48:50 -0400 (EDT)
From: Rick Macklem
To: Daniel Braniss
Cc: bug-followup@FreeBSD.org, freebsd-fs@FreeBSD.org, Kai Kockro
Subject: Re: kern/144330: [nfs] mbuf leakage in nfsd with zfs
References: <201003171120.o2HBK3CV082081@freefall.freebsd.org>
 <20100317113953.GA14582@icarus.home.lan> <86tys9eqo6.fsf@kopusha.onet>

On Mon, 22 Mar 2010, Daniel Braniss wrote:

[good stuff snipped]
>>
>> The initial leap doesn't worry me. That's just a design constraint.
> yes, but new-nfsd does it better.
>
It's the classic tradeoff between a generic tool and one designed for
a specific case. Because of quirks in NFSv4, the experimental server
has no choice but to use a replay cache designed specifically for it
and it knows assorted things about NFS. The one in sys/rpc/replay.c
doesn't know anything about NFS, so it will be less efficient w.r.t. NFS.

>> A slow leak after that is still a problem. (I might have seen the
>> slow leak in testing here. I'll poke at it and see if I can reproduce
>> that.)
>
> all I do is mount udp on a client and start a write process.
>
I only have a FreeBSD client at this point, and it doesn't cause the
leak for nfsv3,udp for me here.

Doug Rabson pointed out that there would be a leak for the "default:"
case too, although didn't know if that would occur in practice.
So, maybe you could test this variant of the patch (just in case that
was the slow leak...):
--- rpc/svc.c.sav 2010-03-21 18:46:20.000000000 -0400
+++ rpc/svc.c 2010-03-22 19:00:17.000000000 -0400
@@ -819,9 +819,11 @@
                     free(r->rq_addr, M_SONAME);
                     r->rq_addr = NULL;
                 }
+                m_freem(args);
                 goto call_done;
 
             default:
+                m_freem(args);
                 goto call_done;
             }
         }

> there seems to be an NFSLOCK involved before calling replay_setsize ...
>
Ah, thanks for pointing that out.

Thanks for the good testing. At least we're down to a slow leak..rick

From owner-freebsd-fs@FreeBSD.ORG Tue Mar 23 08:34:37 2010
Date: Tue, 23 Mar 2010 10:34:33 +0200
From: Daniel Braniss
To: Rick Macklem
Cc: bug-followup@FreeBSD.org, freebsd-fs@FreeBSD.org, Kai Kockro
Subject: Re: kern/144330: [nfs] mbuf leakage in nfsd with zfs
References: <201003171120.o2HBK3CV082081@freefall.freebsd.org>
 <20100317113953.GA14582@icarus.home.lan> <86tys9eqo6.fsf@kopusha.onet>
Comments: In-reply-to Rick Macklem message dated "Mon, 22 Mar 2010 20:48:50 -0400."

>
>
> On Mon, 22 Mar 2010, Daniel Braniss wrote:
>
> [good stuff snipped]
> I only have a FreeBSD client at this point, and it doesn't cause the
> leak for nfsv3,udp for me here.
my client is also FreeBSD 8.0, strange
>
> Doug Rabson pointed out that there would be a leak for the "default:"
> case too, although didn't know if that would occur in practice.
it does!
:-)
>
> So, maybe you could test this variant of the patch (just in case that
> was the slow leak...):
> --- rpc/svc.c.sav 2010-03-21 18:46:20.000000000 -0400
> +++ rpc/svc.c 2010-03-22 19:00:17.000000000 -0400
> @@ -819,9 +819,11 @@
>                      free(r->rq_addr, M_SONAME);
>                      r->rq_addr = NULL;
>                  }
> +                m_freem(args);
>                  goto call_done;
>
>              default:
> +                m_freem(args);
>                  goto call_done;
>              }
>          }
that plugged it!
see
ftp://ftp.cs.huji.ac.il/users/danny/freebsd/mbuf-leak/store-02+++.ps
[...]
> Thanks for the good testing. At least we're down to a slow leak..rick
thanks to you for taking time off of your retirement :-)

danny

From owner-freebsd-fs@FreeBSD.ORG Tue Mar 23 12:17:05 2010
Date: Tue, 23 Mar 2010 13:17:00 +0100
From: Lorenzo Perone
To: freebsd-fs@freebsd.org
Subject: Re: Many processes stuck in zfs
Message-ID: <4BA8B13C.3030804@yellowspace.net>
References: <864468D4-DCE9-493B-9280-00E5FAB2A05C@lassitu.de>
 <20100309122954.GE3155@garage.freebsd.pl> <20100309125815.GF3155@garage.freebsd.pl>
 <20100310110202.GA1715@garage.freebsd.pl> <20100310173143.GD1715@garage.freebsd.pl>
 <27A6DF40-2EA8-4CFC-9E42-DD995E2F9342@sarenet.es> <20100313152429.GH3209@garage.freebsd.pl>

Hi all,

Any updates on this thread/issue? I was following it with some interest,
as I'm planning to set up a similar environment (periodic send/recv
on live systems).
big regards,

Lorenzo

On 15.03.10 09:32, Borja Marcos wrote:
>
> On Mar 13, 2010, at 4:24 PM, Pawel Jakub Dawidek wrote:
>
>> On Fri, Mar 12, 2010 at 11:12:49AM +0100, Borja Marcos wrote:
>>>
>>> On Mar 10, 2010, at 6:31 PM, Pawel Jakub Dawidek wrote:
>>>
>>>> Hmm, interesting. Especially those two traces:
>>>>
>>>> Tracing command zfs pid 1820 tid 100105 td 0xffffff0002ca4000
>>>
>>> Just in case something was wrong, I wiped the two virtual machines and
>>> started everything again. The two virtual systems were acting a bit
>>> weird, giving lots of LORs. I don't know why, maybe something was
>>> corrupted, these virtual machines have been panicked and restarted
>>> many times.
>>>
>>> So, experiment repeated again, fresh start. Still deadlocking pretty
>>> soon if I start, say, 5 or 6 tar processes. Traces follow.
>>
>> Are you able to dump the kernel memory and make it available for me
>> somewhere? I'd need to look at it with GDB, as DDB is not enough here,
>> I'm afraid.
>
> Sure. Do you have access to VMWare Fusion? I can create a snapshot of
> the deadlocked system and make it available to you.
>
> Borja.

From owner-freebsd-fs@FreeBSD.ORG Tue Mar 23 13:21:42 2010
Date: Tue, 23 Mar 2010 09:34:39 -0400 (EDT)
From: Rick Macklem
To: Daniel Braniss
Cc: bug-followup@FreeBSD.org, freebsd-fs@FreeBSD.org, Kai Kockro
Subject: Re: kern/144330: [nfs] mbuf leakage in nfsd with zfs
References: <201003171120.o2HBK3CV082081@freefall.freebsd.org>
 <20100317113953.GA14582@icarus.home.lan> <86tys9eqo6.fsf@kopusha.onet>

On Tue, 23 Mar 2010, Daniel Braniss wrote:

>> I only have a FreeBSD client at this point, and it doesn't cause the
>> leak for nfsv3,udp for me here.
> my client is also FreeBSD 8.0, strange
>
I was already using the patch below when I tested and couldn't see it,
so I guess it now appears that the patch works.

>>
>> Doug Rabson pointed out that there would be a leak for the "default:"
>> case too, although didn't know if that would occur in practice.
> it does! :-)
>
>>
>> So, maybe you could test this variant of the patch (just in case that
>> was the slow leak...):
>> --- rpc/svc.c.sav 2010-03-21 18:46:20.000000000 -0400
>> +++ rpc/svc.c 2010-03-22 19:00:17.000000000 -0400
>> @@ -819,9 +819,11 @@
>>                      free(r->rq_addr, M_SONAME);
>>                      r->rq_addr = NULL;
>>                  }
>> +                m_freem(args);
>>                  goto call_done;
>>
>>              default:
>> +                m_freem(args);
>>                  goto call_done;
>>              }
>>          }
> that plugged it!
> see
> ftp://ftp.cs.huji.ac.il/users/danny/freebsd/mbuf-leak/store-02+++.ps
Good work with the testing.
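For readers without the kernel source at hand, the shape of the fix in the patch quoted above can be sketched in userland C. This is only an illustration under assumptions: the enum values, `buf_alloc()`, `buf_free()` (standing in for m_freem()), `check_replay()` and `outstanding_after()` are hypothetical names, not the code in rpc/svc.c. The point it shows is that every path which answers or drops a request without dispatching it has to release the request buffer itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the replay-cache states; not the sys/rpc code. */
enum replay_state { RS_NEW, RS_DONE, RS_INPROGRESS, RS_ERROR };

static int live_bufs;	/* request buffers allocated and not yet freed */

/* Stand-in for receiving a request into an mbuf chain. */
static void *
buf_alloc(void)
{
	void *b = malloc(64);

	if (b != NULL)
		live_bufs++;
	return (b);
}

/* Stand-in for m_freem(). */
static void
buf_free(void *b)
{
	if (b != NULL) {
		assert(live_bufs > 0);
		free(b);
		live_bufs--;
	}
}

/*
 * Returns 1 when the request must still be dispatched (the service routine
 * then owns args), 0 when it was answered or dropped here.  Every path that
 * returns 0 frees args itself.
 */
static int
check_replay(enum replay_state rs, void *args)
{
	switch (rs) {
	case RS_NEW:
		return (1);
	case RS_DONE:
		/* Cached reply was re-sent; the request is no longer needed. */
		buf_free(args);
		return (0);
	default:
		/* In progress or error: the request is dropped. */
		buf_free(args);
		return (0);
	}
}

/* Runs one request through the switch; returns buffers still outstanding. */
static int
outstanding_after(enum replay_state rs)
{
	void *args = buf_alloc();

	if (check_replay(rs, args))
		buf_free(args);	/* the service routine would do this */
	return (live_bufs);
}
```

Removing either `buf_free(args)` call from `check_replay()` makes `outstanding_after()` grow by one per retransmitted request, which is the same pattern as the mbuf counts climbing when replies are served from the cache.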
I'll get it committed and put it up on the nfs patches page I have under
http://people.freebsd.org/~rmacklem

> thanks to you for taking time off of your retirement :-)
>
I plan on doing quite a bit of FreeBSD/NFS stuff during it, rick

From owner-freebsd-fs@FreeBSD.ORG Tue Mar 23 13:37:58 2010
Date: Tue, 23 Mar 2010 09:37:56 -0400
From: Steve Polyack
To: Rick Macklem, John Baldwin, freebsd-fs@freebsd.org
Cc: User Questions, bseklecki@noc.cfi.pgh.pa.us
Subject: Re: Fwd: Re: FreeBSD NFS client goes into infinite retry loop
Message-ID: <4BA8C434.30805@comcast.net>
In-Reply-To: <4BA8B2CB.1090905@comcast.net>
References: <4BA8B2CB.1090905@comcast.net>

On 03/22/10 19:53, Rick Macklem wrote:
>
> On Mon, 22 Mar 2010, John Baldwin wrote:
>
> >> It looks like it also returns ESTALE when the inode is invalid (<
> >> ROOTINO || > max inodes?) - would an unlinked file in FFS referenced at
> >> a later time report an invalid inode?
> >>
>
> I'm no ufs guy, but the only way I can think of is if the file system
> on the server was newfs'd with fewer i-nodes? (Unlikely, but...)
> (Basically, it is safe to return ESTALE for anything that is not
> a transient failure that could recover on a retry.)
>
> >> But back to your point, zfs_zget() seems to be failing and returning the
> >> EINVAL before zfs_fhtovp() even has a chance to set and check zp_gen.
> >> I'm trying to get some more details through the use of gratuitous
> >> dprintf()'s, but they don't seem to be making it to any logs or the
> >> console even with vfs.zfs.debug=1 set. Any pointers on how to get these
> >> dprintf() calls working?
>
> I know diddly (as in absolutely nothing) about zfs.
> >
> > That I have no idea on. Maybe Rick can chime in? I'm actually not sure why
> > we would want to treat a FHTOVP failure as anything but an ESTALE error in the
> > NFS server to be honest.
> >
> As far as I know, only if the underlying file system somehow has a
> situation where the file handle can't be translated at that point in time,
> but could be able to later. I have no idea if any file system is like that
> and I don't think such a file system would be an appropriate choice for an
> NFS server, even if such a beast exists. (Even then, although FreeBSD's
> client assumes EIO might recover on a retry, that isn't specified in any
> RFC, as far as I know.)
>
> That's why I proposed a patch that simply translates all VFS_FHTOVP()
> errors to ESTALE in the NFS server. (It seems simpler than chasing down
> cases in all the underlying file systems?)
>
> rick, chiming in:-)
>
>
Makes sense to me. I'll continue to bang on NFS with your initial patch
in my lab for a while.
Should I open a PR for further discussion / resolution of the issue in
-CURRENT / STABLE?

Thanks,
Steve Polyack

From owner-freebsd-fs@FreeBSD.ORG Tue Mar 23 14:16:05 2010
Date: Tue, 23 Mar 2010 16:16:01 +0200
From: Andriy Gapon
To: freebsd-fs@freebsd.org, freebsd-geom@freebsd.org
Cc: Bruce Evans
Subject: on st_blksize value
Message-ID: <4BA8CD21.3000803@freebsd.org>

First, what I am proposing:

--- a/sys/kern/vfs_vnops.c
+++ b/sys/kern/vfs_vnops.c
@@ -790,11 +790,11 @@ vn_stat(vp, sb, active_cred, file_cred, td)
          *    to file"
          * Default to PAGE_SIZE after much discussion.
          * XXX: min(PAGE_SIZE, vp->v_bufobj.bo_bsize) may be more correct.
          */
 
-        sb->st_blksize = PAGE_SIZE;
+        sb->st_blksize = max(PAGE_SIZE, vap->va_blocksize);
 
         sb->st_flags = vap->va_flags;
         if (priv_check(td, PRIV_VFS_GENERATION))
                 sb->st_gen = 0;
         else

Explanation:
1. IMO it is not nice that we totally ignore the va_blocksize value that
can be set by a filesystem. This takes away flexibility. That
va_blocksize value might really turn out to be optimal given the
filesystem implementation.
2. As st_blksize is currently always PAGE_SIZE, it is playing safe to
not use any smaller value. For some cases this might not be optimal
(which I personally doubt), but at least nothing should get broken.

One practical benefit can be with ZFS: if a filesystem has recordsize >
PAGE_SIZE (e.g. default 128K) and it has checksums or compression
enabled, then (over-)writing in blocks smaller than recordsize would
require reading of a whole record first. And some applications do use
st_blksize as a hint (just for the record: some others use f_iosize
instead, and yet others use a hardcoded value).
BTW, some torrent-like applications can serve as a good example of
applications that overwrite chunks of existing files.

Additionally, here's a little bit of history that explains the PAGE_SIZE
("much discussion") comment in vn_stat. It seems that the comment may be
misleading nowadays. It was introduced in r89784 and at that time it
applied only to the case of non-VREG and non-vn_isdisk vnodes. Then,
almost 3 years later, in revision 136966 the code for VREG vnodes and
vn_isdisk vnodes was dropped, the XXX comment was introduced, and we
ended up with the current state of matters.

BTW, I am not sure about the XXX comment either. Using bo_bsize may be a
nice shortcut, but it would also take away some flexibility. Filesystems
can already set bo_bsize and va_blocksize to the same value, but there
could be special cases where they need not be the same.

Thanks a lot for opinions and suggestions!

P.S.
Yes, I have read the following interesting thread _completely_:
http://lists.freebsd.org/pipermail/freebsd-fs/2007-May/003155.html
And this one too:
http://freebsd.monkey.org/freebsd-fs/200810/msg00059.html
Unfortunately, the discussions didn't result in any action.
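To make the recordsize argument above concrete, here is a small userland sketch. It rests on stated assumptions: `SKETCH_PAGE_SIZE` is fixed at 4096 for illustration, `blksize_hint()` only mirrors the `max(PAGE_SIZE, vap->va_blocksize)` expression from the proposed diff, and `partial_record_writes()` is a deliberately crude model that counts writes covering only part of a record and ignores any caching a real filesystem does. None of these names exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed for illustration; the real PAGE_SIZE depends on the platform. */
#define SKETCH_PAGE_SIZE ((size_t)4096)

/* The proposed hint: never below a page, larger if the filesystem asks. */
static size_t
blksize_hint(size_t va_blocksize)
{
	return (va_blocksize > SKETCH_PAGE_SIZE ?
	    va_blocksize : SKETCH_PAGE_SIZE);
}

/*
 * Counts the writes that cover only part of a record when `total` bytes
 * are overwritten from offset 0 in chunks of `io` bytes.  A write counts
 * as whole-record only if it starts on a record boundary and its length
 * is a multiple of the record size.
 */
static size_t
partial_record_writes(size_t total, size_t io, size_t rec)
{
	size_t off, n, count = 0;

	assert(io > 0 && rec > 0);
	for (off = 0; off < total; off += io) {
		n = (total - off < io) ? total - off : io;
		if (off % rec != 0 || n % rec != 0)
			count++;
	}
	return (count);
}
```

With a 128K record, overwriting 256K in 4096-byte chunks gives 64 partial-record writes in this model, while writing in chunks of `blksize_hint(131072)` gives none. That is the benefit an application could get from honouring a larger st_blksize.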
-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG Tue Mar 23 14:28:04 2010
Date: Tue, 23 Mar 2010 10:27:25 -0400
From: John Baldwin
To: Rick Macklem
Cc: freebsd-fs@freebsd.org, bseklecki@noc.cfi.pgh.pa.us, User Questions
Subject: Re: FreeBSD NFS client goes into infinite retry loop
Message-Id: <201003231027.25874.jhb@freebsd.org>
References: <4BA3613F.4070606@comcast.net> <201003221339.37169.jhb@freebsd.org>

On Monday 22 March 2010 7:53:23 pm Rick Macklem wrote:
> > That I have no idea on. Maybe Rick can chime in? I'm actually not sure why
> > we would want to treat a FHTOVP failure as anything but an ESTALE error in the
> > NFS server to be honest.
> >
> As far as I know, only if the underlying file system somehow has a
> situation where the file handle can't be translated at that point in time,
> but could be able to later. I have no idea if any file system is like that
> and I don't think such a file system would be an appropriate choice for an
> NFS server, even if such a beast exists. (Even then, although FreeBSD's
> client assumes EIO might recover on a retry, that isn't specified in any
> RFC, as far as I know.)
>
> That's why I proposed a patch that simply translates all VFS_FHTOVP()
> errors to ESTALE in the NFS server. (It seems simpler than chasing down
> cases in all the underlying file systems?)

Ah, I had read that patch as being a temporary testing hack. If you think
that would be a good approach in general that would be ok with me.
-- John Baldwin From owner-freebsd-fs@FreeBSD.ORG Tue Mar 23 21:59:01 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1BC1E106566B for ; Tue, 23 Mar 2010 21:59:01 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from fallbackmx07.syd.optusnet.com.au (fallbackmx07.syd.optusnet.com.au [211.29.132.9]) by mx1.freebsd.org (Postfix) with ESMTP id 3E1318FC0C for ; Tue, 23 Mar 2010 21:58:59 +0000 (UTC) Received: from mail10.syd.optusnet.com.au (mail10.syd.optusnet.com.au [211.29.132.191]) by fallbackmx07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id o2NJrQJu010125 for ; Wed, 24 Mar 2010 06:53:26 +1100 Received: from c122-106-174-6.carlnfd1.nsw.optusnet.com.au (c122-106-174-6.carlnfd1.nsw.optusnet.com.au [122.106.174.6]) by mail10.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id o2NJrMbG014829 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 24 Mar 2010 06:53:23 +1100 Date: Wed, 24 Mar 2010 06:53:22 +1100 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: Kirk McKusick In-Reply-To: <201003220455.o2M4txXr082327@chez.mckusick.com> Message-ID: <20100324065242.U5160@delplex.bde.org> References: <201003220455.o2M4txXr082327@chez.mckusick.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@freebsd.org, Ivan Voras Subject: Re: UFS files in a directory limit? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 23 Mar 2010 21:59:01 -0000 On Sun, 21 Mar 2010, Kirk McKusick wrote: >> From: Ivan Voras >> What is the limit on the number of files in a directory on UFS? 
I always
>> thought it is 32,767 (or near it) but now I see several directories on a
>> server with more than 36,000 files (yes it's inefficient, that's not the
>> point).
>
> The only limit on the size of a directory is the number of files that
> you can have in the filesystem. There is a limit of 2^16 directories

2^15-1

> within a directory due to the limit on the number of hard links.

Bruce
From owner-freebsd-fs@FreeBSD.ORG Tue Mar 23 22:50:18 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0AD64106564A; Tue, 23 Mar 2010 22:50:18 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 815258FC12; Tue, 23 Mar 2010 22:50:17 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvsEAO/iqEuDaFvG/2dsb2JhbACbJnO/VIR9BA X-IronPort-AV: E=Sophos;i="4.51,297,1267419600"; d="scan'208";a="69706999" Received: from amazon.cs.uoguelph.ca ([131.104.91.198]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 23 Mar 2010 18:50:07 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by amazon.cs.uoguelph.ca (Postfix) with ESMTP id 2F83D2101A8; Tue, 23 Mar 2010 18:50:07 -0400 (EDT) X-Virus-Scanned: amavisd-new at amazon.cs.uoguelph.ca Received: from amazon.cs.uoguelph.ca ([127.0.0.1]) by localhost (amazon.cs.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 1JHbjhQhLoy4; Tue, 23 Mar 2010 18:50:06 -0400 (EDT) Received: from muncher.cs.uoguelph.ca (muncher.cs.uoguelph.ca [131.104.91.102]) by amazon.cs.uoguelph.ca (Postfix) with ESMTP id C360F210182; Tue, 23 Mar 2010 18:50:06 -0400 (EDT) Received: from localhost (rmacklem@localhost) by muncher.cs.uoguelph.ca (8.11.7p3+Sun/8.11.6) with ESMTP id o2NN36623075; Tue, 23 Mar 2010 19:03:06 -0400 (EDT) X-Authentication-Warning: muncher.cs.uoguelph.ca:
rmacklem owned process doing -bs Date: Tue, 23 Mar 2010 19:03:06 -0400 (EDT) From: Rick Macklem X-X-Sender: rmacklem@muncher.cs.uoguelph.ca To: John Baldwin In-Reply-To: <201003231027.25874.jhb@freebsd.org> Message-ID: References: <4BA3613F.4070606@comcast.net> <201003221339.37169.jhb@freebsd.org> <201003231027.25874.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@freebsd.org, bseklecki@noc.cfi.pgh.pa.us, User Questions Subject: Re: FreeBSD NFS client goes into infinite retry loop X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 23 Mar 2010 22:50:18 -0000 On Tue, 23 Mar 2010, John Baldwin wrote: > > Ah, I had read that patch as being a temporary testing hack. If you think > that would be a good approach in general that would be ok with me. > Well, it kinda was. I wasn't betting on it fixing the problem, but since it does... I think just mapping VFS_FHTOVP() errors to ESTALE is ok. Do you think I should ask pjd@ about it or just go ahead with a commit? 
Thanks for the help, rick From owner-freebsd-fs@FreeBSD.ORG Tue Mar 23 23:10:04 2010 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E46C61065670 for ; Tue, 23 Mar 2010 23:10:04 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id D43328FC12 for ; Tue, 23 Mar 2010 23:10:04 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o2NNA4lh015306 for ; Tue, 23 Mar 2010 23:10:04 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o2NNA4XN015305; Tue, 23 Mar 2010 23:10:04 GMT (envelope-from gnats) Date: Tue, 23 Mar 2010 23:10:04 GMT Message-Id: <201003232310.o2NNA4XN015305@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: dfilter@FreeBSD.ORG (dfilter service) Cc: Subject: Re: kern/144330: commit references a PR X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: dfilter service List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 23 Mar 2010 23:10:05 -0000 The following reply was made to PR kern/144330; it has been noted by GNATS. From: dfilter@FreeBSD.ORG (dfilter service) To: bug-followup@FreeBSD.org Cc: Subject: Re: kern/144330: commit references a PR Date: Tue, 23 Mar 2010 23:03:52 +0000 (UTC) Author: rmacklem Date: Tue Mar 23 23:03:30 2010 New Revision: 205562 URL: http://svn.freebsd.org/changeset/base/205562 Log: When the regular NFS server replied to a UDP client out of the replay cache, it did not free the request argument mbuf list, resulting in a leak. This patch fixes that leak. 
Tested by: danny AT cs.huji.ac.il
PR: kern/144330
Submitted by: to.my.trociny AT gmail.com (earlier version)
Reviewed by: dfr
MFC after: 2 weeks

Modified:
  head/sys/rpc/svc.c

Modified: head/sys/rpc/svc.c
==============================================================================
--- head/sys/rpc/svc.c Tue Mar 23 23:00:35 2010 (r205561)
+++ head/sys/rpc/svc.c Tue Mar 23 23:03:30 2010 (r205562)
@@ -819,9 +819,11 @@ svc_getreq(SVCXPRT *xprt, struct svc_req
 			free(r->rq_addr, M_SONAME);
 			r->rq_addr = NULL;
 		}
+		m_freem(args);
 		goto call_done;
 
 	default:
+		m_freem(args);
 		goto call_done;
 	}
 }
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
From owner-freebsd-fs@FreeBSD.ORG Wed Mar 24 00:09:54 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7CAA71065673 for ; Wed, 24 Mar 2010 00:09:54 +0000 (UTC) (envelope-from als@modulus.org) Received: from email.octopus.com.au (email.octopus.com.au [122.100.2.232]) by mx1.freebsd.org (Postfix) with ESMTP id 3FA928FC13 for ; Wed, 24 Mar 2010 00:09:53 +0000 (UTC) Received: by email.octopus.com.au (Postfix, from userid 1002) id 071D85CB91F; Wed, 24 Mar 2010 10:41:10 +1100 (EST) X-Spam-Checker-Version: SpamAssassin 3.2.3 (2007-08-08) on email.octopus.com.au X-Spam-Level: **** X-Spam-Status: No, score=4.4 required=10.0 tests=ALL_TRUSTED, DNS_FROM_OPENWHOIS,FH_DATE_PAST_20XX autolearn=no version=3.2.3 Received: from [10.1.50.144] (ppp121-45-173-157.lns20.syd6.internode.on.net [121.45.173.157]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: admin@email.octopus.com.au) by email.octopus.com.au (Postfix) with ESMTP id DA0A05CB8E7; Wed, 24 Mar 2010 10:41:05 +1100 (EST) Message-ID:
<4BA954A6.9030505@modulus.org> Date: Wed, 24 Mar 2010 10:54:14 +1100 From: Andrew Snow User-Agent: Thunderbird 2.0.0.23 (X11/20090817) MIME-Version: 1.0 To: Andriy Gapon References: <4BA8CD21.3000803@freebsd.org> In-Reply-To: <4BA8CD21.3000803@freebsd.org> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: on st_blksize value X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 24 Mar 2010 00:09:54 -0000 Andriy Gapon wrote: > One practical benefit can be with ZFS: if a filesystem has recordsize > PAGE_SIZE > (e.g. default 128K) and it has checksums or compression enabled, then > (over-)writing in blocks smaller than recordsize would require reading of a whole > record first. Not strictly true: in ZFS the recordsize setting is for the maximum size of a record, it can still write smaller than this. If you overwrite 1K in the middle of a 128K record then it should just be writing a 1K block. Each block has its own checksum attached to it so there's no need to recalculate checksums for data that isn't changing. 
- Andrew
From owner-freebsd-fs@FreeBSD.ORG Wed Mar 24 06:10:11 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9FB971065670 for ; Wed, 24 Mar 2010 06:10:11 +0000 (UTC) (envelope-from avg@freebsd.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id E3B508FC17 for ; Wed, 24 Mar 2010 06:10:10 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id IAA05876; Wed, 24 Mar 2010 08:10:04 +0200 (EET) (envelope-from avg@freebsd.org) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1NuJn1-000LPC-Pu; Wed, 24 Mar 2010 08:10:03 +0200 Message-ID: <4BA9ACBA.4080608@freebsd.org> Date: Wed, 24 Mar 2010 08:10:02 +0200 From: Andriy Gapon User-Agent: Thunderbird 2.0.0.24 (X11/20100321) MIME-Version: 1.0 To: Andrew Snow References: <4BA8CD21.3000803@freebsd.org> <4BA954A6.9030505@modulus.org> In-Reply-To: <4BA954A6.9030505@modulus.org> X-Enigmail-Version: 0.96.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: on st_blksize value X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 24 Mar 2010 06:10:11 -0000
on 24/03/2010 01:54 Andrew Snow said the following:
> Andriy Gapon wrote:
>
>> One practical benefit can be with ZFS: if a filesystem has recordsize > PAGE_SIZE
>> (e.g. default 128K) and it has checksums or compression enabled, then
>> (over-)writing in blocks smaller than recordsize would require reading of a whole
>> record first.
>
> Not strictly true: in ZFS the recordsize setting is for the maximum size
> of a record, it can still write smaller than this. If you overwrite 1K
> in the middle of a 128K record then it should just be writing a 1K
> block. Each block has its own checksum attached to it so there's no
> need to recalculate checksums for data that isn't changing.

I must admit that I know almost zero about ZFS internals, but I see a logical
problem in your explanation: if the original data was written as a single 128K
block, and if changing a 1K range within it would result in a new 1K block, then
the original data is still affected, as it needs to account for the fact that
the range is now stored in a different block.
Perhaps I am just misunderstanding what you said.
But perhaps you were referring to the case of (over)writing a small _file_ as
opposed to the case of overwriting a small range within a large file?

-- Andriy Gapon
From owner-freebsd-fs@FreeBSD.ORG Wed Mar 24 06:30:28 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CE76B106566B; Wed, 24 Mar 2010 06:30:28 +0000 (UTC) (envelope-from peterjeremy@acm.org) Received: from mail36.syd.optusnet.com.au (mail36.syd.optusnet.com.au [211.29.133.76]) by mx1.freebsd.org (Postfix) with ESMTP id 5DBC98FC0C; Wed, 24 Mar 2010 06:30:27 +0000 (UTC) Received: from server.vk2pj.dyndns.org (c122-106-253-149.belrs3.nsw.optusnet.com.au [122.106.253.149]) by mail36.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id o2O6TvZj020889 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 24 Mar 2010 17:30:23 +1100 X-Bogosity: Ham, spamicity=0.000000 Received: from server.vk2pj.dyndns.org (localhost.vk2pj.dyndns.org [127.0.0.1]) by server.vk2pj.dyndns.org (8.14.3/8.14.3) with ESMTP id o2O6TvL6089064; Wed, 24 Mar 2010 17:29:57 +1100 (EST) (envelope-from peter@server.vk2pj.dyndns.org) Received: (from peter@localhost) by
server.vk2pj.dyndns.org (8.14.3/8.14.3/Submit) id o2O6TuX0089063; Wed, 24 Mar 2010 17:29:56 +1100 (EST) (envelope-from peter) Date: Wed, 24 Mar 2010 17:29:56 +1100 From: Peter Jeremy To: Bruce Evans Message-ID: <20100324062956.GA88991@server.vk2pj.dyndns.org> References: <201003220455.o2M4txXr082327@chez.mckusick.com> <20100324065242.U5160@delplex.bde.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="3MwIy2ne0vdjdPXF" Content-Disposition: inline In-Reply-To: <20100324065242.U5160@delplex.bde.org> X-PGP-Key: http://members.optusnet.com.au/peterjeremy/pubkey.asc User-Agent: Mutt/1.5.20 (2009-06-14) Cc: Kirk McKusick , freebsd-fs@freebsd.org, Ivan Voras Subject: Re: UFS files in a directory limit? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 24 Mar 2010 06:30:28 -0000 --3MwIy2ne0vdjdPXF Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 2010-Mar-24 06:53:22 +1100, Bruce Evans wrote: >On Sun, 21 Mar 2010, Kirk McKusick wrote: >> The only limit on the size of a directory is the number of files that >> you can have in the filesystem. There is a limit of 2^16 directories > 2^15-1 >> within a directory due to the limit on the number of hard links. If we're going to be pedantic, it's 2^15-3 sub-directories: There's a maximum of 2^15-1 hardlinks. One is used for the directory's name in its parent directory, one is used for the '.' entry in the directory itself and one-per-sub-directory is needed for the '..' entry in each sub-directory. (And a single file could theoretically have up to 2^15-1 hardlinks within a single directory - though this would not be particularly sensible). 
-- 
Peter Jeremy
--3MwIy2ne0vdjdPXF Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (FreeBSD) iEYEARECAAYFAkupsWQACgkQ/opHv/APuIeeYQCcC6RUp8Ag4S9UKSon5E8IrUCQ tCMAn36NdQOJOz2ufS+no0/8h7eZ2rJP =rJTf -----END PGP SIGNATURE----- --3MwIy2ne0vdjdPXF--
From owner-freebsd-fs@FreeBSD.ORG Wed Mar 24 06:38:43 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7E769106564A for ; Wed, 24 Mar 2010 06:38:43 +0000 (UTC) (envelope-from kkockro@web.de) Received: from mail.myphotobook.de (mail.myphotobook.de [85.237.87.140]) by mx1.freebsd.org (Postfix) with ESMTP id 061BB8FC1A for ; Wed, 24 Mar 2010 06:38:42 +0000 (UTC) Received: (qmail 74319 invoked by uid 89); 24 Mar 2010 06:38:40 -0000 Received: from unknown (HELO ) (k.kockro@myphotobook.de@87.234.224.68) by mail.myphotobook.de with AES256-SHA encrypted SMTP; 24 Mar 2010 06:38:40 -0000 From: Kai Kockro To: Rick Macklem Date: Wed, 24 Mar 2010 07:39:04 +0100 User-Agent: KMail/1.12.4 (FreeBSD/8.0-STABLE; KDE/4.3.5; amd64; ; ) References: <201003171120.o2HBK3CV082081@freefall.freebsd.org> In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Message-Id: <201003240739.04994.kkockro@web.de> Cc: bug-followup@freebsd.org, freebsd-fs@freebsd.org, Daniel Braniss Subject: Re: kern/144330: [nfs] mbuf leakage in nfsd with zfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 24 Mar 2010 06:38:43 -0000
Hi,

after 3 days with the first patch (FreeBSD 8-STABLE AMD64, old nfsd):

netstat -m
5732/10528/16260 mbufs in use (current/cache/total)
4825/10131/14956/262144 mbuf clusters in use (current/cache/total/max)

It looks very good. I'll patch with the final corrections and then test again.

But why do I have the same issue? I don't use UDP connections, only TCP.

Greetings,
Kai

On Tuesday, 23 March 2010 14:34:39, Rick Macklem wrote:
> On Tue, 23 Mar 2010, Daniel Braniss wrote:
> >> I only have a FreeBSD client at this point, and it doesn't cause the
> >> leak for nfsv3,udp for me here.
> >
> > my client is also FreeBSD 8.0, strange
>
> I was already using the patch below when I tested and couldn't see it,
> so I guess it now appears that the patch works.
>
> >> Doug Rabson pointed out that there would be a leak for the "default:"
> >> case too, although didn't know if that would occur in practice.
> >
> > it does! :-)
> >
> >> So, maybe you could test this variant of the patch (just in case that
> >> was the slow leak...):
> >> --- rpc/svc.c.sav 2010-03-21 18:46:20.000000000 -0400
> >> +++ rpc/svc.c 2010-03-22 19:00:17.000000000 -0400
> >> @@ -819,9 +819,11 @@
> >>  			free(r->rq_addr, M_SONAME);
> >>  			r->rq_addr = NULL;
> >>  		}
> >> +		m_freem(args);
> >>  		goto call_done;
> >>
> >>  	default:
> >> +		m_freem(args);
> >>  		goto call_done;
> >>  	}
> >>  }
> >
> > that plugged it!
> > see
> > ftp://ftp.cs.huji.ac.il/users/danny/freebsd/mbuf-leak/store-02+++.ps
>
> Good work with the testing. I'll get it committed and put it up on the
> nfs patches page I have under http://people.freebsd.org/~rmacklem
>
> > thanks to you for taking time off of your retirement :-)
>
> I plan on doing quite a bit of FreeBSD/NFS stuff during it, rick
>
From owner-freebsd-fs@FreeBSD.ORG Wed Mar 24 06:40:07 2010 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 92D96106564A for ; Wed, 24 Mar 2010 06:40:07 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 67CFE8FC13 for ; Wed, 24 Mar 2010 06:40:07 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o2O6e7tg007547 for ; Wed, 24 Mar 2010 06:40:07 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o2O6e7YG007546; Wed, 24 Mar 2010 06:40:07 GMT (envelope-from gnats) Date: Wed, 24 Mar 2010 06:40:07 GMT Message-Id: <201003240640.o2O6e7YG007546@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: Kai Kockro Cc: Subject: Re: kern/144330: [nfs] mbuf leakage in nfsd with zfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Kai Kockro List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 24 Mar 2010 06:40:07 -0000
The following reply was made to PR kern/144330; it has been noted by GNATS.
From: Kai Kockro To: Rick Macklem Cc: Daniel Braniss , Mikolaj Golub , Jeremy Chadwick , freebsd-fs@freebsd.org, bug-followup@freebsd.org, gerrit@pmp.uni-hannover.de Subject: Re: kern/144330: [nfs] mbuf leakage in nfsd with zfs Date: Wed, 24 Mar 2010 07:39:04 +0100
From owner-freebsd-fs@FreeBSD.ORG Wed Mar 24 14:15:46 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8D7D4106566C; Wed, 24 Mar 2010 14:15:46 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 5CCD68FC33; Wed, 24 Mar 2010 14:15:46 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 094FB46B8A; Wed, 24 Mar 2010 10:15:46 -0400 (EDT) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPA id 6E4D18A029; Wed, 24 Mar 2010 10:15:44 -0400 (EDT) From: John Baldwin To: Rick Macklem Date: Wed, 24 Mar 2010 10:06:23 -0400 User-Agent: KMail/1.12.1 (FreeBSD/7.3-CBSD-20100217; KDE/4.3.1; amd64; ; ) References: <4BA3613F.4070606@comcast.net> <201003231027.25874.jhb@freebsd.org> In-Reply-To: MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201003241006.23347.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Wed, 24 Mar 2010 10:15:45 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-1.7 required=4.2 tests=AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx Cc: freebsd-fs@freebsd.org, bseklecki@noc.cfi.pgh.pa.us, User Questions Subject: Re: FreeBSD NFS client goes into infinite retry loop X-BeenThere:
freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 24 Mar 2010 14:15:46 -0000 On Tuesday 23 March 2010 7:03:06 pm Rick Macklem wrote: > > On Tue, 23 Mar 2010, John Baldwin wrote: > > > > > Ah, I had read that patch as being a temporary testing hack. If you think > > that would be a good approach in general that would be ok with me. > > > Well, it kinda was. I wasn't betting on it fixing the problem, but since > it does... > > I think just mapping VFS_FHTOVP() errors to ESTALE is ok. Do you think > I should ask pjd@ about it or just go ahead with a commit? Go ahead and fix it I think. -- John Baldwin From owner-freebsd-fs@FreeBSD.ORG Wed Mar 24 15:55:49 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2617F1065670; Wed, 24 Mar 2010 15:55:49 +0000 (UTC) (envelope-from js@saltmine.radix.net) Received: from saltmine.radix.net (saltmine.radix.net [207.192.128.40]) by mx1.freebsd.org (Postfix) with ESMTP id C25498FC17; Wed, 24 Mar 2010 15:55:46 +0000 (UTC) Received: (from root@localhost) by saltmine.radix.net (8.12.2/8.12.2/Submit) id o2MIAtQE020744; Mon, 22 Mar 2010 14:10:55 -0400 (EDT) Received: from mail1.radix.net (mail1.radix.net [207.192.128.31]) by saltmine.radix.net (8.12.2/8.12.2) with ESMTP id o2MGiZV4007619 for ; Mon, 22 Mar 2010 12:44:35 -0400 (EDT) Received: from mx2.freebsd.org (mx2.freebsd.org [69.147.83.53]) by mail1.radix.net (8.13.4/8.13.4) with ESMTP id o2MGiZTE002283 for ; Mon, 22 Mar 2010 12:44:35 -0400 (EDT) Received: from hub.freebsd.org (hub.freebsd.org [IPv6:2001:4f8:fff6::36]) by mx2.freebsd.org (Postfix) with ESMTP id 47CB5152321; Mon, 22 Mar 2010 16:44:21 +0000 (UTC) Received: from hub.freebsd.org (localhost [127.0.0.1]) by hub.freebsd.org (Postfix) with ESMTP id DA97910656AC; Mon, 22 Mar 
2010 16:44:20 +0000 (UTC) (envelope-from owner-freebsd-questions@freebsd.org) Delivered-To: freebsd-questions@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 243371065675 for ; Mon, 22 Mar 2010 16:44:06 +0000 (UTC) (envelope-from korvus@comcast.net) Received: from mx04.pub.collaborativefusion.com (mx04.pub.collaborativefusion.com [206.210.72.84]) by mx1.freebsd.org (Postfix) with ESMTP id DE7598FC15 for ; Mon, 22 Mar 2010 16:44:05 +0000 (UTC) Received: from [192.168.2.164] ([206.210.89.202]) by mx04.pub.collaborativefusion.com (StrongMail Enterprise 4.1.1.4(4.1.1.4-47689)); Mon, 22 Mar 2010 12:59:04 -0400 X-VirtualServerGroup: Default X-MailingID: 00000::00000::00000::00000::::1403 X-SMHeaderMap: mid="X-MailingID" X-Destination-ID: freebsd-questions@freebsd.org X-SMFBL: ZnJlZWJzZC1xdWVzdGlvbnNAZnJlZWJzZC5vcmc= Message-ID: <4BA79E54.5030504@comcast.net> Date: Mon, 22 Mar 2010 12:44:04 -0400 From: Steve Polyack User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.1.7) Gecko/20100311 Thunderbird/3.0.1 MIME-Version: 1.0 To: John Baldwin References: <4BA3613F.4070606@comcast.net> <4BA78444.4040707@comcast.net> <4BA7911F.5060905@comcast.net> <201003221200.41607.jhb@freebsd.org> In-Reply-To: <201003221200.41607.jhb@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-questions@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Sender: owner-freebsd-questions@freebsd.org Errors-To: owner-freebsd-questions@freebsd.org Status: O X-Status: X-Keywords: X-UID: 229 Cc: freebsd-fs@freebsd.org, User Questions , bseklecki@noc.cfi.pgh.pa.us Subject: Re: FreeBSD NFS client goes into infinite retry loop X-BeenThere: freebsd-fs@freebsd.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 24 Mar 2010 15:55:49 -0000 On 03/22/10 12:00, John Baldwin 
wrote:
> On Monday 22 March 2010 11:47:43 am Steve Polyack wrote:
>
>> On 03/22/10 10:52, Steve Polyack wrote:
>>
>>> On 3/19/2010 11:27 PM, Rick Macklem wrote:
>>>
>>>> On Fri, 19 Mar 2010, Steve Polyack wrote:
>>>>
>>>> [good stuff snipped]
>>>>
>>>>> This makes sense. According to wireshark, the server is indeed
>>>>> transmitting "Status: NFS3ERR_IO (5)". Perhaps this should be STALE
>>>>> instead; it sounds more correct than marking it a general IO error.
>>>>> Also, the NFS server is serving its share off of a ZFS filesystem,
>>>>> if it makes any difference. I suppose ZFS could be talking to the
>>>>> NFS server threads with some mismatched language, but I doubt it.
>>>>>
>>>> Ok, now I think we're making progress. If VFS_FHTOVP() doesn't return
>>>> ESTALE when the file no longer exists, the NFS server returns whatever
>>>> error it has returned.
>>>>
>>>> So, either VFS_FHTOVP() succeeds after the file has been deleted, which
>>>> would be a problem that needs to be fixed within ZFS
>>>> OR
>>>> ZFS returns an error other than ESTALE when it doesn't exist.
>>>>
>>>> Try the following patch on the server (which just makes any error
>>>> returned by VFS_FHTOVP() into ESTALE) and see if that helps.
>>>>
>>>> --- nfsserver/nfs_srvsubs.c.sav 2010-03-19 22:06:43.000000000 -0400
>>>> +++ nfsserver/nfs_srvsubs.c 2010-03-19 22:07:22.000000000 -0400
>>>> @@ -1127,6 +1127,8 @@
>>>>  		}
>>>>  	}
>>>>  	error = VFS_FHTOVP(mp, &fhp->fh_fid, vpp);
>>>> +	if (error != 0)
>>>> +		error = ESTALE;
>>>>  	vfs_unbusy(mp);
>>>>  	if (error)
>>>>  		goto out;
>>>>
>>>> Please let me know if the patch helps, rick
>>>>
>>> The patch seems to fix the bad behavior. Running with the patch, I
>>> see the following output from my patch (return code of nfs_doio from
>>> within nfsiod):
>>> nfssvc_iod: iod 0 nfs_doio returned errno: 70
>>>
>>> Furthermore, when inspecting the transaction with Wireshark, after
>>> deleting the file on the NFS server it looks like there is only a
>>> single error. This time it is a reply to a V3 Lookup call that
>>> contains a status of "NFS3ERR_NOENT (2)" coming from the NFS server.
>>> The client also does not repeatedly try to complete the failed request.
>>>
>>> Any suggestions on the next step here? Based on what you said it
>>> looks like ZFS is falsely reporting an IO error to VFS instead of
>>> ESTALE / NOENT. I tried looking around zfs_fhtovp() and only saw
>>> returns of EINVAL, but I'm not even sure I'm looking in the right place.
>>>
>> Further on down the rabbit hole... here's the piece in zfs_fhtovp()
>> where it's kicking out EINVAL instead of ESTALE - the following patch
>> corrects the behavior, but of course also suggests further digging
>> within the zfs_zget() function to ensure that _it_ is returning the
>> correct thing and whether or not it needs to be handled there or within
>> zfs_fhtovp().
>>
>> --- src-orig/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c 2010-03-22 11:41:21.000000000 -0400
>> +++ src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c 2010-03-22 16:25:21.000000000 -0400
>> @@ -1246,7 +1246,7 @@
>>  	dprintf("getting %llu [%u mask %llx]\n", object, fid_gen, gen_mask);
>>  	if (err = zfs_zget(zfsvfs, object, &zp)) {
>>  		ZFS_EXIT(zfsvfs);
>> -		return (err);
>> +		return (ESTALE);
>>  	}
>>  	zp_gen = zp->z_phys->zp_gen & gen_mask;
>>  	if (zp_gen == 0)
>>
> So the odd thing here is that ffs_fhtovp() doesn't return ESTALE if VFS_VGET()
> (which calls ffs_vget()) fails, it only returns ESTALE if the generation count
> doesn't match.
>

It looks like it also returns ESTALE when the inode is invalid (< ROOTINO || > max inodes?)
- would an unlinked file in FFS referenced at a later time report an invalid inode? But back to your point, zfs_zget() seems to be failing and returning the EINVAL before zfs_fhtovp() even has a chance to set and check zp_gen. I'm trying to get some more details through the use of gratuitous dprintf()'s, but they don't seem to be making it to any logs or the console even with vfs.zfs.debug=1 set. Any pointers on how to get these dprintf() calls working? Thanks again. _______________________________________________ freebsd-questions@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-questions To unsubscribe, send any mail to "freebsd-questions-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Wed Mar 24 16:18:54 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 22CF2106564A; Wed, 24 Mar 2010 16:18:54 +0000 (UTC) (envelope-from dan.naumov@gmail.com) Received: from mail-bw0-f216.google.com (mail-bw0-f216.google.com [209.85.218.216]) by mx1.freebsd.org (Postfix) with ESMTP id 7ED428FC08; Wed, 24 Mar 2010 16:18:53 +0000 (UTC) Received: by mail-bw0-f216.google.com with SMTP id 8so1281969bwz.3 for ; Wed, 24 Mar 2010 09:18:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:date:message-id:subject :from:to:content-type; bh=9WvB4PoqQfeDySwmu/Qx5SY+cwfQ4ZQUlX+qo91boXw=; b=t+Fn5mNa+uyo9kEJ2zKWlcdTZONRCWhENN83VgF8rr7gfLBX7G5mySXU+PU0w3+iPF hWpDNtlIyskVTpWMIAYCxSbRXgKjCNI/KA+OriuC+tY0L1s4ep8zvm02ruj2sKOb+tVT hvMaFH2C+bN1Ns9owH2gE7nUsZ8+ZiQscsdO8= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:date:message-id:subject:from:to:content-type; b=cdxgD/dKonmgwFRvm11dBZ2Jo5zHK7O2ocoI3uOHMmPNYGSBxriIwlBY8ESsoFjw1M ULgd3i+a4V4I//k7yfVnbJFhIIblP5DLLD3ii3N9s84qId5MyKKFjJ21OG8tgxVtqLXl ZcBFulu6LGmGpppnn3OmVJY+2XM9ey1CIGXEM= 
MIME-Version: 1.0 Received: by 10.204.134.70 with SMTP id i6mr7528622bkt.74.1269447532762; Wed, 24 Mar 2010 09:18:52 -0700 (PDT) Date: Wed, 24 Mar 2010 18:18:52 +0200 Message-ID: From: Dan Naumov To: freebsd-fs@freebsd.org, freebsd-questions@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Cc: Subject: tuning vfs.zfs.vdev.max_pending and solving the issue of ZFS writes choking read IO X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 24 Mar 2010 16:18:54 -0000 Hello, I am having a slight issue (and judging by Google results, similar issues have been seen by other FreeBSD and Solaris/OpenSolaris users) with writes choking the read IO. The issue I am having is described pretty well here: http://opensolaris.org/jive/thread.jspa?threadID=106453 It seems that under heavy write load, ZFS likes to aggregate a really huge amount of data before actually writing it to disks, resulting in sudden 10+ second stalls where it frantically tries to commit everything, completely choking read IO in the process and sometimes even the network (with a large enough write to a mirror pool using dd, I can cause my SSH sessions to drop dead, without actually running out of RAM. As soon as the data is committed, I can reconnect). Beyond the issue of system interactivity (or rather, the near-disappearance thereof) during these enormous flushes, this kind of pattern seems really inefficient from the CPU utilization point of view. Instead of a relatively stable and consistent flow of reads and writes, allowing the CPU to be utilized as much as possible, when the system is committing the data the CPU basically stays idle for 10+ seconds (or as long as the flush takes) and the process of committing unwritten data to the pool seemingly trounces the priority of any read operations.
Has anyone done any extensive testing of the effects of tuning vfs.zfs.vdev.max_pending on this issue? Is there some universally recommended value beyond the default 35? Anything else I should be looking at? - Sincerely, Dan Naumov From owner-freebsd-fs@FreeBSD.ORG Wed Mar 24 16:27:23 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 52DAD106566C for ; Wed, 24 Mar 2010 16:27:23 +0000 (UTC) (envelope-from spate@mac.com) Received: from asmtpout029.mac.com (asmtpout029.mac.com [17.148.16.104]) by mx1.freebsd.org (Postfix) with ESMTP id 408A78FC1A for ; Wed, 24 Mar 2010 16:27:23 +0000 (UTC) MIME-version: 1.0 Content-transfer-encoding: 7BIT Content-type: text/plain; charset=us-ascii Received: from [10.0.1.4] ([75.44.206.54]) by asmtp029.mac.com (Sun Java(tm) System Messaging Server 6.3-8.01 (built Dec 16 2008; 32bit)) with ESMTPA id <0KZS00GIHLL1YX10@asmtp029.mac.com> for freebsd-fs@freebsd.org; Wed, 24 Mar 2010 08:27:22 -0700 (PDT) X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 ipscore=0 phishscore=0 bulkscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx engine=5.0.0-0908210000 definitions=main-1003240123 From: Steve Pate Date: Wed, 24 Mar 2010 08:27:01 -0700 Message-id: To: freebsd-fs@freebsd.org X-Mailer: Apple Mail (2.1077) Subject: ZFS and Deduplication? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 24 Mar 2010 16:27:23 -0000 I work in a stealth mode startup and we'd be interested in porting the latest version of ZFS (incl dedup) over to FreeBSD. We have some very experienced filesystem engineers who can work on this. Does anyone have any idea if someone in the community has started this work? 
Regards, Steve From owner-freebsd-fs@FreeBSD.ORG Wed Mar 24 16:30:43 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3C36C1065674; Wed, 24 Mar 2010 16:30:43 +0000 (UTC) (envelope-from avg@icyb.net.ua) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 8D2928FC19; Wed, 24 Mar 2010 16:30:41 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id SAA16693; Wed, 24 Mar 2010 18:30:30 +0200 (EET) (envelope-from avg@icyb.net.ua) Message-ID: <4BAA3E25.9040108@icyb.net.ua> Date: Wed, 24 Mar 2010 18:30:29 +0200 From: Andriy Gapon User-Agent: Thunderbird 2.0.0.23 (X11/20100211) MIME-Version: 1.0 To: Steve Pate References: In-Reply-To: X-Enigmail-Version: 0.95.7 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek Subject: Re: ZFS and Deduplication? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 24 Mar 2010 16:30:43 -0000 on 24/03/2010 17:27 Steve Pate said the following: > I work in a stealth mode startup and we'd be interested in porting the latest version of ZFS (incl dedup) over to FreeBSD. We have some very experienced filesystem engineers who can work on this. > > Does anyone have any idea if someone in the community has started this work? I guess that pjd is doing that, CC-ed. 
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Wed Mar 24 17:02:44 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 25794106566B for ; Wed, 24 Mar 2010 17:02:44 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id E03228FC1D for ; Wed, 24 Mar 2010 17:02:43 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.13.8+Sun/8.13.8) with ESMTP id o2OH2fLq013072; Wed, 24 Mar 2010 12:02:42 -0500 (CDT) Date: Wed, 24 Mar 2010 12:02:41 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Andrew Snow In-Reply-To: <4BA954A6.9030505@modulus.org> Message-ID: References: <4BA8CD21.3000803@freebsd.org> <4BA954A6.9030505@modulus.org> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Wed, 24 Mar 2010 12:02:42 -0500 (CDT) Cc: freebsd-fs@freebsd.org, Andriy Gapon Subject: Re: on st_blksize value X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 24 Mar 2010 17:02:44 -0000 On Wed, 24 Mar 2010, Andrew Snow wrote: > Not strictly true: in ZFS the recordsize setting is for the maximum size of a > record, it can still write smaller than this. If you overwrite 1K in the > middle of a 128K record then it should just be writing a 1K block. Each > block has its own checksum attached to it so there's no need to recalculate > checksums for data that isn't changing. This is not true. 
In fact, simple testing will show that it is clearly not true. ZFS will always write recordsize blocks except that the tail block is allowed to be smaller. If compression is enabled, the block is stored in its compressed size, so the amount actually stored on disk may be less than the established recordsize. Due to ZFS's read-modify-write strategy, it is important to performance that the data to be modified be cached in the ARC. There will still be write amplification if the update size is smaller than the recordsize. Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Wed Mar 24 17:23:50 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4C7B7106566B; Wed, 24 Mar 2010 17:23:50 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id 10E6B8FC15; Wed, 24 Mar 2010 17:23:49 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.13.8+Sun/8.13.8) with ESMTP id o2OHNn9v013305; Wed, 24 Mar 2010 12:23:49 -0500 (CDT) Date: Wed, 24 Mar 2010 12:23:49 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Dan Naumov In-Reply-To: Message-ID: References: User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Wed, 24 Mar 2010 12:23:49 -0500 (CDT) Cc: freebsd-fs@freebsd.org, freebsd-questions@freebsd.org Subject: Re: tuning vfs.zfs.vdev.max_pending and solving the issue of ZFS writes choking read IO X-BeenThere: 
freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 24 Mar 2010 17:23:50 -0000 On Wed, 24 Mar 2010, Dan Naumov wrote: > Has anyone done any extensive testing of the effects of tuning > vfs.zfs.vdev.max_pending on this issue? Is there some universally > recommended value beyond the default 35? Anything else I should be > looking at? The vdev.max_pending value is primarily used to tune for SAN/HW-RAID LUNs and is used to dial down LUN service time (svc_t) values by limiting the number of pending requests. It is not terribly useful for decreasing stalls due to zfs writes. In order to reduce the impact of zfs writes, you want to limit the maximum size of a zfs transaction group (TXG). I don't know what the FreeBSD tunable is for this, but under Solaris it is zfs:zfs_write_limit_override. On a large-memory system, a properly working zfs should not saturate the write channel for more than 5 seconds. Zfs tries to learn the write bandwidth so that it can tune the TXG size up to 5 seconds (max) worth of writes. If you have both large memory and fast storage, quite a huge amount of data can be written in 5 seconds. On my Solaris system, I found that zfs was quite accurate with its rate estimation, but it resulted in four gigabytes of data being written per TXG. 
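The sizing rule described above can be put into numbers with a rough sketch. If a TXG is sized to hold up to 5 seconds of writes at the learned bandwidth, its size is approximately bandwidth times 5. The function name and the 800 MB/s figure below are assumptions chosen for illustration only; the message does not state the pool's actual bandwidth.

```shell
#!/bin/sh
# Estimate how much dirty data one transaction group (TXG) holds if ZFS
# sizes it to a fixed number of seconds of writes at the learned bandwidth.
#   $1 = sustained write bandwidth in MB/s (assumed, illustrative figure)
#   $2 = target TXG duration in seconds (5 in the description above)
# txg_size_mb is a hypothetical helper, not a ZFS or FreeBSD command.
txg_size_mb() {
    echo $(( $1 * $2 ))
}

# Hypothetical pool that sustains 800 MB/s: 800 * 5 = 4000 MB per TXG,
# which is in the same range as the four gigabytes reported above.
txg_size_mb 800 5
```

Under these assumptions, fast storage alone is enough to produce multi-gigabyte TXGs, which is why capping the TXG size is suggested rather than tuning vdev.max_pending.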
Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Wed Mar 24 17:24:06 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B0ED11065679; Wed, 24 Mar 2010 17:24:06 +0000 (UTC) (envelope-from spate@mac.com) Received: from asmtpout024.mac.com (asmtpout024.mac.com [17.148.16.99]) by mx1.freebsd.org (Postfix) with ESMTP id 9BDA18FC1E; Wed, 24 Mar 2010 17:24:06 +0000 (UTC) MIME-version: 1.0 Content-transfer-encoding: 7BIT Content-type: text/plain; charset=us-ascii Received: from [10.0.1.4] ([75.44.206.54]) by asmtp024.mac.com (Sun Java(tm) System Messaging Server 6.3-8.01 (built Dec 16 2008; 32bit)) with ESMTPA id <0KZS00DYXR034D00@asmtp024.mac.com>; Wed, 24 Mar 2010 10:24:06 -0700 (PDT) X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 ipscore=0 phishscore=0 bulkscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx engine=5.0.0-0908210000 definitions=main-1003240153 From: Steve Pate In-reply-to: <4BAA3E25.9040108@icyb.net.ua> Date: Wed, 24 Mar 2010 10:24:03 -0700 Message-id: <2FF0A3E3-AF2F-4674-8648-FC01EC87445E@mac.com> References: <4BAA3E25.9040108@icyb.net.ua> To: Andriy Gapon X-Mailer: Apple Mail (2.1077) Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek Subject: Re: ZFS and Deduplication? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 24 Mar 2010 17:24:06 -0000 I'm really not familiar with the FreeBSD development process so I apologize if I'm asking questions which appear naive. We'd like to ship a FreeBSD-based product by early next year that contains ZFS with de-dup. 
From what I see, FreeBSD seems to be tracking Solaris in terms of taking stable versions of ZFS. In other words, you're taking the version of ZFS that will next move from Open Solaris to Solaris. Is that correct? From what I understand FreeBSD 8-STABLE includes ZFS version 14, which brings it to parity with Solaris 10. ZFS version 21 contains de-duplication so is quite a long way ahead of where FreeBSD is today. This implies that it will take quite some time for ZFS + de-dupe to get into a stable version of FreeBSD. Correct? Now, if we were to port ZFS version 22 (or later) to FreeBSD 8.x (version number TBD), how could we work with the FreeBSD team to get ZFS version 21+ back into a subsequent FreeBSD release? Cheers, Steve > on 24/03/2010 17:27 Steve Pate said the following: >> I work in a stealth mode startup and we'd be interested in porting the latest version of ZFS (incl dedup) over to FreeBSD. We have some very experienced filesystem engineers who can work on this. >> >> Does anyone have any idea if someone in the community has started this work? > > I guess that pjd is doing that, CC-ed.
> > -- > Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Wed Mar 24 18:07:58 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A06F2106564A; Wed, 24 Mar 2010 18:07:58 +0000 (UTC) (envelope-from delphij@gmail.com) Received: from mail-px0-f200.google.com (mail-px0-f200.google.com [209.85.216.200]) by mx1.freebsd.org (Postfix) with ESMTP id 6F76F8FC12; Wed, 24 Mar 2010 18:07:58 +0000 (UTC) Received: by pxi38 with SMTP id 38so4631608pxi.27 for ; Wed, 24 Mar 2010 11:07:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=evEPG31T6Tiqd3UM1fCe5RgJXJUIX6P8v0uSeAK8/d0=; b=OJEm8Rwhx0cXS+2RSvXZVics1uborksN9UnXZLDGBOk+HIREErAdPNNVtXpQHBAyH9 k7rmPisA5JOZsaF7Ogl9oRIG2VLeQLLkIInMDz/2iTgYk3GL6rFCF/M6KcJdqfMYCz6X t4yNP1yZd8oXdiMnO0pepwhU47QSyxqn6yfPc= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=Y3LPwJP10s1r1xDLoNFmm5WBTpo/ryu/7FkSJVXlrNtgHsz0a5nEPJqcpiTb2IT3G+ zsYyWFdf3R+PBJWon/bKEHdFSUcKfvb8KR4X5tW029RX2irHy4sqD22AQdR7CdDt1Eg8 1qc3PjEugajfFYr3RgHhh8vLG65bqLtKE5jPs= MIME-Version: 1.0 Received: by 10.142.248.11 with SMTP id v11mr240976wfh.22.1269454077681; Wed, 24 Mar 2010 11:07:57 -0700 (PDT) In-Reply-To: <2FF0A3E3-AF2F-4674-8648-FC01EC87445E@mac.com> References: <4BAA3E25.9040108@icyb.net.ua> <2FF0A3E3-AF2F-4674-8648-FC01EC87445E@mac.com> Date: Wed, 24 Mar 2010 11:07:57 -0700 Message-ID: From: Xin LI To: Steve Pate Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek , Andriy Gapon Subject: Re: ZFS and Deduplication? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 24 Mar 2010 18:07:58 -0000 Hi, Steve, On Wed, Mar 24, 2010 at 10:24 AM, Steve Pate wrote: > I'm really not familiar with the FreeBSD development process so I apologize if I'm asking questions which appear naive. > > We'd like to ship a FreeBSD-based product by early next year that contains ZFS with de-dup. From what I see, FreeBSD seems to be tracking Solaris in terms of taking stable versions of ZFS. In other words, you're taking the version of ZFS that will next move from Open Solaris to Solaris. Is that correct? > > From what I understand FreeBSD 8-STABLE includes ZFS version 14, which brings it to parity with Solaris 10. > > ZFS version 21 contains de-duplication so is quite a long way ahead of where FreeBSD is today. This implies that it will take quite some time for ZFS + de-dupe to get into a stable version of FreeBSD. Correct? > > Now, if we were to port ZFS version 22 (or later) to FreeBSD 8.x (version number TBD), how could we work with the FreeBSD team to get ZFS version 21+ back into a subsequent FreeBSD release? Pawel is working on porting the -HEAD ZFS version from OpenSolaris right now. For FreeBSD 8.x I think there are some things that we need to take into consideration. OpenSolaris generally strives to keep API/ABI stability but for ZFS management tools like zfs(1m) and friends this does not apply, e.g., the ioctl request structure changes from time to time. In the past we broke ABI across ZFS v6 -> v13 update. This basically means one will have to update kernel and userland at the same time, which would make the upgrade an one-way ticket (otherwise it would be impossible to do operations like zpool import once kernel is upgraded and restarted).
I think, if we plan to MFC it we need to make some compatibility shims instead of using the current approach. Cheers, -- Xin LI http://www.delphij.net From owner-freebsd-fs@FreeBSD.ORG Wed Mar 24 18:17:16 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 02A11106564A; Wed, 24 Mar 2010 18:17:16 +0000 (UTC) (envelope-from sfourman@gmail.com) Received: from mail-pw0-f54.google.com (mail-pw0-f54.google.com [209.85.160.54]) by mx1.freebsd.org (Postfix) with ESMTP id C3FC48FC15; Wed, 24 Mar 2010 18:17:15 +0000 (UTC) Received: by pwj4 with SMTP id 4so5762723pwj.13 for ; Wed, 24 Mar 2010 11:17:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=lqBH1JqcyCpTeeodRZqfesqIZPNUDHczZ2pMNhl0KyU=; b=HHlrTZ2zJPXqu2s2xo1H1ZriyEbzAHgmtFIIQUsxVE16fgN92tpVFPvcCqRaQOboOs H7nrjLVbvieU0wTSCgMZ17qbspTFNzDTDApfGhRJV8MBAsS2tu21h/HJOiUcVAL8+5J0 WPd0D8W4tWf5xBJwe2jNZyEE4YEC2xe69Cqzs= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=IfjvDlI5G/gAj64u9Qx6DFq/ctlvFAns/703olmTQv1D5kIrJ27JJOQ59k7l65bqKo LSmRAdX9lAZOikPcNliC/K0+gmjznVXRRIryx38g/4E+SNPvt7R0toZ2FaujRBtgpKTu hgbUqJOhiTytyP/fTRPPOab4uReFa74NDZ7DY= MIME-Version: 1.0 Received: by 10.114.16.19 with SMTP id 19mr796502wap.92.1269454635183; Wed, 24 Mar 2010 11:17:15 -0700 (PDT) In-Reply-To: References: <4BAA3E25.9040108@icyb.net.ua> <2FF0A3E3-AF2F-4674-8648-FC01EC87445E@mac.com> Date: Wed, 24 Mar 2010 13:17:15 -0500 Message-ID: <11167f521003241117tc9821b8s58f3cdf018e6dd71@mail.gmail.com> From: "Sam Fourman Jr."
To: Xin LI Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org, Steve Pate , Pawel Jakub Dawidek , Andriy Gapon Subject: Re: ZFS and Deduplication? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 24 Mar 2010 18:17:16 -0000 > > In the past we broke ABI across ZFS v6 -> v13 update. This basically > means one will have to update kernel and userland at the same time, > which would make the upgrade an one-way ticket (otherwise it would be > impossible to do operations like zpool import once kernel is upgraded > and restarted). I think, if we plan to MFC it we need to make some > compatibility shims instead of using the current approach. > > Cheers, > -- > Xin LI http://www.delphij.net is it possible that ZFS v22 will be only FreeBSD 9.x or greater ? Sam Fourman Jr. Fourman Networks From owner-freebsd-fs@FreeBSD.ORG Wed Mar 24 18:56:00 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 72749106566C for ; Wed, 24 Mar 2010 18:56:00 +0000 (UTC) (envelope-from dan@dan.emsphone.com) Received: from email1.allantgroup.com (email1.emsphone.com [199.67.51.115]) by mx1.freebsd.org (Postfix) with ESMTP id 25CAC8FC24 for ; Wed, 24 Mar 2010 18:55:59 +0000 (UTC) Received: from dan.emsphone.com (dan.emsphone.com [199.67.51.101]) by email1.allantgroup.com (8.14.0/8.14.0) with ESMTP id o2OIOGtl038034 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Wed, 24 Mar 2010 13:24:16 -0500 (CDT) (envelope-from dan@dan.emsphone.com) Received: from dan.emsphone.com (smmsp@localhost [127.0.0.1]) by dan.emsphone.com (8.14.4/8.14.3) with ESMTP id o2OIOFKi002936 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Wed, 24
Mar 2010 13:24:16 -0500 (CDT) (envelope-from dan@dan.emsphone.com) Received: (from dan@localhost) by dan.emsphone.com (8.14.4/8.14.3/Submit) id o2OHtkJ1031147; Wed, 24 Mar 2010 12:55:46 -0500 (CDT) (envelope-from dan) Date: Wed, 24 Mar 2010 12:55:46 -0500 From: Dan Nelson To: Bob Friesenhahn Message-ID: <20100324175546.GF12330@dan.emsphone.com> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-OS: FreeBSD 8.0-STABLE User-Agent: Mutt/1.5.20 (2009-06-14) X-Virus-Scanned: clamav-milter 0.95.3 at email1.allantgroup.com X-Virus-Status: Clean X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-2.0.2 (email1.allantgroup.com [199.67.51.78]); Wed, 24 Mar 2010 13:24:16 -0500 (CDT) X-Scanned-By: MIMEDefang 2.45 Cc: freebsd-fs@freebsd.org, Dan Naumov , freebsd-questions@freebsd.org Subject: Re: tuning vfs.zfs.vdev.max_pending and solving the issue of ZFS writes choking read IO X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 24 Mar 2010 18:56:00 -0000 In the last episode (Mar 24), Bob Friesenhahn said: > On Wed, 24 Mar 2010, Dan Naumov wrote: > > Has anyone done any extensive testing of the effects of tuning > > vfs.zfs.vdev.max_pending on this issue? Is there some universally > > recommended value beyond the default 35? Anything else I should be > > looking at? > > The vdev.max_pending value is primarily used to tune for SAN/HW-RAID LUNs > and is used to dial down LUN service time (svc_t) values by limiting the > number of pending requests. It is not terribly useful for decreasing > stalls due to zfs writes. In order to reduce the impact of zfs writes, > you want to limit the maximum size of a zfs transaction group (TXG). I > don't know what the FreeBSD tunable is for this, but under Solaris it is > zfs:zfs_write_limit_override. 
There isn't a sysctl for it by default, but the following patch will enable a vfs.zfs.write_limit_override sysctl:

Index: dsl_pool.c
===================================================================
RCS file: /home/ncvs/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c,v
retrieving revision 1.4.2.1
diff -u -p -r1.4.2.1 dsl_pool.c
--- dsl_pool.c 17 Aug 2009 09:55:58 -0000 1.4.2.1
+++ dsl_pool.c 11 Mar 2010 08:34:27 -0000
@@ -47,6 +47,11 @@ uint64_t zfs_write_limit_inflated = 0;
 uint64_t zfs_write_limit_override = 0;
 extern uint64_t zfs_write_limit_min;
 
+SYSCTL_DECL(_vfs_zfs);
+SYSCTL_QUAD(_vfs_zfs, OID_AUTO, write_limit_override, CTLFLAG_RW,
+    &zfs_write_limit_override, 0,
+    "Force a txg if dirty buffers exceed this value (bytes)");
+
 kmutex_t zfs_write_limit_lock;
 
 static pgcnt_t old_physmem = 0;

> On a large-memory system, a properly working zfs should not saturate
> the write channel for more than 5 seconds. Zfs tries to learn the
> write bandwidth so that it can tune the TXG size up to 5 seconds (max)
> worth of writes. If you have both large memory and fast storage,
> quite a huge amount of data can be written in 5 seconds. On my
> Solaris system, I found that zfs was quite accurate with its rate
> estimation, but it resulted in four gigabytes of data being written
> per TXG.

I had similar problems on a 32GB Solaris server at work. Note that with compression enabled, the entire system pauses while it compresses the outgoing block of data. It's just a fraction of a second, but long enough for end-users to complain about bad performance in X sessions. I had to throttle back to a 256MB write limit size to make the stuttering go away completely. It didn't affect write throughput much at all.
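As a sketch of how the override might be set once the patch above is applied: the sysctl description says the value is in bytes, so a 256MB limit has to be converted first. The helper names below are made up for illustration, MB is assumed to mean 2^20 bytes, and the script only prints the sysctl command instead of running it, since vfs.zfs.write_limit_override does not exist on an unpatched kernel.

```shell
#!/bin/sh
# Convert a size in MiB to bytes, the unit the patched sysctl expects.
# mib_to_bytes is a hypothetical helper, not a system utility.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Build (but do not execute) the sysctl command for a given limit in MiB,
# so it can be reviewed before being run as root on a patched kernel.
write_limit_cmd() {
    echo "sysctl vfs.zfs.write_limit_override=$(mib_to_bytes "$1")"
}

# 256 MiB, the limit reported above to stop the stuttering.
write_limit_cmd 256
```

Printing the command keeps the example harmless on systems without the patch. A value of 0 matches the variable's initial value in the diff, which presumably leaves the override unset.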
-- Dan Nelson dnelson@allantgroup.com From owner-freebsd-fs@FreeBSD.ORG Wed Mar 24 20:48:09 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 099141065670; Wed, 24 Mar 2010 20:48:09 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id C24138FC1B; Wed, 24 Mar 2010 20:48:08 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.13.8+Sun/8.13.8) with ESMTP id o2OKlx3Y015304; Wed, 24 Mar 2010 15:47:59 -0500 (CDT) Date: Wed, 24 Mar 2010 15:47:59 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Dan Nelson In-Reply-To: <20100324175546.GF12330@dan.emsphone.com> Message-ID: References: <20100324175546.GF12330@dan.emsphone.com> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Wed, 24 Mar 2010 15:47:59 -0500 (CDT) Cc: freebsd-fs@freebsd.org, Dan Naumov , freebsd-questions@freebsd.org Subject: Re: tuning vfs.zfs.vdev.max_pending and solving the issue of ZFS writes choking read IO X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 24 Mar 2010 20:48:09 -0000 On Wed, 24 Mar 2010, Dan Nelson wrote: > > I had similar problems on a 32GB Solaris server at work. Note that with > compression enabled, the entire system pauses while it compresses the > outgoing block of data. It's just a fraction of a second, but long enough > for end-users to complain about bad performance in X sessions. 
I had to > throttle back to a 256MB write limit size to make the stuttering go away > completely. It didn't affect write throughput much at all. Apparently this was a kernel thread priority problem in Solaris. It is apparently fixed in recent versions of OpenSolaris. The fix required adding a scheduling class which allowed the kernel thread doing the compression to be less than the priority of normal user processes (such as the X11 server). Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Wed Mar 24 22:21:19 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2F81A106567C for ; Wed, 24 Mar 2010 22:21:19 +0000 (UTC) (envelope-from pjd@garage.freebsd.pl) Received: from mail.garage.freebsd.pl (chello089077043238.chello.pl [89.77.43.238]) by mx1.freebsd.org (Postfix) with ESMTP id 671778FC08 for ; Wed, 24 Mar 2010 22:21:18 +0000 (UTC) Received: by mail.garage.freebsd.pl (Postfix, from userid 65534) id D57BD45CA0; Wed, 24 Mar 2010 23:21:15 +0100 (CET) Received: from localhost (chello089077043238.chello.pl [89.77.43.238]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.garage.freebsd.pl (Postfix) with ESMTP id CB3F045E93; Wed, 24 Mar 2010 23:21:09 +0100 (CET) Date: Wed, 24 Mar 2010 23:21:08 +0100 From: Pawel Jakub Dawidek To: "Sam Fourman Jr." 
Message-ID: <20100324222108.GB1999@garage.freebsd.pl> References: <4BAA3E25.9040108@icyb.net.ua> <2FF0A3E3-AF2F-4674-8648-FC01EC87445E@mac.com> <11167f521003241117tc9821b8s58f3cdf018e6dd71@mail.gmail.com> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="qlTNgmc+xy1dBmNv" Content-Disposition: inline In-Reply-To: <11167f521003241117tc9821b8s58f3cdf018e6dd71@mail.gmail.com> User-Agent: Mutt/1.4.2.3i X-PGP-Key-URL: http://people.freebsd.org/~pjd/pjd.asc X-OS: FreeBSD 9.0-CURRENT i386 X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.garage.freebsd.pl X-Spam-Level: X-Spam-Status: No, score=-0.6 required=4.5 tests=BAYES_00,RCVD_IN_SORBS_DUL autolearn=no version=3.0.4 Cc: freebsd-fs@freebsd.org, Steve Pate , Andriy Gapon Subject: Re: ZFS and Deduplication? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 24 Mar 2010 22:21:19 -0000 --qlTNgmc+xy1dBmNv Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Mar 24, 2010 at 01:17:15PM -0500, Sam Fourman Jr. wrote: > > > > In the past we broke ABI across ZFS v6 -> v13 update. This basically > > means one will have to update kernel and userland at the same time, > > which would make the upgrade an one-way ticket (otherwise it would be > > impossible to do operations like zpool import once kernel is upgraded > > and restarted). I think, if we plan to MFC it we need to make some > > compatibility shims instead of using the current approach. > > > > Cheers, > > -- > > Xin LI http://www.delphij.net > > is it possible that ZFS v22 will be only FreeBSD 9.x or greater ? It depends. If no changes are needed to the FreeBSD kernel API/ABI that will make it impossible to merge to 8.x, chances are new ZFS will appear in 8.x as well.
Of course if 8.x will still be around:) --=20 Pawel Jakub Dawidek http://www.wheelsystems.com pjd@FreeBSD.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! --qlTNgmc+xy1dBmNv Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (FreeBSD) iEYEARECAAYFAkuqkFMACgkQForvXbEpPzTkVgCgmtaCJi6lo3hHorq2VsA+IioG 0H4AoLgkCeV5J5nPnBjovEfsQO7wAlT1 =EX5t -----END PGP SIGNATURE----- --qlTNgmc+xy1dBmNv-- From owner-freebsd-fs@FreeBSD.ORG Thu Mar 25 00:06:19 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 78E16106567A; Thu, 25 Mar 2010 00:06:19 +0000 (UTC) (envelope-from dan.naumov@gmail.com) Received: from mail-bw0-f216.google.com (mail-bw0-f216.google.com [209.85.218.216]) by mx1.freebsd.org (Postfix) with ESMTP id D95E98FC20; Thu, 25 Mar 2010 00:06:18 +0000 (UTC) Received: by bwz8 with SMTP id 8so1641208bwz.3 for ; Wed, 24 Mar 2010 17:06:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:date:message-id:subject :from:to:content-type; bh=2dGsIvja8NjQePYtsRHtWinSB+xlaNxCZWV4K2iNoqo=; b=dSw9r1rKiTMFDfbatZ81EPKHF2S+BFE+Yl1yqpTQ6WCZj+/CDjgjixd5QHaOxCnp+c HbQmnWwhZ3bvFWjv1AQnhOh+z5c9tSE2kfuVRE4JYV4M1II3JylDKesxZnb43C9h7ECo r0F17Mvx4M2o4F300vx+A5QELWdtyfD/2jx0I= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:date:message-id:subject:from:to:content-type; b=Dnd5V/3hGaQ/Vhe9K9DK/Qft+UjGh5RNYO4S6KRvtMG43Z2LnskcLinoFcRMF1TlJ4 zUx6yJ00GdYo3RiotSOsN+x1jOjV0RWCr1ixFx2SPLoahJNQa+IrDGsZz+0UhSjyCcGw BLA+vo4A1ueoz7tFrmdQ9uGMKn3hpYlw4ORww= MIME-Version: 1.0 Received: by 10.204.15.134 with SMTP id k6mr2964876bka.96.1269475576157; Wed, 24 Mar 2010 17:06:16 -0700 (PDT) Date: Thu, 25 Mar 2010 02:06:16 +0200 Message-ID: From: Dan Naumov To: freebsd-fs@freebsd.org, pjd@freebsd.org 
Content-Type: text/plain; charset=ISO-8859-1 Cc: Subject: RE: ZFS and Deduplication? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 25 Mar 2010 00:06:19 -0000 >It depends. If no changes are needed to the FreeBSD kernel API/ABI that >will make it impossible to merge to 8.x, chances are new ZFS will appear >in 8.x as well. Of course if 8.x will still be around:) > >-- >Pawel Jakub Dawidek http://www.wheelsystems.com >pjd at FreeBSD.org http://www.FreeBSD.org >FreeBSD committer Am I Evil? Yes, I Am! Should this be taken as a suspicion that it will take well over 2 years then? - Sincerely, Dan Naumov From owner-freebsd-fs@FreeBSD.ORG Thu Mar 25 00:46:17 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7AAC0106566C for ; Thu, 25 Mar 2010 00:46:17 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 3146C8FC12 for ; Thu, 25 Mar 2010 00:46:16 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvsEAOpOqkuDaFvI/2dsb2JhbACbG3O/e4R+BA X-IronPort-AV: E=Sophos;i="4.51,304,1267419600"; d="scan'208";a="70221484" Received: from darling.cs.uoguelph.ca ([131.104.91.200]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 24 Mar 2010 20:46:15 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by darling.cs.uoguelph.ca (Postfix) with ESMTP id 8E16A94006C; Wed, 24 Mar 2010 20:46:15 -0400 (EDT) X-Virus-Scanned: amavisd-new at darling.cs.uoguelph.ca Received: from darling.cs.uoguelph.ca ([127.0.0.1]) by localhost (darling.cs.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id ODJER6Gp9eBp; Wed, 24 Mar 2010 20:46:15 -0400 (EDT) Received: from 
muncher.cs.uoguelph.ca (muncher.cs.uoguelph.ca [131.104.91.102]) by darling.cs.uoguelph.ca (Postfix) with ESMTP id 1EF8F940064; Wed, 24 Mar 2010 20:46:15 -0400 (EDT) Received: from localhost (rmacklem@localhost) by muncher.cs.uoguelph.ca (8.11.7p3+Sun/8.11.6) with ESMTP id o2P0xG203951; Wed, 24 Mar 2010 20:59:16 -0400 (EDT) X-Authentication-Warning: muncher.cs.uoguelph.ca: rmacklem owned process doing -bs Date: Wed, 24 Mar 2010 20:59:16 -0400 (EDT) From: Rick Macklem X-X-Sender: rmacklem@muncher.cs.uoguelph.ca To: oscaruser@programmer.net In-Reply-To: <8CC982C85C86524-A6C-25C1@web-mmc-d07.sysops.aol.com> Message-ID: References: <8CC982C85C86524-A6C-25C1@web-mmc-d07.sysops.aol.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@freebsd.org Subject: Re: NFS Read Only Mount & NFS-failover X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 25 Mar 2010 00:46:17 -0000 On Mon, 22 Mar 2010, oscaruser@programmer.net wrote: > Folks, > > Apparently Solaris has a mechanism for NFS failover, and is described in the > below URL reference. The key is that the NFS was mounted read only & and the > files systems have identical files so that the NFS client can switch-over > seamlessly. We have a similar need, but does FBSD support this via NFS at > all? If not, is there an alternative (AFS or something) that achieves this > end goal. Google searches said that no, this is not available yet. > The google search is correct w.r.t. NFS. I don't know enough about AFS to answer w.r.t. it. 
rick From owner-freebsd-fs@FreeBSD.ORG Thu Mar 25 00:51:55 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9AF45106564A; Thu, 25 Mar 2010 00:51:55 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 381778FC0A; Thu, 25 Mar 2010 00:51:54 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvsEAMtQqkuDaFvI/2dsb2JhbACbG3O/fIR+BA X-IronPort-AV: E=Sophos;i="4.51,304,1267419600"; d="scan'208";a="69867893" Received: from darling.cs.uoguelph.ca ([131.104.91.200]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 24 Mar 2010 20:51:54 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by darling.cs.uoguelph.ca (Postfix) with ESMTP id 654C694006C; Wed, 24 Mar 2010 20:51:54 -0400 (EDT) X-Virus-Scanned: amavisd-new at darling.cs.uoguelph.ca Received: from darling.cs.uoguelph.ca ([127.0.0.1]) by localhost (darling.cs.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id mN1xKYHG7YXP; Wed, 24 Mar 2010 20:51:53 -0400 (EDT) Received: from muncher.cs.uoguelph.ca (muncher.cs.uoguelph.ca [131.104.91.102]) by darling.cs.uoguelph.ca (Postfix) with ESMTP id 9A9C1940064; Wed, 24 Mar 2010 20:51:53 -0400 (EDT) Received: from localhost (rmacklem@localhost) by muncher.cs.uoguelph.ca (8.11.7p3+Sun/8.11.6) with ESMTP id o2P14s205017; Wed, 24 Mar 2010 21:04:55 -0400 (EDT) X-Authentication-Warning: muncher.cs.uoguelph.ca: rmacklem owned process doing -bs Date: Wed, 24 Mar 2010 21:04:53 -0400 (EDT) From: Rick Macklem X-X-Sender: rmacklem@muncher.cs.uoguelph.ca To: Kai Kockro In-Reply-To: <201003240739.04994.kkockro@web.de> Message-ID: References: <201003171120.o2HBK3CV082081@freefall.freebsd.org> <201003240739.04994.kkockro@web.de> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed 
Cc: bug-followup@freebsd.org, freebsd-fs@freebsd.org, Daniel Braniss Subject: Re: kern/144330: [nfs] mbuf leakage in nfsd with zfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 25 Mar 2010 00:51:55 -0000 On Wed, 24 Mar 2010, Kai Kockro wrote: > Hi, > > after 3 days with the first patch ( FreeBSD 8-STABLE AMD64, old nfsd ): > > netstat -m > 5732/10528/16260 mbufs in use (current/cache/total) > 4825/10131/14956/262144 mbuf clusters in use (current/cache/total/max) > > It looks very fine. I'll patch with the final corrections and then test again. > Sounds good. Thanks for letting us know. > But why i have the same issues? I dont use UDP connects, only TCP?! > Hmm, from what I can see, the replay cache is used for both UDP and TCP in the regular NFS server. If I'm correct on that, any retry of an RPC over TCP could cause the leak. (Retries of an RPC over TCP are infrequent, with the likelyhood differing between clients. I can only guess that your TCP clients do retries?) 
rick From owner-freebsd-fs@FreeBSD.ORG Thu Mar 25 01:00:10 2010 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4D179106564A for ; Thu, 25 Mar 2010 01:00:10 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 3D39E8FC12 for ; Thu, 25 Mar 2010 01:00:10 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o2P10A5f082340 for ; Thu, 25 Mar 2010 01:00:10 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o2P109PK082339; Thu, 25 Mar 2010 01:00:10 GMT (envelope-from gnats) Date: Thu, 25 Mar 2010 01:00:10 GMT Message-Id: <201003250100.o2P109PK082339@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: Rick Macklem Cc: Subject: Re: kern/144330: [nfs] mbuf leakage in nfsd with zfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Rick Macklem List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 25 Mar 2010 01:00:10 -0000 The following reply was made to PR kern/144330; it has been noted by GNATS. From: Rick Macklem To: Kai Kockro Cc: Daniel Braniss , Mikolaj Golub , Jeremy Chadwick , freebsd-fs@freebsd.org, bug-followup@freebsd.org, gerrit@pmp.uni-hannover.de Subject: Re: kern/144330: [nfs] mbuf leakage in nfsd with zfs Date: Wed, 24 Mar 2010 21:04:53 -0400 (EDT) On Wed, 24 Mar 2010, Kai Kockro wrote: > Hi, > > after 3 days with the first patch ( FreeBSD 8-STABLE AMD64, old nfsd ): > > netstat -m > 5732/10528/16260 mbufs in use (current/cache/total) > 4825/10131/14956/262144 mbuf clusters in use (current/cache/total/max) > > It looks very fine. 
I'll patch with the final corrections and then test again. > Sounds good. Thanks for letting us know. > But why i have the same issues? I dont use UDP connects, only TCP?! > Hmm, from what I can see, the replay cache is used for both UDP and TCP in the regular NFS server. If I'm correct on that, any retry of an RPC over TCP could cause the leak. (Retries of an RPC over TCP are infrequent, with the likelyhood differing between clients. I can only guess that your TCP clients do retries?) rick From owner-freebsd-fs@FreeBSD.ORG Thu Mar 25 15:52:38 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4EA50106567D for ; Thu, 25 Mar 2010 15:52:38 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from lo.gmane.org (lo.gmane.org [80.91.229.12]) by mx1.freebsd.org (Postfix) with ESMTP id 0A34C8FC12 for ; Thu, 25 Mar 2010 15:52:37 +0000 (UTC) Received: from list by lo.gmane.org with local (Exim 4.69) (envelope-from ) id 1NupM6-0000j6-8A for freebsd-fs@freebsd.org; Thu, 25 Mar 2010 16:52:22 +0100 Received: from lara.cc.fer.hr ([161.53.72.113]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Thu, 25 Mar 2010 16:52:22 +0100 Received: from ivoras by lara.cc.fer.hr with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Thu, 25 Mar 2010 16:52:22 +0100 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-fs@freebsd.org From: Ivan Voras Date: Thu, 25 Mar 2010 16:51:44 +0100 Lines: 4 Message-ID: Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@dough.gmane.org X-Gmane-NNTP-Posting-Host: lara.cc.fer.hr User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.1.5) Gecko/20100118 Thunderbird/3.0 Cc: freebsd-stable@freebsd.org Subject: ZFS, Samba, ACL X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: 
Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 25 Mar 2010 15:52:38 -0000 ZFS supports "NFSv4-style" ACLs, which are supposed to be similar to Windows NTFS-stye ACLs. Samba has an "ACL" option but I've seen it work (somewhat) on POSIX style ACLs. Did anyone experiment with using Samba and ZFS to store Windows ACLs? From owner-freebsd-fs@FreeBSD.ORG Thu Mar 25 22:57:31 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A8552106566B for ; Thu, 25 Mar 2010 22:57:31 +0000 (UTC) (envelope-from oscaruser@programmer.net) Received: from imr-db01.mx.aol.com (imr-db01.mx.aol.com [205.188.91.95]) by mx1.freebsd.org (Postfix) with ESMTP id 64F9A8FC14 for ; Thu, 25 Mar 2010 22:57:31 +0000 (UTC) Received: from imo-da03.mx.aol.com (imo-da03.mx.aol.com [205.188.169.201]) by imr-db01.mx.aol.com (8.14.1/8.14.1) with ESMTP id o2PMvRQN029445 for ; Thu, 25 Mar 2010 18:57:27 -0400 Received: from oscaruser@programmer.net by imo-da03.mx.aol.com (mail_out_v42.9.) 
id n.bf9.7bdbe853 (34896) for ; Thu, 25 Mar 2010 18:57:23 -0400 (EDT) Received: from smtprly-db02.mx.aol.com (smtprly-db02.mx.aol.com [205.188.249.153]) by cia-da01.mx.aol.com (v127_r1.2) with ESMTP id MAILCIADA015-5bca4babea4d28c; Thu, 25 Mar 2010 18:57:23 -0400 Received: from web-mmc-d07 (web-mmc-d07.sim.aol.com [205.188.103.97]) by smtprly-db02.mx.aol.com (v127.7) with ESMTP id MAILSMTPRLYDB021-5bca4babea4d28c; Thu, 25 Mar 2010 18:57:17 -0400 References: <8CC982C85C86524-A6C-25C1@web-mmc-d07.sysops.aol.com> To: freebsd-fs@freebsd.org Content-Transfer-Encoding: quoted-printable Date: Thu, 25 Mar 2010 18:57:17 -0400 X-AOL-IP: 72.29.180.81 In-Reply-To: X-MB-Message-Source: WebUI MIME-Version: 1.0 From: oscaruser@programmer.net X-MB-Message-Type: User Content-Type: text/plain; charset="utf-8"; format=flowed X-Mailer: Mail.com Webmail 31226-STANDARD Received: from 72.29.180.81 by web-mmc-d07.sysops.aol.com (205.188.103.97) with HTTP (WebMailUI); Thu, 25 Mar 2010 18:57:17 -0400 Message-Id: <8CC9A85A65CC4AE-13C8-244F@web-mmc-d07.sysops.aol.com> X-Spam-Flag: NO X-AOL-SENDER: oscaruser@programmer.net X-Mailman-Approved-At: Fri, 26 Mar 2010 01:37:40 +0000 Subject: Re: NFS Read Only Mount & NFS-failover X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 25 Mar 2010 22:57:31 -0000

Hi Rick,

I have read your response, thank you for that information. I have two
other questions. Does NFS on FBSD support read-only mounts? In practice
I have seen some posts that claim it does not function properly or
mounts are found in 'rw' mode, but docs clearly read how the
configuration should be written. Also is NFS usable for production env?

Thank you.
-----Original Message-----
From: Rick Macklem
To: oscaruser@programmer.net
Cc: freebsd-fs@freebsd.org
Sent: Wed, Mar 24, 2010 5:59 pm
Subject: Re: NFS Read Only Mount & NFS-failover

On Mon, 22 Mar 2010, oscaruser@programmer.net wrote:

> Folks,
>
> Apparently Solaris has a mechanism for NFS failover, and is described in the
> below URL reference. The key is that the NFS was mounted read only & and the
> files systems have identical files so that the NFS client can switch-over
> seamlessly. We have a similar need, but does FBSD support this via NFS at
> all? If not, is there an alternative (AFS or something) that achieves this
> end goal. Google searches said that no, this is not available yet.
>
The google search is correct w.r.t. NFS. I don't know enough about AFS
to answer w.r.t. it.

rick

From owner-freebsd-fs@FreeBSD.ORG Fri Mar 26 06:47:11 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 16EBC106566B for ; Fri, 26 Mar 2010 06:47:11 +0000 (UTC) (envelope-from arnaud.houdelette@tzim.net) Received: from golanth.tzim.net (unknown [IPv6:2001:41d0:1:d91f:21c:c0ff:fe4b:cf32]) by mx1.freebsd.org (Postfix) with ESMTP id A8EBF8FC17 for ; Fri, 26 Mar 2010 06:47:10 +0000 (UTC) Received: from 12rf.tzim.net ([82.232.60.244] helo=[192.168.0.14]) by golanth.tzim.net with esmtpsa (TLSv1:CAMELLIA256-SHA:256) (Exim 4.71 (FreeBSD)) (envelope-from ) id 1Nv3K1-000LBu-Hq for freebsd-fs@freebsd.org; Fri, 26 Mar 2010 07:47:09 +0100 Message-ID: <4BAC5871.8080403@tzim.net> Date: Fri, 26 Mar 2010 07:47:13 +0100 From: Arnaud Houdelette User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.8) Gecko/20100227 Lightning/1.0b1 Thunderbird/3.0.3 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit X-Authenticated-User: tzim@tzim.net X-Authenticator: plain Subject: Re: ZFS, Samba, ACL X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 26 Mar 2010 06:47:11 -0000

On 25/03/2010 16:51, Ivan Voras wrote:
> ZFS supports "NFSv4-style" ACLs, which are supposed to be similar to
> Windows NTFS-stye ACLs. Samba has an "ACL" option but I've seen it
> work (somewhat) on POSIX style ACLs. Did anyone experiment with using
> Samba and ZFS to store Windows ACLs?
>
Here are my findings:

There is an existing ZFS_ACL VFS module for Samba on Solaris.
Unfortunately, it doesn't work with FreeBSD ZFS, as the ACL API in
FreeBSD is different from the Solaris one. There is a compatibility
library, but it needs to be integrated into (or linked to) the module.
Last time I tried, I did not manage to make this work (I may not have
the skills).
More info on the bottom of this wiki page : http://wiki.freebsd.org/NFSv4_ACLs From owner-freebsd-fs@FreeBSD.ORG Fri Mar 26 09:13:08 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 65C77106566B for ; Fri, 26 Mar 2010 09:13:08 +0000 (UTC) (envelope-from peterjeremy@acm.org) Received: from mail17.syd.optusnet.com.au (mail17.syd.optusnet.com.au [211.29.132.198]) by mx1.freebsd.org (Postfix) with ESMTP id CC5B78FC12 for ; Fri, 26 Mar 2010 09:13:07 +0000 (UTC) Received: from server.vk2pj.dyndns.org (c122-106-253-149.belrs3.nsw.optusnet.com.au [122.106.253.149]) by mail17.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id o2Q9D5Hc023932 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 26 Mar 2010 20:13:06 +1100 X-Bogosity: Ham, spamicity=0.000000 Received: from server.vk2pj.dyndns.org (localhost.vk2pj.dyndns.org [127.0.0.1]) by server.vk2pj.dyndns.org (8.14.3/8.14.3) with ESMTP id o2Q9D3fp033858; Fri, 26 Mar 2010 20:13:03 +1100 (EST) (envelope-from peter@server.vk2pj.dyndns.org) Received: (from peter@localhost) by server.vk2pj.dyndns.org (8.14.3/8.14.3/Submit) id o2Q9D3sa033857; Fri, 26 Mar 2010 20:13:03 +1100 (EST) (envelope-from peter) Date: Fri, 26 Mar 2010 20:13:03 +1100 From: Peter Jeremy To: oscaruser@programmer.net Message-ID: <20100326091303.GC32799@server.vk2pj.dyndns.org> References: <8CC982C85C86524-A6C-25C1@web-mmc-d07.sysops.aol.com> <8CC9A85A65CC4AE-13C8-244F@web-mmc-d07.sysops.aol.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="aT9PWwzfKXlsBJM1" Content-Disposition: inline In-Reply-To: <8CC9A85A65CC4AE-13C8-244F@web-mmc-d07.sysops.aol.com> X-PGP-Key: http://members.optusnet.com.au/peterjeremy/pubkey.asc User-Agent: Mutt/1.5.20 (2009-06-14) X-CMAE-Score: 0 Cc: freebsd-fs@freebsd.org Subject: Re: NFS Read Only Mount & NFS-failover X-BeenThere: 
freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 26 Mar 2010 09:13:08 -0000 --aT9PWwzfKXlsBJM1 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 2010-Mar-25 18:57:17 -0400, oscaruser@programmer.net wrote: >Does NFS on FBSD support read-only mounts? Yes. > Also is NFS usable for production env? I'm using various mixtures of FreeBSD, Solaris and unfs servers with FreeBSD, Solaris and Linux clients. The only issues I had were some locking issues between Linux clients and FreeBSD servers. I would consider it production use. --=20 Peter Jeremy --aT9PWwzfKXlsBJM1 Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (FreeBSD) iEYEARECAAYFAkusep8ACgkQ/opHv/APuIfvwQCfXU7mJse902SkhAhMDwzmWN33 R14AnRfgL/mNwfcaqVYNW4BmSguo/a/N =jkDJ -----END PGP SIGNATURE----- --aT9PWwzfKXlsBJM1-- From owner-freebsd-fs@FreeBSD.ORG Fri Mar 26 13:17:44 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6F585106564A; Fri, 26 Mar 2010 13:17:44 +0000 (UTC) (envelope-from avg@freebsd.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 72A788FC19; Fri, 26 Mar 2010 13:17:43 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id PAA01998; Fri, 26 Mar 2010 15:17:42 +0200 (EET) (envelope-from avg@freebsd.org) Message-ID: <4BACB3F5.7010905@freebsd.org> Date: Fri, 26 Mar 2010 15:17:41 +0200 From: Andriy Gapon User-Agent: Thunderbird 2.0.0.24 (X11/20100319) MIME-Version: 1.0 To: freebsd-fs@freebsd.org, freebsd-geom@freebsd.org References: 
<4BA0A660.3000902@freebsd.org> In-Reply-To: <4BA0A660.3000902@freebsd.org> X-Enigmail-Version: 0.95.7 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: Subject: Re: g_vfs_open and bread(devvp, ...) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 26 Mar 2010 13:17:44 -0000 Will an offer of a beer help reviewing this change? :-) on 17/03/2010 11:52 Andriy Gapon said the following: > I've given a fresh look to the issue of g_vfs_open and bread(devvp, ...) in > filesystem code. This time I hope to present my reasoning more clearly than I did > in my previous attempts. > For this reason I am omitting historical references and dramatic > examples/demonstrations but they are still available upon request (and in archives). > I hope that my shortened notation in references to the code and data structures > will not be confusing. > > bread() and the API family it belongs to is an interface to buffer cache system. > Buffer cache system can be roughly divided into two parts: cache part and I/O path > part. > I think that it is understood that both parts must have the same notion of a block > size when translating a block number to a byte offset. (If this point is not > clear, I can follow up with another essay). > > In the case of cache code the translation is explicit and simple: > A) offset = blkno * bo_bsize > > In the case of I/O code the translation is not that straightforward, because it > can be altered/overridden by bop_strategy which can in turn hook vop_strategy, etc. > > Let's consider a simple case of a filesystem that: > a) connects to a geom provider via g_vfs_open > b) doesn't modify anything in devvp->v_bufobj (in particular bo_ops and bo_bsize) > c) uses bread(devvp) (e.g. 
to access an equivalent of superblock, etc) > > Short overview of geom_vfs glue: > 1) g_vfs_open sets devvp->v_bufobj.bo_ops to g_vfs_bufops, where bop_strategy is > g_vfs_strategy > 2) bo_bsize is set to pp->sectorsize > 3) g_vfs_strategy doesn't perform any block-to-offset translation of its own, it > expects b_iooffset to be correctly set and passes its value to bio_offset > > When a filesystem issues bread(devvp) the following happens in the I/O path: > I) bread() calls breadn() > II) in breadn(): bp->b_iooffset = dbtob(bp->b_blkno), that is b_iooffset is set to > blkno * DEV_BSIZE (where DEV_BSIZE is 512) > III) breadn() then calls bstrategy() which is a simple wrapper around BO_STRATEGY > IV) g_vfs_strategy gets called and, as described in (3) above, it simply passes on > b_iooffset value to bio_offset > V) thus, a block size used for I/O operation is 512 (DEV_BSIZE) > VI) on the other hand, as stated in (A) above, block size used in caching code is > bo_bsize > > Thus, if bo_bsize != DEV_BSIZE, or alternatively said, pp->sectorsize != 512, we > have a trouble of data getting cached with incorrect offsets. > > Additionally, from (V) above we must conclude that a filesystem must specify block > numbers in DEV_BSIZE units to bread(devvp, blkno, ...) if the conditions (a), (b), > (c) are met. In fact, all such filesystems already do that, because otherwise > they would read incorrect data from the media. > > So, the problem is only with the caching of the data. > As such, this issue has little practical effects, because only a small number of > reads is done via devvp and only for sufficiently small chunks of data (I hope). > fs/udf used to be greatly affected by this issue when it was reading directory > nodes via devvp, but that was in the past (prior to 189082). > > Still I think that we should fix this issue for general code correctness/quality > reasons. And also to avoid possible future bugs. 
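
The mismatch between (V) and (VI) can be checked with plain arithmetic. What follows is only a minimal user-space sketch, not kernel code: the helper names cache_offset(), io_offset() and offsets_agree() are hypothetical and exist nowhere in the FreeBSD tree, and DEV_BSIZE is simply redefined here with the value 512 quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Redefined locally for illustration; same value (512) as quoted above. */
#define DEV_BSIZE 512

/* Hypothetical helper: offset the cache side uses, as in (A):
 * offset = blkno * bo_bsize. */
static int64_t
cache_offset(int64_t blkno, int64_t bo_bsize)
{
	assert(bo_bsize > 0);
	return (blkno * bo_bsize);
}

/* Hypothetical helper: offset the I/O path uses, as in (II):
 * b_iooffset = dbtob(blkno), i.e. blkno * DEV_BSIZE. */
static int64_t
io_offset(int64_t blkno)
{
	return (blkno * DEV_BSIZE);
}

/* Nonzero when both halves agree on where block blkno lives. */
static int
offsets_agree(int64_t blkno, int64_t bo_bsize)
{
	return (cache_offset(blkno, bo_bsize) == io_offset(blkno));
}
```

Under these assumptions, with a 2048-byte sector size and bo_bsize set to pp->sectorsize, block 4 would be cached at offset 8192 but read from offset 2048, so offsets_agree(4, 2048) is false. With bo_bsize set to DEV_BSIZE both sides compute 2048 and the two views line up, which is the effect the proposed one-line change is after.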
>
> As demonstrated by (V) and (VI) above, obvious and easiest fix is to (always) set
> bo_bsize to DEV_BSIZE in g_vfs_open():
> --- a/sys/geom/geom_vfs.c
> +++ b/sys/geom/geom_vfs.c
> @@ -179,7 +179,7 @@ g_vfs_open(struct vnode *vp, struct g_consumer **cpp, const char *fsname, int wr
>  	bo = &vp->v_bufobj;
>  	bo->bo_ops = g_vfs_bufops;
>  	bo->bo_private = cp;
> -	bo->bo_bsize = pp->sectorsize;
> +	bo->bo_bsize = DEV_BSIZE;
>  	gp->softc = bo;
>
>  	return (error);
>
> I tested this change with ufs, udf and cd9660 and haven't observed any regressions.
>
> P.S.
> There might be something to changing bread(devvp) convention, so that blkno is
> expected in sectorsize units. But setting bo_bsize to sectorsize is only a tiny
> portion of what needs to be changed to make it actually work.
>

--
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG Fri Mar 26 15:28:10 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D7BF3106564A for ; Fri, 26 Mar 2010 15:28:10 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (chez.mckusick.com [64.81.247.49]) by mx1.freebsd.org (Postfix) with ESMTP id A329B8FC12 for ; Fri, 26 Mar 2010 15:28:10 +0000 (UTC) Received: from chez.mckusick.com (localhost [127.0.0.1]) by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id o2QFSAuI037251; Fri, 26 Mar 2010 08:28:10 -0700 (PDT) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201003261528.o2QFSAuI037251@chez.mckusick.com> To: Andriy Gapon In-reply-to: <4BACB3F5.7010905@freebsd.org> Date: Fri, 26 Mar 2010 08:28:10 -0700 From: Kirk McKusick Cc: freebsd-fs@freebsd.org, freebsd-geom@freebsd.org Subject: Re: g_vfs_open and bread(devvp, ...)
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 26 Mar 2010 15:28:10 -0000 I have reviewed your change and I believe that your analysis is correct. I am in agreement with your making the change. As disk sector sizes will be growing in the near future, it would be desirable to get away from having DEV_BSIZE hard-coded. But as you note, that is a far bigger change than this one. Kirk McKusick From owner-freebsd-fs@FreeBSD.ORG Fri Mar 26 15:40:33 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CDECB1065672 for ; Fri, 26 Mar 2010 15:40:33 +0000 (UTC) (envelope-from ndenev@gmail.com) Received: from mail-fx0-f225.google.com (mail-fx0-f225.google.com [209.85.220.225]) by mx1.freebsd.org (Postfix) with ESMTP id 54AA68FC18 for ; Fri, 26 Mar 2010 15:40:32 +0000 (UTC) Received: by fxm25 with SMTP id 25so30699fxm.3 for ; Fri, 26 Mar 2010 08:40:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:subject:mime-version :content-type:from:in-reply-to:date:cc:content-transfer-encoding :message-id:references:to:x-mailer; bh=6glqHrLyoYUQccVQ2hv3SpzIw1LAIKmJGR8+KQvF44E=; b=YTEfgBzMmU/fv9nptF+XecGtUs4j+irwZydbQGvuLsmd4iz1iI3bi0O0QRYpqDq3t6 zoBguYj5EyrZuMRjkPe0FIVoTX9NpB+LdYB1By4EXP6dW9IIzrSA9QWMfQb4q1P3xp5M dCidgLpnH0wV+vDl89k8MyOPUWjBohbTbI6ME= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer; b=bAfyDyEmxN3D27VwPy+P2G/zrY9CpLyTddsy9lzNubIBMB/K9p6DnLnR515sxwNWrL /dpJ9LFrYSgnTEzqt7hVp+V2NLvuF9vYhyfpMJrvra91ono3E3X6pNt8a9O942fjCLsf 42jxtHED8VHL7bzZLaN6+T0rd1O9WmmE1IRgk= Received: by 10.103.3.39 with SMTP 
id f39mr585895mui.83.1269618031684; Fri, 26 Mar 2010 08:40:31 -0700 (PDT) Received: from [10.32.23.105] ([195.34.111.178]) by mx.google.com with ESMTPS id j10sm4374503mue.18.2010.03.26.08.40.29 (version=TLSv1/SSLv3 cipher=RC4-MD5); Fri, 26 Mar 2010 08:40:30 -0700 (PDT) Mime-Version: 1.0 (Apple Message framework v1077) Content-Type: text/plain; charset=us-ascii From: Nikolay Denev In-Reply-To: <20100324175546.GF12330@dan.emsphone.com> Date: Fri, 26 Mar 2010 17:40:28 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: <18370A36-8C02-4C7E-ADB5-E7E093E533B9@gmail.com> References: <20100324175546.GF12330@dan.emsphone.com> To: Dan Nelson X-Mailer: Apple Mail (2.1077) Cc: freebsd-fs@freebsd.org, Dan Naumov Subject: Re: tuning vfs.zfs.vdev.max_pending and solving the issue of ZFS writes choking read IO X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 26 Mar 2010 15:40:33 -0000

On Mar 24, 2010, at 7:55 PM, Dan Nelson wrote:

> In the last episode (Mar 24), Bob Friesenhahn said:
>> On Wed, 24 Mar 2010, Dan Naumov wrote:
>>> Has anyone done any extensive testing of the effects of tuning
>>> vfs.zfs.vdev.max_pending on this issue? Is there some universally
>>> recommended value beyond the default 35? Anything else I should be
>>> looking at?
>>
>> The vdev.max_pending value is primarily used to tune for SAN/HW-RAID LUNs
>> and is used to dial down LUN service time (svc_t) values by limiting the
>> number of pending requests. It is not terribly useful for decreasing
>> stalls due to zfs writes. In order to reduce the impact of zfs writes,
>> you want to limit the maximum size of a zfs transaction group (TXG). I
>> don't know what the FreeBSD tunable is for this, but under Solaris it is
>> zfs:zfs_write_limit_override.
>
> There isn't a sysctl for it by default, but the following patch will enable
> a vfs.zfs.write_limit_override sysctl:
>
> Index: dsl_pool.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_pool.c,v
> retrieving revision 1.4.2.1
> diff -u -p -r1.4.2.1 dsl_pool.c
> --- dsl_pool.c 17 Aug 2009 09:55:58 -0000 1.4.2.1
> +++ dsl_pool.c 11 Mar 2010 08:34:27 -0000
> @@ -47,6 +47,11 @@ uint64_t zfs_write_limit_inflated = 0;
>  uint64_t zfs_write_limit_override = 0;
>  extern uint64_t zfs_write_limit_min;
>
> +SYSCTL_DECL(_vfs_zfs);
> +SYSCTL_QUAD(_vfs_zfs, OID_AUTO, write_limit_override, CTLFLAG_RW,
> +    &zfs_write_limit_override, 0,
> +    "Force a txg if dirty buffers exceed this value (bytes)");
> +
>  kmutex_t zfs_write_limit_lock;
>
>  static pgcnt_t old_physmem = 0;
>
>
>> On a large-memory system, a properly working zfs should not saturate
>> the write channel for more than 5 seconds. Zfs tries to learn the
>> write bandwidth so that it can tune the TXG size up to 5 seconds (max)
>> worth of writes. If you have both large memory and fast storage,
>> quite a huge amount of data can be written in 5 seconds. On my
>> Solaris system, I found that zfs was quite accurate with its rate
>> estimation, but it resulted in four gigabytes of data being written
>> per TXG.
>
> I had similar problems on a 32GB Solaris server at work. Note that with
> compression enabled, the entire system pauses while it compresses the
> outgoing block of data. It's just a fraction of a second, but long enough
> for end-users to complain about bad performance in X sessions. I had to
> throttle back to a 256MB write limit size to make the stuttering go away
> completely.
> It didn't affect write throughput much at all.
>
> --
> Dan Nelson
> dnelson@allantgroup.com

I had to come up with more or less the same patch, and it fixed my problem with writes stalling the IO of the machine.
This should probably be committed.

Regards,
Niki Denev

From owner-freebsd-fs@FreeBSD.ORG Fri Mar 26 16:15:42 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C25FD1065676 for ; Fri, 26 Mar 2010 16:15:42 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta12.westchester.pa.mail.comcast.net (qmta12.westchester.pa.mail.comcast.net [76.96.59.227]) by mx1.freebsd.org (Postfix) with ESMTP id 708828FC1F for ; Fri, 26 Mar 2010 16:15:42 +0000 (UTC) Received: from omta23.westchester.pa.mail.comcast.net ([76.96.62.74]) by qmta12.westchester.pa.mail.comcast.net with comcast id xnLl1d0011c6gX85CsFive; Fri, 26 Mar 2010 16:15:42 +0000 Received: from koitsu.dyndns.org ([98.248.46.159]) by omta23.westchester.pa.mail.comcast.net with comcast id xsK31d0033S48mS3jsK3N3; Fri, 26 Mar 2010 16:19:04 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id F35ED9B436; Fri, 26 Mar 2010 09:15:39 -0700 (PDT) Date: Fri, 26 Mar 2010 09:15:39 -0700 From: Jeremy Chadwick To: Kirk McKusick Message-ID: <20100326161539.GA10618@icarus.home.lan> References: <4BACB3F5.7010905@freebsd.org> <201003261528.o2QFSAuI037251@chez.mckusick.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201003261528.o2QFSAuI037251@chez.mckusick.com> User-Agent: Mutt/1.5.20 (2009-06-14) Cc: freebsd-fs@freebsd.org, Andriy Gapon , freebsd-geom@freebsd.org Subject: Re: g_vfs_open and bread(devvp, ...)
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 26 Mar 2010 16:15:42 -0000

On Fri, Mar 26, 2010 at 08:28:10AM -0700, Kirk McKusick wrote:
> I have reviewed your change and I believe that your analysis is
> correct. I am in agreement with your making the change.
>
> As disk sector sizes will be growing in the near future, it would
> be desirable to get away from having DEV_BSIZE hard-coded. But as
> you note, that is a far bigger change than this one.

I should note that they already have grown: Western Digital, as of a few
months ago, began shipping drives that use 4KByte sectors. They're
known as the "EARS" drives, due to their model string ending with
"EARS":

WD20EARS: http://www.wdc.com/en/products/Products.asp?DriveID=773
WD15EARS: http://www.wdc.com/en/products/products.asp?driveid=772
WD10EARS: http://www.wdc.com/en/products/products.asp?driveid=763

(I should warn folks that these are Caviar Green drives, which may suffer
from excessive Load Cycles (parking/unparking of the actuator arm). I don't
have one of these drives, so I can't validate whether the issue happens on
this model.)

A discussion and an incredibly cheesy video review are linked
below. The video review does discuss the 4KB sector size, in addition
to jumpers that revert the drive to using 512-byte sectors for older
OSes such as Windows XP -- and presumably FreeBSD.

http://www.tomshardware.com/reviews/wd-4k-sector,2554.html
http://www.youtube.com/watch?v=QeFj2QTaA3Y

--
| Jeremy Chadwick jdc@parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977.
PGP: 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Fri Mar 26 18:21:22 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7D2C5106564A; Fri, 26 Mar 2010 18:21:22 +0000 (UTC) (envelope-from dimitry@andric.com) Received: from tensor.andric.com (cl-327.ede-01.nl.sixxs.net [IPv6:2001:7b8:2ff:146::2]) by mx1.freebsd.org (Postfix) with ESMTP id 0BCB48FC17; Fri, 26 Mar 2010 18:21:22 +0000 (UTC) Received: from [IPv6:2001:7b8:3a7:0:8009:ab55:dc54:fb7f] (unknown [IPv6:2001:7b8:3a7:0:8009:ab55:dc54:fb7f]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by tensor.andric.com (Postfix) with ESMTPSA id 50D8C5C59; Fri, 26 Mar 2010 19:21:20 +0100 (CET) Message-ID: <4BACFB20.6070601@andric.com> Date: Fri, 26 Mar 2010 19:21:20 +0100 From: Dimitry Andric User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.9.2.2pre) Gecko/20100311 Lanikai/3.1b2pre MIME-Version: 1.0 To: Jeremy Chadwick References: <4BACB3F5.7010905@freebsd.org> <201003261528.o2QFSAuI037251@chez.mckusick.com> <20100326161539.GA10618@icarus.home.lan> In-Reply-To: <20100326161539.GA10618@icarus.home.lan> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Kirk McKusick , freebsd-fs@freebsd.org, Andriy Gapon , freebsd-geom@freebsd.org Subject: Re: g_vfs_open and bread(devvp, ...) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 26 Mar 2010 18:21:22 -0000 On 2010-03-26 17:15, Jeremy Chadwick wrote: > I should note that they already have grown: Western Digital, as of a few > months ago, began shipping drives that use 4KByte sectors. ... > A discussion and an including an incredibly cheesy video review are > below. 
The video review does discuss the 4KB sector size, in addition > to jumpers that revert the drive to using 512-byte sectors for older > OSes such as Windows XP -- and presumably FreeBSD. Please note these drives *always* expose 512-byte sectors to any OS, at least for now. The jumper you refer to is only a hack to force sector 63 (the usual starting position for the first partition) to be aligned on a 4096-byte boundary. If you would remove it after partitioning, all sectors would shift up one sector, and there would be trouble. :) From owner-freebsd-fs@FreeBSD.ORG Fri Mar 26 18:45:59 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AAB161065673; Fri, 26 Mar 2010 18:45:59 +0000 (UTC) (envelope-from swhetzel@gmail.com) Received: from mail-iw0-f183.google.com (mail-iw0-f183.google.com [209.85.223.183]) by mx1.freebsd.org (Postfix) with ESMTP id 618408FC14; Fri, 26 Mar 2010 18:45:59 +0000 (UTC) Received: by iwn13 with SMTP id 13so6917015iwn.14 for ; Fri, 26 Mar 2010 11:45:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :date:received:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=npSCHa0URS1HJZGkQAB3UtgbNSH5NfpEWdqor/Mw9V0=; b=lJHJVw0G94g8VS4szM8XttDGm/C7ErEV54JfYEtmBbOjCRjcooRpiVEHO1IScjb4EM 2jTU9K7ttEStTprPs99ce1dQ9V9lrDd3sZH7ABT++B7x2o8mdUBx0WKdoTLquJ0NiVT8 GS+CVy5mL8QuchIEC5YJ8/0EcueTG0oCGGXqw= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=o1oWkEfLBQILx6PIOK+JQoCRzJcR2izxE4PWC+VavlS0j4SHjM1z34vJAzSTnn2Djz kL4AzU3yDkZnVrapOHGceeKoYaXsQsHN+fBBuoUGqQGiRrghmoGZElH7RLW+c/2aVD2t y7whKDn71eGWy32PuLy68MudOMWJhUU5fnXu8= MIME-Version: 1.0 Received: by 10.231.17.199 with HTTP; Fri, 26 Mar 
2010 11:21:41 -0700 (PDT) In-Reply-To: <20100326161539.GA10618@icarus.home.lan> References: <4BACB3F5.7010905@freebsd.org> <201003261528.o2QFSAuI037251@chez.mckusick.com> <20100326161539.GA10618@icarus.home.lan> Date: Fri, 26 Mar 2010 13:21:41 -0500 Received: by 10.231.153.205 with SMTP id l13mr576702ibw.64.1269627701340; Fri, 26 Mar 2010 11:21:41 -0700 (PDT) Message-ID: <790a9fff1003261121p5d72e74bw61d0a66a7d418aae@mail.gmail.com> From: Scot Hetzel To: Jeremy Chadwick Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: Kirk McKusick , freebsd-fs@freebsd.org, Andriy Gapon , freebsd-geom@freebsd.org Subject: Re: g_vfs_open and bread(devvp, ...) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 26 Mar 2010 18:45:59 -0000

On Fri, Mar 26, 2010 at 11:15 AM, Jeremy Chadwick wrote:
> On Fri, Mar 26, 2010 at 08:28:10AM -0700, Kirk McKusick wrote:
>> I have reviewed your change and I believe that your analysis is
>> correct. I am in agreement with your making the change.
>>
>> As disk sector sizes will be growing in the near future, it would
>> be desirable to get away from having DEV_BSIZE hard-coded. But as
>> you note, that is a far bigger change than this one.
>
> I should note that they already have grown: Western Digital, as of a few
> months ago, began shipping drives that use 4KByte sectors. They're
> known as the "EARS" drives, due to their model string ending with
> "EARS":
>
> WD20EARS: http://www.wdc.com/en/products/Products.asp?DriveID=773
> WD15EARS: http://www.wdc.com/en/products/products.asp?driveid=772
> WD10EARS: http://www.wdc.com/en/products/products.asp?driveid=763
>
> (I should warn folks these are Caviar Green drives, which may suffer
> from excessive Load Cycles (parking/unparking actuator arm).
> I don't have one of these drives so I can't validate if the issue happens
> on this model or not)
>
> A discussion and an including an incredibly cheesy video review are
> below. The video review does discuss the 4KB sector size, in addition
> to jumpers that revert the drive to using 512-byte sectors for older
> OSes such as Windows XP -- and presumably FreeBSD.
>
> http://www.tomshardware.com/reviews/wd-4k-sector,2554.html
> http://www.youtube.com/watch?v=QeFj2QTaA3Y
>

After reviewing these links, my understanding of these drives is that
they still present 512-byte sectors to the OS, but when they write to
the platter, they pack eight 512-byte sectors into one 4K sector.

When the drive needs to modify a single 512-byte sector, it has to read
the entire 4K sector before writing the change back. If a partition is
not aligned on a 4K boundary, this can lead to excessive
read-modify-write cycles, which will reduce the performance of these
drives. Each partition must be aligned to start and end on a 4K sector.

The problem with Windows XP is that it always creates the first
partition starting at sector 63, which is not on a 4K boundary. When
the jumpers are set, the drive adds 1 to every 512-byte sector request.
For example, when the OS asks for sector 63, the drive returns the
contents of sector 64, thus forcing the partition to start on a 4K
sector.

The other option is to use the WD align program on Windows XP. This
software realigns the partitions and data so that they are aligned to
the 4K sectors.

In order for FreeBSD to use these drives, you just need to ensure that
all slices/partitions start on a 4K boundary.
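
The alignment arithmetic above can be sketched in a few lines of Python. This is only an illustration, not a tool mentioned in this thread: it assumes 512-byte logical sectors and 4096-byte physical sectors as described above, and the function names (is_aligned, next_aligned_lba) are made up for the example.

```python
LOGICAL_SECTOR = 512    # bytes per sector as presented to the OS
PHYSICAL_SECTOR = 4096  # bytes per on-disk sector (assumed, per the discussion above)


def is_aligned(start_lba, logical=LOGICAL_SECTOR, physical=PHYSICAL_SECTOR):
    """Return True if start_lba begins exactly on a physical sector boundary."""
    return (start_lba * logical) % physical == 0


def next_aligned_lba(start_lba, logical=LOGICAL_SECTOR, physical=PHYSICAL_SECTOR):
    """Round start_lba up to the nearest LBA that begins a physical sector."""
    ratio = physical // logical            # 8 logical sectors per physical sector
    return -(-start_lba // ratio) * ratio  # ceiling division, then scale back up


if __name__ == "__main__":
    # 63 is the traditional first-partition start; 64 and 2048 are multiples of 8.
    for lba in (63, 64, 2048):
        print(lba, is_aligned(lba), next_aligned_lba(lba))
```

With these assumptions, a start at sector 63 is reported as unaligned and rounds up to 64, which is the same one-sector shift that the jumper applies inside the drive.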
Scot From owner-freebsd-fs@FreeBSD.ORG Fri Mar 26 19:10:20 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A7923106566C; Fri, 26 Mar 2010 19:10:20 +0000 (UTC) (envelope-from avg@freebsd.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 997E08FC13; Fri, 26 Mar 2010 19:10:19 +0000 (UTC) Received: from porto.topspin.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id VAA09149; Fri, 26 Mar 2010 21:10:17 +0200 (EET) (envelope-from avg@freebsd.org) Received: from localhost.topspin.kiev.ua ([127.0.0.1]) by porto.topspin.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1NvEvB-0004Mn-BA; Fri, 26 Mar 2010 21:10:17 +0200 Message-ID: <4BAD0697.3020500@freebsd.org> Date: Fri, 26 Mar 2010 21:10:15 +0200 From: Andriy Gapon User-Agent: Thunderbird 2.0.0.24 (X11/20100321) MIME-Version: 1.0 To: freebsd-fs@freebsd.org, freebsd-geom@freebsd.org References: <4BACB3F5.7010905@freebsd.org> <201003261528.o2QFSAuI037251@chez.mckusick.com> <20100326161539.GA10618@icarus.home.lan> <790a9fff1003261121p5d72e74bw61d0a66a7d418aae@mail.gmail.com> In-Reply-To: <790a9fff1003261121p5d72e74bw61d0a66a7d418aae@mail.gmail.com> X-Enigmail-Version: 0.96.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Subject: Re: g_vfs_open and bread(devvp, ...) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 26 Mar 2010 19:10:20 -0000 To no one in particular: guys, I think there are better threads to have a discussion on 4K HDD sectors. This one was started on something only tangentially related. 
--
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG Fri Mar 26 19:15:00 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AE93A106564A for ; Fri, 26 Mar 2010 19:15:00 +0000 (UTC) (envelope-from Axel.Rau@Chaos1.DE) Received: from mail2.chaos1.de (Mail2.Chaos1.DE [213.160.12.60]) by mx1.freebsd.org (Postfix) with ESMTP id 6FA748FC0C for ; Fri, 26 Mar 2010 19:15:00 +0000 (UTC) Received: from mail3.chaos1.de ([213.160.12.56]) by mail2.chaos1.de with esmtp (Exim 4.44) id KZWKM2-000JL0-6K for freebsd-fs@freebsd.org; Fri, 26 Mar 2010 19:56:26 +0100 Received: from axel.rau@chaos1.de by mail3.chaos1.de (Archiveopteryx 3.1.3) with esmtpsa id 1269629784-93469-93468/5/60; Fri, 26 Mar 2010 18:56:24 +0000 Message-Id: From: Axel Rau To: freebsd-fs@freebsd.org Content-Type: text/plain; format=flowed; delsp=yes Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Apple Message framework v936) Date: Fri, 26 Mar 2010 19:56:23 +0100 X-Mailer: Apple Mail (2.936) Subject: question about absolute sector address X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 26 Mar 2010 19:15:00 -0000

Hi,

what is the absolute address of the 1st sector of, say, da0s0a?
In fdisk, 63 are used and bsdlabel shows 16 in front of the 1st slice.
Is the answer 63+16=79?

Background: I want to adjust my slices in multiples of SSD cells.
Thanks, Axel
---
axel.rau@chaos1.de PGP-Key:29E99DD6 +49 151 2300 9283 computing @ chaos claudius

From owner-freebsd-fs@FreeBSD.ORG Sat Mar 27 22:40:01 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A64841065675 for ; Sat, 27 Mar 2010 22:40:01 +0000 (UTC) (envelope-from ari@ish.com.au) Received: from fish.ish.com.au (eth5921.nsw.adsl.internode.on.net [59.167.240.32]) by mx1.freebsd.org (Postfix) with ESMTP id 358648FC20 for ; Sat, 27 Mar 2010 22:40:00 +0000 (UTC) Received: from [10.29.62.2] (port=52921 helo=Aris-MacBook-Pro.local) by fish.ish.com.au with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.69) (envelope-from ) id 1NveUo-0004eS-0t for freebsd-fs@freebsd.org; Sun, 28 Mar 2010 09:28:46 +1100 Message-ID: <4BAE869C.6070601@ish.com.au> Date: Sun, 28 Mar 2010 09:28:44 +1100 From: Aristedes Maniatis User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.8) Gecko/20100227 Lightning/1.0b1 Thunderbird/3.0.3 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: boot code compatibility with ZFS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 27 Mar 2010 22:40:01 -0000

FreeBSD has recently gained the ability to boot directly from ZFS, as detailed here:

http://wiki.freebsd.org/RootOnZFS

Naturally, having functional boot code has been key to this, but it is very confusing as to why sometimes I can get FreeBSD to boot nicely and sometimes not.
From comments in threads and my experience, I've got the following rough timeline:

FreeBSD 7.1: ZFS version 6, no ZFS boot capability
FreeBSD 7.2: ZFS version 6, zfsboot only, boot from single vdev pool only
FreeBSD 7.3: ZFS version 13, zfsboot broken, no ZFS boot capability
FreeBSD 8.0: ZFS version 13, gptzfsboot and zfsboot, boot from mirrored pool or single vdev, RAIDZ not supported, degraded pool not supported

My questions:

1. Is the above list correct?

2. What is the situation for the two stable branches?

3. In an effort to get a machine booting properly I tried installing the boot code from 8-STABLE with an 8.0 kernel. That didn't work. What is the situation with backward and forward compatibility of the boot code? Will there come a time when an OS update will require an update of the boot code?

If someone were able to put these answers on the wiki page as well, I believe this would help others.

Thanks
Ari Maniatis

--
-------------------------->
Aristedes Maniatis
ish
http://www.ish.com.au
Level 1, 30 Wilson Street Newtown 2042 Australia
phone +61 2 9550 5001 fax +61 2 9550 4001
GPG fingerprint CBFB 84B4 738D 4E87 5E5C 5EFA EF6A 7D2E 3E49 102A

From owner-freebsd-fs@FreeBSD.ORG Sat Mar 27 23:42:27 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E6F6D1065670 for ; Sat, 27 Mar 2010 23:42:26 +0000 (UTC) (envelope-from mashtizadeh@gmail.com) Received: from mail-yx0-f185.google.com (mail-yx0-f185.google.com [209.85.210.185]) by mx1.freebsd.org (Postfix) with ESMTP id 98C258FC13 for ; Sat, 27 Mar 2010 23:42:26 +0000 (UTC) Received: by yxe15 with SMTP id 15so5661681yxe.7 for ; Sat, 27 Mar 2010 16:42:25 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :date:received:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=+E/WWcf0BYGoZW+5sMwW3UdJhioyjqQSYt1U5I9Hl/0=; b=PGFU0loBTrDwAXSkqOmzi7nGPhndlv0eJLf7d/3fO4xem6c9rew09DXXvyZ3sXpzhx wpUIuVRNDiB3+3Oy4la/y4DToW2GI+uOKouDz5FvfrzMFchc3mFoxO9flwjs2sAcfxPP m/nsgTv+lW7Xg84KMdbFfFD5e3RTPlMXh/L04= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=VcHy0JBachy8QmZt9YR00nKeCo2jXxWP2QOPDprHD8NzZNMNO/XhU+MC3WAF6EwJ2d o4tAdRl1fuhBiEbxWFeojcMKoquXFn78+b0EF+eZG3oTslmUfWbqYs5RXf4XnZPQlPP9 GQBWb1PrIw02NFtYGtXUdSjZD1tZ53flCt1f4= MIME-Version: 1.0 Received: by 10.231.171.196 with HTTP; Sat, 27 Mar 2010 16:15:07 -0700 (PDT) In-Reply-To: <4BAE869C.6070601@ish.com.au> References: <4BAE869C.6070601@ish.com.au> Date: Sat, 27 Mar 2010 16:15:07 -0700 Received: by 10.100.235.11 with SMTP id i11mr1113221anh.128.1269731707133; Sat, 27 Mar 2010 16:15:07 -0700 (PDT) Message-ID: <440b3e931003271615w1ad38306i55d01b059a306af0@mail.gmail.com> From: Ali Mashtizadeh To: Aristedes Maniatis Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org Subject: Re: boot code compatibility with ZFS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 27 Mar 2010 23:42:27 -0000

HEAD and 8-STABLE have support for booting off of raidz volumes. In 8-STABLE, this was merged in 204251.

On Sat, Mar 27, 2010 at 3:28 PM, Aristedes Maniatis wrote:
> FreeBSD has recently been able to boot directly from ZFS as detailed here:
>
> http://wiki.freebsd.org/RootOnZFS
>
> Naturally having functional boot code has been key to this, but it is very
> confusing as to why sometimes I can get FreeBSD to boot nicely and sometimes
> not. From comments in threads and my experience, I've got the following
> rough timeline:
>
> FreeBSD 7.1: ZFS version 6, no ZFS boot capability
> FreeBSD 7.2: ZFS version 6, zfsboot only, boot from single vdev pool only
> FreeBSD 7.3: ZFS version 13, zfsboot broken, no ZFS boot capability
>
> FreeBSD 8.0: ZFS version 13, gptzfsboot and zfsboot, boot from mirrored pool
> or single vdev, RAIDZ not supported, degraded pool not supported
>
>
> My questions:
>
> 1. Is the above list correct?
>
> 2. What is the situation for the two stable branches?
>
> 3. In an effort to get a machine booting properly I tried installing the
> boot code from 8-STABLE with an 8.0 kernel. That didn't work. What is the
> situation with backward and forward compatibility of the boot code? Will
> there come a time when an OS update will require an update of the boot code?
>
>
> If someone were able to put these answers on the wiki page as well, I
> believe this would help others.
>
>
> Thanks
> Ari Maniatis
>
> --
> -------------------------->
> Aristedes Maniatis
> ish
> http://www.ish.com.au
> Level 1, 30 Wilson Street Newtown 2042 Australia
> phone +61 2 9550 5001 fax +61 2 9550 4001
> GPG fingerprint CBFB 84B4 738D 4E87 5E5C 5EFA EF6A 7D2E 3E49 102A
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>

--
Ali Mashtizadeh
علی مشتی زاده