From owner-freebsd-fs@FreeBSD.ORG  Sun Sep 16 09:33:19 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 31EBF106566C
	for <freebsd-fs@freebsd.org>; Sun, 16 Sep 2012 09:33:19 +0000 (UTC)
	(envelope-from c.kworr@gmail.com)
Received: from mail-bk0-f54.google.com (mail-bk0-f54.google.com
	[209.85.214.54])
	by mx1.freebsd.org (Postfix) with ESMTP id A4A238FC14
	for <freebsd-fs@freebsd.org>; Sun, 16 Sep 2012 09:33:18 +0000 (UTC)
Received: by bkcje9 with SMTP id je9so1837378bkc.13
	for <freebsd-fs@freebsd.org>; Sun, 16 Sep 2012 02:33:17 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=message-id:date:from:user-agent:mime-version:to:cc:subject
	:references:in-reply-to:content-type:content-transfer-encoding;
	bh=f2QpA+Z4v22Sdk0RmRT1MR2+WlYRw05fnzr4d7NPqFY=;
	b=Zc01qlzM7Wajge04pP3tks9nI9cNvmVGXAboihMvV4YHNKztXbcF0TItGwJmk3Gdys
	68Vh0jUovmwi5NnDX8qZ5LqEOWtlKaOWdVkdv0+Cx4MuIiMjDyuhalmMMJA2yEKqgsK6
	dhmT1YfKkRNJB8Uf2t3l4b9dERXcCHQkSaAq6DpA5jae7Mppl0IRBD0hSyGJGHhFhHp9
	ZjEBiUwLGCM7I6V+tV3GITMfDBxu+VwHD4xoGkKmTf9BD5mFp225Xl4F5zbMmVV6qert
	tWhNB5BjhxGrzF/w64JmnXY/BFt8GYtpANgLLaBwVs6b04gP6QeIhi5inXL0HRLzobWH
	zSyA==
Received: by 10.204.130.209 with SMTP id u17mr3312910bks.35.1347787997703;
	Sun, 16 Sep 2012 02:33:17 -0700 (PDT)
Received: from limbo.xim.bz ([46.150.100.6])
	by mx.google.com with ESMTPS id 25sm3354060bkx.9.2012.09.16.02.33.14
	(version=SSLv3 cipher=OTHER); Sun, 16 Sep 2012 02:33:16 -0700 (PDT)
Message-ID: <50559CD8.1070700@gmail.com>
Date: Sun, 16 Sep 2012 12:33:12 +0300
From: Volodymyr Kostyrko <c.kworr@gmail.com>
User-Agent: Mozilla/5.0 (X11; FreeBSD i386;
	rv:15.0) Gecko/20120911 Thunderbird/15.0.1
MIME-Version: 1.0
To: =?UTF-8?B?IlRob21hcyBHw7ZsbG5lciAoTmV3c2xldHRlciki?= <Newsletter@goelli.de>
References: <001a01cd900d$bcfcc870$36f65950$@goelli.de>
	<504F282D.8030808@gmail.com>
	<000a01cd90aa$0a277310$1e765930$@goelli.de>
	<5050461A.9050608@gmail.com>
	<000001cd9239$ed734c80$c859e580$@goelli.de>
	<5052EC5D.4060403@gmail.com>
	<000a01cd9274$0aa0bba0$1fe232e0$@goelli.de>
	<505322C9.70200@gmail.com>
	<000001cd9377$e9e9b010$bdbd1030$@goelli.de>
In-Reply-To: <000001cd9377$e9e9b010$bdbd1030$@goelli.de>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Cc: freebsd-fs@freebsd.org
Subject: Re: AW: AW: AW: AW: ZFS: Corrupted pool metadata after adding vdev
 to a pool - no opportunity to rescue data from healthy vdevs?
 Remove a vdev? Rewrite metadata?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 16 Sep 2012 09:33:19 -0000

15.09.2012 22:25, Thomas Göllner (Newsletter) wrote:
>>>>> I also think there is no way to write new labels to the disks or edit the existing ones?
>>>>
>>>> This idea is called Block Pointer Rewrite and is not implemented yet. I have found no code to do that.
>>>
>>> I thought it might come to this -.- During my last reading I learned that I have to find the "root block pointer" to recover the possibly overwritten labels... As it changes place and content with each copy-on-write operation (each txg?), it will be a search for a needle in a haystack...
>>
>> Not at all. What you are referring to is the MOS, and a pointer to it is contained in each uberblock.
>
> So, as this is so far beyond my skills, I am sad to say that I have to give up here. Without someone to take me by the hand and tell me what to do step by step, I think recovering/rewriting the right labels on my disks is something I will not be able to do within a year or so. It's a pity that ZFS still has no built-in tools for recovering metadata. This would be a task to consider in the future.
>
> Thanks again for your help Volodymyr. It is a bit of consolation that at least I know now that I have done everything I could.

If you can afford to set your drives aside, you can wait until some
tool eventually emerges. I can't promise anything, but I'm slowly
making progress with my script. I'm motivated because I have a broken
pool with photos myself. Trying to import that pool causes a core dump
on every system I tested, such as OpenSolaris, Illumos and
SystemRescueCD.

-- 
Sphinx of black quartz judge my vow.

From owner-freebsd-fs@FreeBSD.ORG  Sun Sep 16 21:41:33 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 845C6106566B
	for <freebsd-fs@freebsd.org>; Sun, 16 Sep 2012 21:41:33 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca
	[131.104.91.44])
	by mx1.freebsd.org (Postfix) with ESMTP id 435628FC16
	for <freebsd-fs@freebsd.org>; Sun, 16 Sep 2012 21:41:32 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: Ap4EAO5GVlCDaFvO/2dsb2JhbAA+BxaFcbcSgkqBCwINGQJfiBMLmSeOQ5F0gSGKIYU1gRIDlWKBFI8NgwKBPiIb
X-IronPort-AV: E=Sophos;i="4.80,432,1344225600"; d="scan'208";a="179287787"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 16 Sep 2012 17:41:25 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id EDA5579451
	for <freebsd-fs@freebsd.org>; Sun, 16 Sep 2012 17:41:25 -0400 (EDT)
Date: Sun, 16 Sep 2012 17:41:25 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: FS List <freebsd-fs@freebsd.org>
Message-ID: <1531430179.669311.1347831685957.JavaMail.root@erie.cs.uoguelph.ca>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.201]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692)
Subject: testing/review of atomic export update patch
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 16 Sep 2012 21:41:33 -0000

Hi,

There is a simple patch at:
  http://people.freebsd.org/~rmacklem/atomic-export.patch
that can be applied to a kernel + mountd, so that the new
nfsd can be suspended by mountd while the exports are being
reloaded. It adds a new "-S" flag to mountd to enable this.
(This avoids the long-standing bug where clients receive ESTALE
 replies to RPCs while mountd is reloading exports.)

I am emailing to request testing and/or review of this patch
by anyone who is interested. (One site has reported that the
patch worked well for them and another is testing it as I type
this.)

Thanks in advance for any comments, rick

From owner-freebsd-fs@FreeBSD.ORG  Mon Sep 17 07:17:16 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9E61D106566B
	for <freebsd-fs@freebsd.org>; Mon, 17 Sep 2012 07:17:16 +0000 (UTC)
	(envelope-from freebsd@pki2.com)
Received: from btw.pki2.com (btw.pki2.com [IPv6:2001:470:a:6fd::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 6827A8FC08
	for <freebsd-fs@freebsd.org>; Mon, 17 Sep 2012 07:17:16 +0000 (UTC)
Received: from [127.0.0.1] (localhost [127.0.0.1])
	by btw.pki2.com (8.14.5/8.14.5) with ESMTP id q8H7H6EU010887
	for <freebsd-fs@freebsd.org>; Mon, 17 Sep 2012 00:17:06 -0700 (PDT)
	(envelope-from freebsd@pki2.com)
From: Dennis Glatting <freebsd@pki2.com>
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset="ISO-8859-1"
Date: Mon, 17 Sep 2012 00:17:06 -0700
Message-ID: <1347866226.5619.1.camel@btw.pki2.com>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port 
Content-Transfer-Encoding: 7bit
X-yoursite-MailScanner-Information: Dennis Glatting
X-yoursite-MailScanner-ID: q8H7H6EU010887
X-yoursite-MailScanner: Found to be clean
X-MailScanner-From: freebsd@pki2.com
Subject: How to clear this ZFS error?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 17 Sep 2012 07:17:16 -0000

There was a system failure when I replaced the disk. How do I rid
myself of 15107145887069556078?

 
config:

	NAME                        STATE     READ WRITE CKSUM
	disk-2                      DEGRADED     0     0     0
	  raidz2-0                  DEGRADED     0     0     0
	    replacing-0             DEGRADED     0     0     0
	      da5                   ONLINE       0     0     0
	      15107145887069556078  OFFLINE      0     0     0  was /dev/da1.nop
	    da0                     ONLINE       0     0     0
	    da4                     ONLINE       0     0     0
	    da1                     ONLINE       0     0     0
	    da2                     ONLINE       0     0     0
	cache
	  ada3                      ONLINE       0     0     0

errors: No known data errors


From owner-freebsd-fs@FreeBSD.ORG  Mon Sep 17 07:21:22 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 87A5F1065670
	for <freebsd-fs@freebsd.org>; Mon, 17 Sep 2012 07:21:22 +0000 (UTC)
	(envelope-from edho@myconan.net)
Received: from mail-wg0-f42.google.com (mail-wg0-f42.google.com [74.125.82.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 101398FC0A
	for <freebsd-fs@freebsd.org>; Mon, 17 Sep 2012 07:21:21 +0000 (UTC)
Received: by wgbfm10 with SMTP id fm10so1948340wgb.1
	for <freebsd-fs@freebsd.org>; Mon, 17 Sep 2012 00:21:20 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=myconan.net; s=myconan;
	h=mime-version:in-reply-to:references:from:date:message-id:subject:to
	:cc:content-type;
	bh=j3aqvucq86q1edr7tiBf3X0UkHotxw1ctGH2nUC0v9o=;
	b=gQX4k7ExLKPt6o315faNZ203OLeGY5yVvVcuI+UFb6Df2N5xwURvex6U6Ptxqz0qWG
	fu11fQyRZix0df4oBdJjgGB1UxSi5hsVYX9h3BEW4673hFiZZzdDhaButRUt0b4SQdXt
	pKeqa4qposngEYEjyMNXp39YQ+bBkISyDsIeo=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=google.com; s=20120113;
	h=mime-version:in-reply-to:references:from:date:message-id:subject:to
	:cc:content-type:x-gm-message-state;
	bh=j3aqvucq86q1edr7tiBf3X0UkHotxw1ctGH2nUC0v9o=;
	b=MoKz01zLEADxdt1F46/qPqphPWVpgbzcFKds45UIiDAm2YVaR5p7DYWkggf7bS4qet
	JKrxktZ1r5XvVF0rRx1ZfKQITw6s3Mi/9x2jBgykxh1s0JDY9+iwaerFNoVhhJNs9cC6
	2R54evLWMVdhvPsgnwumzpkNOvNXTt04L97AKWuhG7VgMkzBzc0rGIfMzc5OgvZKh562
	tW9IzoAkIsAxwQjBEuPzXS0O1JJNSWQ6d3YVuFXYb4fDJXN7+cOSlbVz4RKaUbhpFcLA
	xlxrlZCeHf2AKgDfTfMLw1dwwByu90lCepbyjF90mvSkIN3TqhhjFzWI5Zq818imA4qu
	qK+Q==
Received: by 10.180.102.136 with SMTP id fo8mr14047541wib.19.1347866480775;
	Mon, 17 Sep 2012 00:21:20 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.194.63.238 with HTTP; Mon, 17 Sep 2012 00:20:50 -0700 (PDT)
In-Reply-To: <1347866226.5619.1.camel@btw.pki2.com>
References: <1347866226.5619.1.camel@btw.pki2.com>
From: Edho Arief <edho@myconan.net>
Date: Mon, 17 Sep 2012 14:20:50 +0700
Message-ID: <CAAtReCkiqOFEq3ZrSCeFB1AwGzH7gKy1c0gOV7zgq4Wd7rnk6Q@mail.gmail.com>
To: Dennis Glatting <freebsd@pki2.com>
Content-Type: text/plain; charset=UTF-8
X-Gm-Message-State: ALoCoQl2PaOzuIEGbk65ksg/J7sxjvf2+q9vJ5N6tl1k1GadrpNyKO08ob/FImwGSkydWJ8TeL3A
Cc: freebsd-fs@freebsd.org
Subject: Re: How to clear this ZFS error?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 17 Sep 2012 07:21:22 -0000

On Mon, Sep 17, 2012 at 2:17 PM, Dennis Glatting <freebsd@pki2.com> wrote:
> There was a system failure when I replaced the disk. How do I rid
> myself of 15107145887069556078?
>

have you tried `zpool detach`?

From owner-freebsd-fs@FreeBSD.ORG  Mon Sep 17 07:27:12 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D7C5A106566B
	for <freebsd-fs@freebsd.org>; Mon, 17 Sep 2012 07:27:12 +0000 (UTC)
	(envelope-from freebsd@penx.com)
Received: from btw.pki2.com (btw.pki2.com [IPv6:2001:470:a:6fd::2])
	by mx1.freebsd.org (Postfix) with ESMTP id A32138FC08
	for <freebsd-fs@freebsd.org>; Mon, 17 Sep 2012 07:27:12 +0000 (UTC)
Received: from [127.0.0.1] (localhost [127.0.0.1])
	by btw.pki2.com (8.14.5/8.14.5) with ESMTP id q8H7R5i1046512;
	Mon, 17 Sep 2012 00:27:05 -0700 (PDT)
	(envelope-from freebsd@penx.com)
From: Dennis Glatting <freebsd@penx.com>
To: Edho Arief <edho@myconan.net>
In-Reply-To: <CAAtReCkiqOFEq3ZrSCeFB1AwGzH7gKy1c0gOV7zgq4Wd7rnk6Q@mail.gmail.com>
References: <1347866226.5619.1.camel@btw.pki2.com>
	<CAAtReCkiqOFEq3ZrSCeFB1AwGzH7gKy1c0gOV7zgq4Wd7rnk6Q@mail.gmail.com>
Content-Type: text/plain; charset="us-ascii"
Date: Mon, 17 Sep 2012 00:27:05 -0700
Message-ID: <1347866825.7373.2.camel@btw.pki2.com>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port 
Content-Transfer-Encoding: 7bit
X-yoursite-MailScanner-Information: Dennis Glatting
X-yoursite-MailScanner-ID: q8H7R5i1046512
X-yoursite-MailScanner: Found to be clean
X-MailScanner-From: freebsd@penx.com
Cc: freebsd-fs@freebsd.org
Subject: Re: How to clear this ZFS error?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 17 Sep 2012 07:27:12 -0000

On Mon, 2012-09-17 at 14:20 +0700, Edho Arief wrote:
> On Mon, Sep 17, 2012 at 2:17 PM, Dennis Glatting <freebsd@pki2.com> wrote:
> > There was a system failure when I replaced the disk. How do I rid
> > myself of 15107145887069556078?
> >
> 
> have you tried `zpool detach`?


Extremely helpful. Worked. Thanks.


From owner-freebsd-fs@FreeBSD.ORG  Mon Sep 17 08:13:44 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id D9FCC106566B
	for <freebsd-fs@FreeBSD.org>; Mon, 17 Sep 2012 08:13:44 +0000 (UTC)
	(envelope-from mattblists@icritical.com)
Received: from mail1.icritical.com (mail1.icritical.com [93.95.13.41])
	by mx1.freebsd.org (Postfix) with SMTP id 322418FC15
	for <freebsd-fs@FreeBSD.org>; Mon, 17 Sep 2012 08:13:43 +0000 (UTC)
Received: (qmail 23348 invoked from network); 17 Sep 2012 08:00:22 -0000
Received: from localhost (127.0.0.1)
	by mail1.icritical.com with SMTP; 17 Sep 2012 08:00:22 -0000
Received: (qmail 23339 invoked by uid 599); 17 Sep 2012 08:00:21 -0000
Received: from unknown (HELO PDC002.icritical.int) (212.57.254.146)
	by mail1.icritical.com (qpsmtpd/0.28) with ESMTP;
	Mon, 17 Sep 2012 09:00:21 +0100
Message-ID: <5056D896.8060607@icritical.com>
Date: Mon, 17 Sep 2012 09:00:22 +0100
From: Matt Burke <mattblists@icritical.com>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:15.0) Gecko/20120906 Thunderbird/15.0
MIME-Version: 1.0
To: <freebsd-fs@FreeBSD.org>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
X-TLS-Incoming: YES
X-Virus-Scanned: by iCritical at mail1.icritical.com
Cc: 
Subject: [patch] DTrace disk IO
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 17 Sep 2012 08:13:45 -0000

I've recently been trying to measure per-transaction disk IO latency and have found it rather difficult, given the asynchronous nature of the beast. I also can't find a way to translate the bio start-of-transaction timestamps into anything I can use in DTrace when pulling them out.

So I knocked up this little patch against releng/9.1 to put a couple of DTrace probes in the right places to pick up crucial data like the now+then timestamps while they're present.

The predicate on the probe is needed to pick up the right firing. For reasons I've not been able to fathom (gstat et al. give correct data), devstat_end_transaction is called multiple times for a given operation, from g_disk_done() and then g_io_deliver(), without anything useful in the bio struct (device name, number, etc.). There also seem to be a lot of firings coming from the following path, which I don't understand, again without anything useful in the bio:
              kernel`devstat_end_transaction+0x13b
              kernel`g_io_deliver+0x1b0
              kernel`g_io_schedule_up+0xa6
              kernel`g_up_procbody+0x5c
              kernel`fork_exit+0x11f
              kernel`0xffffffff80c1c3fe

Catching flushes is also proving problematic. It would seem that devstat_end_transaction_bio() is called, but the bio and devstat structs are virtually empty. bp->bio_dev, bp->bio_disk, ds->device_name, ds->device_number and ds->unit_number are all null/empty, so I know that one disk has flushed, and I know how long it took, but I can't find out which disk it was.


Thoughts?


Index: sys/kern/subr_devstat.c
===================================================================
--- sys/kern/subr_devstat.c	(revision 240481)
+++ sys/kern/subr_devstat.c	(working copy)
@@ -29,6 +29,7 @@
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
+#include "opt_kdtrace.h"
 #include <sys/param.h>
 #include <sys/kernel.h>
 #include <sys/systm.h>
@@ -41,9 +42,22 @@
 #include <sys/conf.h>
 #include <vm/vm.h>
 #include <vm/pmap.h>
+#include <sys/sdt.h>
 
 #include <machine/atomic.h>
 
+SDT_PROVIDER_DEFINE(devstat);
+SDT_PROBE_DEFINE(devstat, subr_devstat, devstat_end_transaction, stat, stat);
+SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction, stat, 0, "struct devstat *");
+SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction, stat, 1, "uint32_t");
+SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction, stat, 2, "struct bintime *");
+SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction, stat, 3, "struct bintime *");
+SDT_PROBE_DEFINE(devstat, subr_devstat, devstat_end_transaction_bio, stat, stat);
+SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction_bio, stat, 0, "struct devstat *");
+SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction_bio, stat, 1, "struct bio *");
+
+
+
 static int devstat_num_devs;
 static long devstat_generation = 1;
 static int devstat_version = DEVSTAT_VERSION;
@@ -312,6 +326,8 @@
 
 	ds->end_count++;
 	atomic_add_rel_int(&ds->sequence0, 1);
+
+	SDT_PROBE(devstat, subr_devstat, devstat_end_transaction, stat, ds, bytes, now, then, 0);
 }
 
 void
@@ -332,6 +348,8 @@
 	else 
 		flg = DEVSTAT_NO_DATA;
 
+	SDT_PROBE(devstat, subr_devstat, devstat_end_transaction_bio, stat, ds, bp, 0, 0, 0);
+
 	devstat_end_transaction(ds, bp->bio_bcount - bp->bio_resid,
 				DEVSTAT_TAG_SIMPLE, flg, NULL, &bp->bio_t0);
 }


Sample dtrace script:
=====================

BEGIN
{
        bio_cmds[1] =  "READ";
        bio_cmds[2] =  "WRITE";
        bio_cmds[4] =  "DELETE";
        bio_cmds[8] =  "GETATTR";
        bio_cmds[16] = "FLUSH";
}

devstat::devstat_end_transaction_bio:
{
        self->bio = args[1];
}

devstat::devstat_end_transaction:
/self->bio && args[0]->device_number/
{
        diff_frac = args[2]->frac - args[3]->frac;
        diff_ufrac = (diff_frac < 0) ? (args[3]->frac - args[2]->frac) : diff_frac;
        diff = (1000000000*(diff_ufrac>>32))>>32;

        printf("%d\t%s%d\t%s\t%d\t0.%09d\n", timestamp,
                args[0]->device_name, args[0]->unit_number,
                bio_cmds[self->bio->bio_cmd],
                args[1],
                diff
                );
}




From owner-freebsd-fs@FreeBSD.ORG  Mon Sep 17 11:07:05 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 28C081065689
	for <freebsd-fs@FreeBSD.org>; Mon, 17 Sep 2012 11:07:05 +0000 (UTC)
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 110628FC15
	for <freebsd-fs@FreeBSD.org>; Mon, 17 Sep 2012 11:07:05 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q8HB751c004430
	for <freebsd-fs@FreeBSD.org>; Mon, 17 Sep 2012 11:07:05 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q8HB74uD004428
	for freebsd-fs@FreeBSD.org; Mon, 17 Sep 2012 11:07:04 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 17 Sep 2012 11:07:04 GMT
Message-Id: <201209171107.q8HB74uD004428@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to
	owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster <bugmaster@FreeBSD.org>
To: freebsd-fs@FreeBSD.org
Cc: 
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 17 Sep 2012 11:07:05 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.


S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/171415  fs         [zfs] zfs recv fails with "cannot receive incremental 
o kern/170945  fs         [gpt] disk layout not portable between direct connect 
o kern/170914  fs         [zfs] [patch] Import patchs related with issues 3090 a
o kern/170912  fs         [zfs] [patch] unnecessarily setting DS_FLAG_INCONSISTE
o bin/170778   fs         [zfs] [panic] FreeBSD panics randomly
o kern/170680  fs         [nfs] Multiple NFS Client bug in the FreeBSD 7.4-RELEA
o kern/170497  fs         [xfs][panic] kernel will panic whenever I ls a mounted
o kern/170238  fs         [zfs] [panic] Panic when deleting data
o kern/169945  fs         [zfs] [panic] Kernel panic while importing zpool (afte
o kern/169480  fs         [zfs] ZFS stalls on heavy I/O
o kern/169398  fs         [zfs] Can't remove file with permanent error
o kern/169339  fs         panic while " : > /etc/123"
o kern/169319  fs         [zfs] zfs resilver can't complete
o kern/168947  fs         [nfs] [zfs] .zfs/snapshot directory is messed up when 
o kern/168942  fs         [nfs] [hang] nfsd hangs after being restarted (not -HU
o kern/168158  fs         [zfs] incorrect parsing of sharenfs options in zfs (fs
o kern/167979  fs         [ufs] DIOCGDINFO ioctl does not work on 8.2 file syste
o kern/167977  fs         [smbfs] mount_smbfs results are differ when utf-8 or U
o kern/167688  fs         [fusefs] Incorrect signal handling with direct_io
o kern/167685  fs         [zfs] ZFS on USB drive prevents shutdown / reboot
o kern/167612  fs         [portalfs] The portal file system gets stuck inside po
o kern/167272  fs         [zfs] ZFS Disks reordering causes ZFS to pick the wron
o kern/167260  fs         [msdosfs] msdosfs disk was mounted the second time whe
o kern/167109  fs         [zfs] [panic] zfs diff kernel panic Fatal trap 9: gene
o kern/167105  fs         [nfs] mount_nfs can not handle source exports wiht mor
o kern/167067  fs         [zfs] [panic] ZFS panics the server
o kern/167066  fs         [zfs] ZVOLs not appearing in /dev/zvol
o kern/167065  fs         [zfs] boot fails when a spare is the boot disk
o kern/167048  fs         [nfs] [patch] RELEASE-9 crash when using ZFS+NULLFS+NF
o kern/166912  fs         [ufs] [panic] Panic after converting Softupdates to jo
o kern/166851  fs         [zfs] [hang] Copying directory from the mounted UFS di
o kern/166477  fs         [nfs] NFS data corruption.
o kern/165950  fs         [ffs] SU+J and fsck problem
o kern/165923  fs         [nfs] Writing to NFS-backed mmapped files fails if flu
o kern/165521  fs         [zfs] [hang] livelock on 1 Gig of RAM with zfs when 31
o kern/165392  fs         Multiple mkdir/rmdir fails with errno 31
o kern/165087  fs         [unionfs] lock violation in unionfs
o kern/164472  fs         [ufs] fsck -B panics on particular data inconsistency
o kern/164370  fs         [zfs] zfs destroy for snapshot fails on i386 and sparc
o kern/164261  fs         [nullfs] [patch] fix panic with NFS served from NULLFS
o kern/164256  fs         [zfs] device entry for volume is not created after zfs
o kern/164184  fs         [ufs] [panic] Kernel panic with ufs_makeinode
o kern/163801  fs         [md] [request] allow mfsBSD legacy installed in 'swap'
o kern/163770  fs         [zfs] [hang] LOR between zfs&syncer + vnlru leading to
o kern/163501  fs         [nfs] NFS exporting a dir and a subdir in that dir to 
o kern/162944  fs         [coda] Coda file system module looks broken in 9.0
o kern/162860  fs         [zfs] Cannot share ZFS filesystem to hosts with a hyph
o kern/162751  fs         [zfs] [panic] kernel panics during file operations
o kern/162591  fs         [nullfs] cross-filesystem nullfs does not work as expe
o kern/162519  fs         [zfs] "zpool import" relies on buggy realpath() behavi
o kern/162362  fs         [snapshots] [panic] ufs with snapshot(s) panics when g
o kern/161968  fs         [zfs] [hang] renaming snapshot with -r including a zvo
p kern/161897  fs         [zfs] [patch] zfs partition probing causing long delay
o kern/161864  fs         [ufs] removing journaling from UFS partition fails on 
o bin/161807   fs         [patch] add option for explicitly specifying metadata 
o kern/161579  fs         [smbfs] FreeBSD sometimes panics when an smb share is 
o kern/161533  fs         [zfs] [panic] zfs receive panic: system ioctl returnin
o kern/161438  fs         [zfs] [panic] recursed on non-recursive spa_namespace_
o kern/161424  fs         [nullfs] __getcwd() calls fail when used on nullfs mou
o kern/161280  fs         [zfs] Stack overflow in gptzfsboot
o kern/161205  fs         [nfs] [pfsync] [regression] [build] Bug report freebsd
o kern/161169  fs         [zfs] [panic] ZFS causes kernel panic in dbuf_dirty
o kern/161112  fs         [ufs] [lor] filesystem LOR in FreeBSD 9.0-BETA3
o kern/160893  fs         [zfs] [panic] 9.0-BETA2 kernel panic
o kern/160860  fs         [ufs] Random UFS root filesystem corruption with SU+J 
o kern/160801  fs         [zfs] zfsboot on 8.2-RELEASE fails to boot from root-o
o kern/160790  fs         [fusefs] [panic] VPUTX: negative ref count with FUSE
o kern/160777  fs         [zfs] [hang] RAID-Z3 causes fatal hang upon scrub/impo
o kern/160706  fs         [zfs] zfs bootloader fails when a non-root vdev exists
o kern/160591  fs         [zfs] Fail to boot on zfs root with degraded raidz2 [r
o kern/160410  fs         [smbfs] [hang] smbfs hangs when transferring large fil
o kern/160283  fs         [zfs] [patch] 'zfs list' does abort in make_dataset_ha
o kern/159930  fs         [ufs] [panic] kernel core
o kern/159402  fs         [zfs][loader] symlinks cause I/O errors
o kern/159357  fs         [zfs] ZFS MAXNAMELEN macro has confusing name (off-by-
o kern/159356  fs         [zfs] [patch] ZFS NAME_ERR_DISKLIKE check is Solaris-s
o kern/159351  fs         [nfs] [patch] - divide by zero in mountnfs()
o kern/159251  fs         [zfs] [request]: add FLETCHER4 as DEDUP hash option
o kern/159077  fs         [zfs] Can't cd .. with latest zfs version
o kern/159048  fs         [smbfs] smb mount corrupts large files
o kern/159045  fs         [zfs] [hang] ZFS scrub freezes system
o kern/158839  fs         [zfs] ZFS Bootloader Fails if there is a Dead Disk
o kern/158802  fs         amd(8) ICMP storm and unkillable process.
o kern/158231  fs         [nullfs] panic on unmounting nullfs mounted over ufs o
f kern/157929  fs         [nfs] NFS slow read
o kern/157399  fs         [zfs] trouble with: mdconfig force delete && zfs strip
o kern/157179  fs         [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov
o kern/156797  fs         [zfs] [panic] Double panic with FreeBSD 9-CURRENT and 
o kern/156781  fs         [zfs] zfs is losing the snapshot directory,
p kern/156545  fs         [ufs] mv could break UFS on SMP systems
o kern/156193  fs         [ufs] [hang] UFS snapshot hangs && deadlocks processes
o kern/156039  fs         [nullfs] [unionfs] nullfs + unionfs do not compose, re
o kern/155615  fs         [zfs] zfs v28 broken on sparc64 -current
o kern/155587  fs         [zfs] [panic] kernel panic with zfs
p kern/155411  fs         [regression] [8.2-release] [tmpfs]: mount: tmpfs : No 
o kern/155199  fs         [ext2fs] ext3fs mounted as ext2fs gives I/O errors
o bin/155104   fs         [zfs][patch] use /dev prefix by default when importing
o kern/154930  fs         [zfs] cannot delete/unlink file from full volume -> EN
o kern/154828  fs         [msdosfs] Unable to create directories on external USB
o kern/154491  fs         [smbfs] smb_co_lock: recursive lock for object 1
p kern/154228  fs         [md] md getting stuck in wdrain state
o kern/153996  fs         [zfs] zfs root mount error while kernel is not located
o kern/153753  fs         [zfs] ZFS v15 - grammatical error when attempting to u
o kern/153716  fs         [zfs] zpool scrub time remaining is incorrect
o kern/153695  fs         [patch] [zfs] Booting from zpool created on 4k-sector 
o kern/153680  fs         [xfs] 8.1 failing to mount XFS partitions
o kern/153520  fs         [zfs] Boot from GPT ZFS root on HP BL460c G1 unstable
o kern/153418  fs         [zfs] [panic] Kernel Panic occurred writing to zfs vol
o kern/153351  fs         [zfs] locking directories/files in ZFS
o bin/153258   fs         [patch][zfs] creating ZVOLs requires `refreservation' 
s kern/153173  fs         [zfs] booting from a gzip-compressed dataset doesn't w
o bin/153142   fs         [zfs] ls -l outputs `ls: ./.zfs: Operation not support
o kern/153126  fs         [zfs] vdev failure, zpool=peegel type=vdev.too_small
o kern/152022  fs         [nfs] nfs service hangs with linux client [regression]
o kern/151942  fs         [zfs] panic during ls(1) zfs snapshot directory
o kern/151905  fs         [zfs] page fault under load in /sbin/zfs
o bin/151713   fs         [patch] Bug in growfs(8) with respect to 32-bit overfl
o kern/151648  fs         [zfs] disk wait bug
o kern/151629  fs         [fs] [patch] Skip empty directory entries during name 
o kern/151330  fs         [zfs] will unshare all zfs filesystem after execute a 
o kern/151326  fs         [nfs] nfs exports fail if netgroups contain duplicate 
o kern/151251  fs         [ufs] Can not create files on filesystem with heavy us
o kern/151226  fs         [zfs] can't delete zfs snapshot
o kern/151111  fs         [zfs] vnodes leakage during zfs unmount
o kern/150503  fs         [zfs] ZFS disks are UNAVAIL and corrupted after reboot
o kern/150501  fs         [zfs] ZFS vdev failure vdev.bad_label on amd64
o kern/150390  fs         [zfs] zfs deadlock when arcmsr reports drive faulted
o kern/150336  fs         [nfs] mountd/nfsd became confused; refused to reload n
o kern/149208  fs         mksnap_ffs(8) hang/deadlock
o kern/149173  fs         [patch] [zfs] make OpenSolaris <sys/nvpair.h> installa
o kern/149015  fs         [zfs] [patch] misc fixes for ZFS code to build on Glib
o kern/149014  fs         [zfs] [patch] declarations in ZFS libraries/utilities 
o kern/149013  fs         [zfs] [patch] make ZFS makefiles use the libraries fro
o kern/148504  fs         [zfs] ZFS' zpool does not allow replacing drives to be
o kern/148490  fs         [zfs]: zpool attach - resilver bidirectionally, and re
o kern/148368  fs         [zfs] ZFS hanging forever on 8.1-PRERELEASE
o kern/148138  fs         [zfs] zfs raidz pool commands freeze
o kern/147903  fs         [zfs] [panic] Kernel panics on faulty zfs device
o kern/147881  fs         [zfs] [patch] ZFS "sharenfs" doesn't allow different "
p kern/147560  fs         [zfs] [boot] Booting 8.1-PRERELEASE raidz system take 
o kern/147420  fs         [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt 
o kern/146941  fs         [zfs] [panic] Kernel Double Fault - Happens constantly
o kern/146786  fs         [zfs] zpool import hangs with checksum errors
o kern/146708  fs         [ufs] [panic] Kernel panic in softdep_disk_write_compl
o kern/146528  fs         [zfs] Severe memory leak in ZFS on i386
o kern/146502  fs         [nfs] FreeBSD 8 NFS Client Connection to Server
s kern/145712  fs         [zfs] cannot offline two drives in a raidz2 configurat
o kern/145411  fs         [xfs] [panic] Kernel panics shortly after mounting an 
f bin/145309   fs         bsdlabel: Editing disk label invalidates the whole dev
o kern/145272  fs         [zfs] [panic] Panic during boot when accessing zfs on 
o kern/145246  fs         [ufs] dirhash in 7.3 gratuitously frees hashes when it
o kern/145238  fs         [zfs] [panic] kernel panic on zpool clear tank
o kern/145229  fs         [zfs] Vast differences in ZFS ARC behavior between 8.0
o kern/145189  fs         [nfs] nfsd performs abysmally under load
o kern/144929  fs         [ufs] [lor] vfs_bio.c + ufs_dirhash.c
p kern/144447  fs         [zfs] sharenfs fsunshare() & fsshare_main() non functi
o kern/144416  fs         [panic] Kernel panic on online filesystem optimization
s kern/144415  fs         [zfs] [panic] kernel panics on boot after zfs crash
o kern/144234  fs         [zfs] Cannot boot machine with recent gptzfsboot code 
o kern/143825  fs         [nfs] [panic] Kernel panic on NFS client
o bin/143572   fs         [zfs] zpool(1): [patch] The verbose output from iostat
o kern/143212  fs         [nfs] NFSv4 client strange work ...
o kern/143184  fs         [zfs] [lor] zfs/bufwait LOR
o kern/142878  fs         [zfs] [vfs] lock order reversal
o kern/142597  fs         [ext2fs] ext2fs does not work on filesystems with real
o kern/142489  fs         [zfs] [lor] allproc/zfs LOR
o kern/142466  fs         Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re
o kern/142306  fs         [zfs] [panic] ZFS drive (from OSX Leopard) causes two 
o kern/142068  fs         [ufs] BSD labels are got deleted spontaneously
o kern/141897  fs         [msdosfs] [panic] Kernel panic. msdofs: file name leng
o kern/141463  fs         [nfs] [panic] Frequent kernel panics after upgrade fro
o kern/141305  fs         [zfs] FreeBSD ZFS+sendfile severe performance issues (
o kern/141091  fs         [patch] [nullfs] fix panics with DIAGNOSTIC enabled
o kern/141086  fs         [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS
o kern/141010  fs         [zfs] "zfs scrub" fails when backed by files in UFS2
o kern/140888  fs         [zfs] boot fail from zfs root while the pool resilveri
o kern/140661  fs         [zfs] [patch] /boot/loader fails to work on a GPT/ZFS-
o kern/140640  fs         [zfs] snapshot crash
o kern/140068  fs         [smbfs] [patch] smbfs does not allow semicolon in file
o kern/139725  fs         [zfs] zdb(1) dumps core on i386 when examining zpool c
o kern/139715  fs         [zfs] vfs.numvnodes leak on busy zfs
p bin/139651   fs         [nfs] mount(8): read-only remount of NFS volume does n
o kern/139407  fs         [smbfs] [panic] smb mount causes system crash if remot
o kern/138662  fs         [panic] ffs_blkfree: freeing free block
o kern/138421  fs         [ufs] [patch] remove UFS label limitations
o kern/138202  fs         mount_msdosfs(1) see only 2Gb
o kern/136968  fs         [ufs] [lor] ufs/bufwait/ufs (open)
o kern/136945  fs         [ufs] [lor] filedesc structure/ufs (poll)
o kern/136944  fs         [ffs] [lor] bufwait/snaplk (fsync)
o kern/136873  fs         [ntfs] Missing directories/files on NTFS volume
o kern/136865  fs         [nfs] [patch] NFS exports atomic and on-the-fly atomic
p kern/136470  fs         [nfs] Cannot mount / in read-only, over NFS
o kern/135546  fs         [zfs] zfs.ko module doesn't ignore zpool.cache filenam
o kern/135469  fs         [ufs] [panic] kernel crash on md operation in ufs_dirb
o kern/135050  fs         [zfs] ZFS clears/hides disk errors on reboot
o kern/134491  fs         [zfs] Hot spares are rather cold...
o kern/133676  fs         [smbfs] [panic] umount -f'ing a vnode-based memory dis
o kern/132960  fs         [ufs] [panic] panic:ffs_blkfree: freeing free frag
o kern/132397  fs         reboot causes filesystem corruption (failure to sync b
o kern/132331  fs         [ufs] [lor] LOR ufs and syncer
o kern/132237  fs         [msdosfs] msdosfs has problems to read MSDOS Floppy
o kern/132145  fs         [panic] File System Hard Crashes
o kern/131441  fs         [unionfs] [nullfs] unionfs and/or nullfs not combineab
o kern/131360  fs         [nfs] poor scaling behavior of the NFS server under lo
o kern/131342  fs         [nfs] mounting/unmounting of disks causes NFS to fail
o bin/131341   fs         makefs: error "Bad file descriptor"  on the mount poin
o kern/130920  fs         [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130210  fs         [nullfs] Error by check nullfs
o kern/129760  fs         [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129488  fs         [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: 
o kern/129231  fs         [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129152  fs         [panic] non-userfriendly panic when trying to mount(8)
o kern/127787  fs         [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs
o bin/127270   fs         fsck_msdosfs(8) may crash if BytesPerSec is zero
o kern/127029  fs         [panic] mount(8): trying to mount a write protected zi
o kern/126287  fs         [ufs] [panic] Kernel panics while mounting an UFS file
o kern/125895  fs         [ffs] [panic] kernel: panic: ffs_blkfree: freeing free
s kern/125738  fs         [zfs] [request] SHA256 acceleration in ZFS
o kern/123939  fs         [msdosfs] corrupts new files
o kern/122380  fs         [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash
o bin/122172   fs         [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121898   fs         [nullfs] pwd(1)/getcwd(2) fails with Permission denied
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha
o kern/120483  fs         [ntfs] [patch] NTFS filesystem locking changes
o kern/120482  fs         [ntfs] [patch] Sync style changes between NetBSD and F
o kern/118912  fs         [2tb] disk sizing/geometry problem with large array
o kern/118713  fs         [minidump] [patch] Display media size required for a k
o kern/118318  fs         [nfs] NFS server hangs under special circumstances
o bin/118249   fs         [ufs] mv(1): moving a directory changes its mtime
o kern/118126  fs         [nfs] [patch] Poor NFS server write performance
o kern/118107  fs         [ntfs] [panic] Kernel panic when accessing a file at N
o kern/117954  fs         [ufs] dirhash on very large directories blocks the mac
o bin/117315   fs         [smbfs] mount_smbfs(8) and related options can't mount
o kern/117158  fs         [zfs] zpool scrub causes panic if geli vdevs detach on
o bin/116980   fs         [msdosfs] [patch] mount_msdosfs(8) resets some flags f
o conf/116931  fs         lack of fsck_cd9660 prevents mounting iso images with 
o kern/116583  fs         [ffs] [hang] System freezes for short time when using 
o bin/115361   fs         [zfs] mount(8) gets into a state where it won't set/un
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala 
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o kern/113852  fs         [smbfs] smbfs does not properly implement DFS referral
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show 
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/111843  fs         [msdosfs] Long Names of files are incorrectly created 
o kern/111782  fs         [ufs] dump(8) fails horribly for large filesystems
s bin/111146   fs         [2tb] fsck(8) fails on 6T filesystem
o bin/107829   fs         [2TB] fdisk(8): invalid boundary checking in fdisk / w
o kern/106107  fs         [ufs] left-over fsck_snapshot after unfinished backgro
o kern/104406  fs         [ufs] Processes get stuck in "ufs" state under persist
o kern/104133  fs         [ext2fs] EXT2FS module corrupts EXT2/3 filesystems
o kern/103035  fs         [ntfs] Directories in NTFS mounted disc images appear 
o kern/101324  fs         [smbfs] smbfs sometimes not case sensitive when it's s
o kern/99290   fs         [ntfs] mount_ntfs ignorant of cluster sizes
s bin/97498    fs         [request] newfs(8) has no option to clear the first 12
o kern/97377   fs         [ntfs] [patch] syntax cleanup for ntfs_ihash.c
o kern/95222   fs         [cd9660] File sections on ISO9660 level 3 CDs ignored
o kern/94849   fs         [ufs] rename on UFS filesystem is not atomic
o bin/94810    fs         fsck(8) incorrectly reports 'file system marked clean'
o kern/94769   fs         [ufs] Multiple file deletions on multi-snapshotted fil
o kern/94733   fs         [smbfs] smbfs may cause double unlock
o kern/93942   fs         [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/92272   fs         [ffs] [hang] Filling a filesystem while creating a sna
o kern/91134   fs         [smbfs] [patch] Preserve access and modification time 
a kern/90815   fs         [smbfs] [patch] SMBFS with character conversions somet
o kern/88657   fs         [smbfs] windows client hang when browsing a samba shar
o kern/88555   fs         [panic] ffs_blkfree: freeing free frag on AMD 64
o kern/88266   fs         [smbfs] smbfs does not implement UIO_NOCOPY and sendfi
o bin/87966    fs         [patch] newfs(8): introduce -A flag for newfs to enabl
o kern/87859   fs         [smbfs] System reboot while umount smbfs.
o kern/86587   fs         [msdosfs] rm -r /PATH fails with lots of small files
o bin/85494    fs         fsck_ffs: unchecked use of cg_inosused macro etc.
o kern/80088   fs         [smbfs] Incorrect file time setting on NTFS mounted vi
o bin/74779    fs         Background-fsck checks one filesystem twice and omits 
o kern/73484   fs         [ntfs] Kernel panic when doing `ls` from the client si
o bin/73019    fs         [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino
o kern/71774   fs         [ntfs] NTFS cannot "see" files on a WinXP filesystem
o bin/70600    fs         fsck(8) throws files away when it can't grow lost+foun
o kern/68978   fs         [panic] [ufs] crashes with failing hard disk, loose po
o kern/65920   fs         [nwfs] Mounted Netware filesystem behaves strange
o kern/65901   fs         [smbfs] [patch] smbfs fails fsx write/truncate-down/tr
o kern/61503   fs         [smbfs] mount_smbfs does not work as non-root
o kern/55617   fs         [smbfs] Accessing an nsmb-mounted drive via a smb expo
o kern/51685   fs         [hang] Unbounded inode allocation causes kernel to loc
o kern/36566   fs         [smbfs] System reboot with dead smb mount and umount
o bin/27687    fs         fsck(8) wrapper is not properly passing options to fsc
o kern/18874   fs         [2TB] 32bit NFS servers export wrong negative values t

289 problems total.


From owner-freebsd-fs@FreeBSD.ORG  Mon Sep 17 12:23:34 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 267F21065670
	for <freebsd-fs@freebsd.org>; Mon, 17 Sep 2012 12:23:34 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
	by mx1.freebsd.org (Postfix) with ESMTP id 935128FC08
	for <freebsd-fs@freebsd.org>; Mon, 17 Sep 2012 12:23:32 +0000 (UTC)
Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q8HCNcqi057349;
	Mon, 17 Sep 2012 15:23:38 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id
	q8HCNP15037012; Mon, 17 Sep 2012 15:23:25 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q8HCNPGK037011; 
	Mon, 17 Sep 2012 15:23:25 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Mon, 17 Sep 2012 15:23:25 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Message-ID: <20120917122325.GR37286@deviant.kiev.zoral.com.ua>
References: <1531430179.669311.1347831685957.JavaMail.root@erie.cs.uoguelph.ca>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="vk2EvGhio7iZz8DU"
Content-Disposition: inline
In-Reply-To: <1531430179.669311.1347831685957.JavaMail.root@erie.cs.uoguelph.ca>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
	autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	skuns.kiev.zoral.com.ua
Cc: FS List <freebsd-fs@freebsd.org>
Subject: Re: testing/review of atomic export update patch
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 17 Sep 2012 12:23:34 -0000


--vk2EvGhio7iZz8DU
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sun, Sep 16, 2012 at 05:41:25PM -0400, Rick Macklem wrote:
> Hi,
>=20
> There is a simple patch at:
>   http://people.freebsd.org/~rmacklem/atomic-export.patch
> that can be applied to a kernel + mountd, so that the new
> nfsd can be suspended by mountd while the exports are being
> reloaded. It adds a new "-S" flag to mountd to enable this.
> (This avoids the long standing bug where clients receive ESTALE
>  replies to RPCs while mountd is reloading exports.)

This looks simple, but also somewhat worrisome. What would happen
if mountd crashed after nfsd suspension was requested, but before
the resume was performed?

Perhaps mountd should check for a suspended nfsd on start and
unsuspend it, if some flag is specified?
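
The recovery idea above can be modelled in a few lines of shell. This is
only a toy sketch under loud assumptions: the flag file and every function
name are hypothetical stand-ins, and nothing here reflects how the actual
patch stores or queries the suspended state. It only shows the shape of
"check for a stale suspension at startup and undo it".

```shell
#!/bin/sh
# Toy model only: the flag file and all function names are hypothetical.
# The real suspended state would live in the kernel, not in a file.
FLAG=${FLAG:-${TMPDIR:-/tmp}/nfsd-suspended.flag}

nfsd_suspend()      { : > "$FLAG"; }     # stand-in for "suspend nfsd"
nfsd_resume()       { rm -f "$FLAG"; }   # stand-in for "resume nfsd"
nfsd_is_suspended() { [ -e "$FLAG" ]; }

# Reload exports while nfsd is suspended; "$@" is the reload command.
reload_exports() {
    nfsd_suspend
    "$@"
    rc=$?
    nfsd_resume
    return "$rc"
}

# Run once at startup: clean up after a previous instance that died
# between nfsd_suspend and nfsd_resume.
recover_on_start() {
    if nfsd_is_suspended; then
        echo "nfsd left suspended by a previous run, resuming"
        nfsd_resume
    fi
}
```

A crash inside reload_exports leaves the flag behind, and the next
recover_on_start clears it. Whether the real check should be unconditional
or tied to a flag, as suggested above, is left open here.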

--vk2EvGhio7iZz8DU
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iEYEARECAAYFAlBXFjwACgkQC3+MBN1Mb4h/OACeIEjMZo6AWDlO0dSHDCrkncG6
oZYAnjVapZW44ulwTmWudOhlwpCCFUEF
=U8MR
-----END PGP SIGNATURE-----

--vk2EvGhio7iZz8DU--

From owner-freebsd-fs@FreeBSD.ORG  Mon Sep 17 12:34:24 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 751C0106564A
	for <freebsd-fs@freebsd.org>; Mon, 17 Sep 2012 12:34:24 +0000 (UTC)
	(envelope-from olivier@gid0.org)
Received: from mail-lb0-f182.google.com (mail-lb0-f182.google.com
	[209.85.217.182])
	by mx1.freebsd.org (Postfix) with ESMTP id EF0908FC08
	for <freebsd-fs@freebsd.org>; Mon, 17 Sep 2012 12:34:23 +0000 (UTC)
Received: by lbbgg13 with SMTP id gg13so5094755lbb.13
	for <freebsd-fs@freebsd.org>; Mon, 17 Sep 2012 05:34:22 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=google.com; s=20120113;
	h=mime-version:date:message-id:subject:from:to:content-type
	:x-gm-message-state;
	bh=tg2K/0LTxJQ0+88i4mopVB0VWnoKZIz7tbV6QfvNMBw=;
	b=A7kDnc7WIJlXUsYQQm50X+aUiug3aP64rWzuJ19swobFk3jo9YOzUqUMu+w9pztthr
	5m7T9Au5dyOF37H5cjBQ7uhwnqUjnygnBtyQopGobEkuKGsbDoXOoh3DVOWQQcwjnd31
	a+8o7KQ+epxnPBBfJdPIDEZZ8QQ/0ftp4ZUTeX5SyqylwuzkmKa4MYv0Ab4hTPkO91XU
	gxnfY7+bnEp4JoY4kkbeLmXhbJOSkiwQqJqCC4pxaB3D980KeA/MZTDaCB4MK74FlROX
	pqx673CgTd77dqUnv6CO3fxuqrgUGuibGREtb19XLS8psHDW2jdPE0h4Zte8gT7eNYD6
	5zIA==
MIME-Version: 1.0
Received: by 10.152.113.165 with SMTP id iz5mr4756173lab.48.1347885262612;
	Mon, 17 Sep 2012 05:34:22 -0700 (PDT)
Received: by 10.112.2.36 with HTTP; Mon, 17 Sep 2012 05:34:22 -0700 (PDT)
Date: Mon, 17 Sep 2012 14:34:22 +0200
Message-ID: <CABzXLYMO3NkNws_8eLE21qMxOiTBK5zq9m_Nb_WrRA_fu=D1_Q@mail.gmail.com>
From: Olivier Smedts <olivier@gid0.org>
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
X-Gm-Message-State: ALoCoQkMT4qu+qvyeE6UAJ0e+RFcr3axUalvUXwG0pMRFAdH9fN5P0KpjLOd9b9yLsBwvpxZVWcw
Subject: zpool add log to root pool
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 17 Sep 2012 12:34:24 -0000

Hello ZFS folks,

Is there anyone here using a separate log device (or "ZIL") on a root pool?

# zpool add tank log gpt/zil
cannot add to 'tank': root pool can not have multiple vdevs or separate logs

This is under 9-STABLE with zpool v28. The restriction seems to be
inherited from OpenSolaris and not a real FreeBSD limitation: FreeBSD
can, for example, boot from a multiple-vdev root pool. I found that
most people use the "unset bootfs property, add vdev, set bootfs
again" trick to get a working multiple-vdev root pool under FreeBSD.
I think I can do the same for the log device, but I don't want to
lose my data.

Is anyone successfully using a log device / ZIL on a root pool
under FreeBSD?

Thanks

-- 
Olivier Smedts                                                 _
                                        ASCII ribbon campaign ( )
e-mail: olivier@gid0.org        - against HTML email & vCards  X
www: http://www.gid0.org    - against proprietary attachments / \

  "Il y a seulement 10 sortes de gens dans le monde :
  ceux qui comprennent le binaire,
  et ceux qui ne le comprennent pas."

From owner-freebsd-fs@FreeBSD.ORG  Mon Sep 17 14:33:11 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 70987106566B
	for <freebsd-fs@freebsd.org>; Mon, 17 Sep 2012 14:33:11 +0000 (UTC)
	(envelope-from c.kworr@gmail.com)
Received: from mail-bk0-f54.google.com (mail-bk0-f54.google.com
	[209.85.214.54])
	by mx1.freebsd.org (Postfix) with ESMTP id EDFF58FC08
	for <freebsd-fs@freebsd.org>; Mon, 17 Sep 2012 14:33:10 +0000 (UTC)
Received: by bkcje9 with SMTP id je9so2367918bkc.13
	for <freebsd-fs@freebsd.org>; Mon, 17 Sep 2012 07:33:09 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=message-id:date:from:user-agent:mime-version:to:cc:subject
	:references:in-reply-to:content-type:content-transfer-encoding;
	bh=fN2bMV5GyphMiQ/9HcUHSCtvxEHat8/7jG2jkyCEPDM=;
	b=vLsa3oT64EX94grrKS3RCbTdePJVGFuGgp/lwp+q8QTVToydVKmhRjL6pugf79PNSA
	2tMEagmHHxCnKG1VnAdUIKmPDcn9ajnfdjvR4txemQkT8G2p9iF+EIkNR1T2bC+B41n2
	M0MQdm3D4DzNbReWiPZb9BwvJj/YEK6Xs6ObZwxvGVz9an9r1uJBbVv1P8UBnvN0Cg8P
	a+kjYXW46jNAomDQJsnzY0Box/2GoBVwucy7p0QrN/pK/DsYFNNHf1JNIHFAqnRaVAEs
	M4K7W5N9dwH1WlHU+RaSZ/RSpQR8dMBIFBYXJSKzREcDfX4DgShty7OrwejcpcM06oJ5
	E7mQ==
Received: by 10.204.11.209 with SMTP id u17mr1977856bku.130.1347892389542;
	Mon, 17 Sep 2012 07:33:09 -0700 (PDT)
Received: from green.local (227-7-132-95.pool.ukrtel.net. [95.132.7.227])
	by mx.google.com with ESMTPS id a17sm2555331bkw.5.2012.09.17.07.33.07
	(version=SSLv3 cipher=OTHER); Mon, 17 Sep 2012 07:33:08 -0700 (PDT)
Message-ID: <505734A1.9000501@gmail.com>
Date: Mon, 17 Sep 2012 17:33:05 +0300
From: Volodymyr Kostyrko <c.kworr@gmail.com>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:15.0) Gecko/20120911 Thunderbird/15.0.1
MIME-Version: 1.0
To: Olivier Smedts <olivier@gid0.org>
References: <CABzXLYMO3NkNws_8eLE21qMxOiTBK5zq9m_Nb_WrRA_fu=D1_Q@mail.gmail.com>
In-Reply-To: <CABzXLYMO3NkNws_8eLE21qMxOiTBK5zq9m_Nb_WrRA_fu=D1_Q@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org
Subject: Re: zpool add log to root pool
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 17 Sep 2012 14:33:11 -0000

17.09.2012 15:34, Olivier Smedts wrote:
> Is there anyone here using a separate log device (or "ZIL") on a root pool ?

# zpool set bootfs= tank

> # zpool add tank log gpt/zil
> cannot add to 'tank': root pool can not have multiple vdevs or separate logs
>
> Under 9-STABLE, using zpool v28. This seems to be a limitation from
> OpenSolaris. For example, FreeBSD supports booting from a
> multiple-vdev root pool. I found that most people use the "unset
> bootfs property, add vdev, set bootfs again" trick to have a working
> multiple-vdev root pool under FreeBSD. I think I can do the same for
> the log device but don't want to lose my data.
>
> Is there anyone successfully using a log device / zil on a root pool
> under FreeBSD ?

Me.

# zpool iostat -v faz0
                                                  capacity     operations     bandwidth
pool                                            alloc   free   read  write   read  write
----------------------------------------------  -----  -----  -----  -----  -----  -----
faz0                                             121G   173G     22    149   130K   671K
  mirror                                         121G   173G     22    149   130K   659K
    gptid/b88daece-7a48-11df-8703-0018f36885d5      -      -     10     56   111K   660K
    gptid/23ddb9f0-7b04-11df-8867-0018f36885d5      -      -     10     56   111K   660K
logs                                                -      -      -      -      -      -
  gptid/3592d260-c98e-11e0-9ef6-0018f36885d5    1,86M  1014M      0      0      0  11,6K
cache                                               -      -      -      -      -      -
  gptid/3809bef7-c98e-11e0-9ef6-0018f36885d5    36,3G     8M     71      8   653K   374K
----------------------------------------------  -----  -----  -----  -----  -----  -----

# zpool get bootfs faz0
NAME  PROPERTY  VALUE   SOURCE
faz0  bootfs    -       default
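
For reference, the "unset bootfs, add device, set bootfs again" sequence
quoted above could be wrapped roughly like this. It is a sketch, not a
tested procedure: the pool, dataset and device names are placeholders, and
the DRY_RUN switch and function names are made up for illustration. Only
the zpool invocations themselves come from this thread.

```shell
#!/bin/sh
# Sketch only. Pool, dataset and device names are placeholders.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# usage: add_log_to_root_pool POOL BOOTFS_DATASET LOG_DEVICE
add_log_to_root_pool() {
    pool=$1 bootfs=$2 logdev=$3
    # 1. Clear bootfs so zpool no longer treats the pool as a root pool.
    run zpool set bootfs= "$pool" || return 1
    # 2. Add the separate log device.
    run zpool add "$pool" log "$logdev"
    rc=$?
    # 3. Put bootfs back even if step 2 failed.
    run zpool set bootfs="$bootfs" "$pool" || rc=1
    return "$rc"
}

# Example (prints three commands, changes nothing):
#   DRY_RUN=1 add_log_to_root_pool tank tank/root gpt/zil
```

Note that the pool shown above runs with bootfs left unset, so step 3 is
the part taken from the quoted trick rather than from this setup.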

-- 
Sphinx of black quartz judge my vow.

From owner-freebsd-fs@FreeBSD.ORG  Mon Sep 17 16:29:45 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 853E710656B7
	for <freebsd-fs@freebsd.org>; Mon, 17 Sep 2012 16:29:45 +0000 (UTC)
	(envelope-from Newsletter@goelli.de)
Received: from mo6-p05-ob.rzone.de (mo6-p05-ob.rzone.de
	[IPv6:2a01:238:20a:202:5305::1])
	by mx1.freebsd.org (Postfix) with ESMTP id DA8F68FC29
	for <freebsd-fs@freebsd.org>; Mon, 17 Sep 2012 16:29:44 +0000 (UTC)
X-RZG-CLASS-ID: mo05
X-RZG-AUTH: :ImkTZkytb+s5KUDumTG4i0mGDH1K4fweaf9O+/5rQT5pvsrb4VLk35Jv6Ak/ChY=
Received: from goelliNotebook
	(dslb-094-219-102-167.pools.arcor-ip.net [94.219.102.167])
	by smtp.strato.de (jored mo30) (RZmta 30.14 DYNA|AUTH)
	with ESMTPA id n00393o8HG55ru ; Mon, 17 Sep 2012 18:29:43 +0200 (CEST)
From: =?utf-8?Q?Thomas_G=C3=B6llner_=28Newsletter=29?= <Newsletter@goelli.de>
To: "'Volodymyr Kostyrko'" <c.kworr@gmail.com>
References: <001a01cd900d$bcfcc870$36f65950$@goelli.de>
	<504F282D.8030808@gmail.com>
	<000a01cd90aa$0a277310$1e765930$@goelli.de>
	<5050461A.9050608@gmail.com>
	<000001cd9239$ed734c80$c859e580$@goelli.de>
	<5052EC5D.4060403@gmail.com>
	<000a01cd9274$0aa0bba0$1fe232e0$@goelli.de>
	<505322C9.70200@gmail.com>
	<000001cd9377$e9e9b010$bdbd1030$@goelli.de>
	<50559CD8.1070700@gmail.com>
In-Reply-To: <50559CD8.1070700@gmail.com>
Date: Mon, 17 Sep 2012 18:29:42 +0200
Message-ID: <000001cd94f1$a4157030$ec405090$@goelli.de>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="utf-8"
Content-Transfer-Encoding: quoted-printable
X-Mailer: Microsoft Outlook 14.0
Thread-Index: AQJeoooVG2eNN70r/NZmey5KTcQ1/AH9GaMbAUqasjACcrqh/gFBByGfA1g04pUCQLd3HAGGeVDSATGcaCEBYrMtgJXnIg+g
Content-Language: de
Cc: freebsd-fs@freebsd.org
Subject: AW: AW: AW: AW: AW: ZFS: Corrupted pool metadata after adding vdev
	to a pool - no opportunity to rescue data from healthy vdevs?
	Remove a vdev? Rewrite metadata?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 17 Sep 2012 16:29:45 -0000

> If you can afford putting your drives aside you can try to wait before =
some tool occasionally emerges. I will not promise anything
> but I'm slowly making some progress with my script. I'm motivated =
about that as I have broken pool with photos. Trying to import
> that pool is causing a core dump on any system I tested like =
OpenSolaris, Illumos or SystemRescueCD.

It would be great if your script could deal with pools with broken
labels. I will put the three 3TB disks aside and use the old 1.5TB
disks instead. If there is some progress with your script, or someone
else writes a tool for restoring labels or reading data from broken
pools, perhaps I can get some data back. I think it will take some
time to get this fresh 3TB pool full ;-)

This would also solve the next problem I discovered...
These 1.5TB disks have 512-byte sectors, and I have one spare. At
first I thought that when a second disk fails, I would replace it
with a 4TB disk, and so on until all of them are replaced, so that I
can expand the pool. But as I read now, this is not possible, is it?
The 4TB drives would have 4k sectors.
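
The arithmetic behind this concern, as a small sketch: ZFS records the
sector-size exponent of a vdev as its ashift when the vdev is created, so
512-byte sectors correspond to ashift 9 and 4k sectors to ashift 12. The
helper name below is made up for illustration. Whether a particular 4TB
drive is accepted as a replacement depends on the sector size it actually
reports, which is an assumption to verify on the hardware (for example
with diskinfo -v on the disk and zdb -C on the pool).

```shell
#!/bin/sh
# Print the ashift (log2 of the sector size) for a sector size in bytes.
# Assumes the size is a power of two, as disk sector sizes are.
ashift_for_sector() {
    size=$1 shift_bits=0
    while [ "$size" -gt 1 ]; do
        size=$((size / 2))
        shift_bits=$((shift_bits + 1))
    done
    echo "$shift_bits"
}

# ashift_for_sector 512    prints 9
# ashift_for_sector 4096   prints 12
```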


From owner-freebsd-fs@FreeBSD.ORG  Mon Sep 17 20:09:34 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C72C3106564A
	for <freebsd-fs@freebsd.org>; Mon, 17 Sep 2012 20:09:34 +0000 (UTC)
	(envelope-from spork@bway.net)
Received: from xena.bway.net (xena.bway.net [216.220.96.26])
	by mx1.freebsd.org (Postfix) with ESMTP id 5E6CA8FC15
	for <freebsd-fs@freebsd.org>; Mon, 17 Sep 2012 20:09:34 +0000 (UTC)
Received: (qmail 89294 invoked by uid 0); 17 Sep 2012 20:02:53 -0000
Received: from smtp.bway.net (216.220.96.25)
	by xena.bway.net with ESMTPS (DHE-RSA-AES256-SHA encrypted);
	17 Sep 2012 20:02:53 -0000
Received: (qmail 89283 invoked by uid 90); 17 Sep 2012 20:02:52 -0000
Received: from unknown (HELO frankentosh.sporklab.com) (spork@96.57.144.66)
	by smtp.bway.net with ESMTPA; 17 Sep 2012 20:02:52 -0000
Mime-Version: 1.0 (Apple Message framework v1084)
Content-Type: text/plain; charset=us-ascii
From: Charles Sprickman <spork@bway.net>
In-Reply-To: <CABzXLYMO3NkNws_8eLE21qMxOiTBK5zq9m_Nb_WrRA_fu=D1_Q@mail.gmail.com>
Date: Mon, 17 Sep 2012 16:02:51 -0400
Content-Transfer-Encoding: quoted-printable
Message-Id: <9A49FDEB-325E-4A25-8FB9-C4FF8F9BAF67@bway.net>
References: <CABzXLYMO3NkNws_8eLE21qMxOiTBK5zq9m_Nb_WrRA_fu=D1_Q@mail.gmail.com>
To: Olivier Smedts <olivier@gid0.org>
X-Mailer: Apple Mail (2.1084)
Cc: freebsd-fs@freebsd.org
Subject: Re: zpool add log to root pool
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 17 Sep 2012 20:09:34 -0000

On Sep 17, 2012, at 8:34 AM, Olivier Smedts wrote:

> Hello ZFS folks,
>
> Is there anyone here using a separate log device (or "ZIL") on a root
> pool?
>
> # zpool add tank log gpt/zil
> cannot add to 'tank': root pool can not have multiple vdevs or separate logs
>
> Under 9-STABLE, using zpool v28. This seems to be a limitation from
> OpenSolaris. For example, FreeBSD supports booting from a
> multiple-vdev root pool. I found that most people use the "unset
> bootfs property, add vdev, set bootfs again" trick to have a working
> multiple-vdev root pool under FreeBSD.

I did that and it seems to work.

  pool: zroot
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Tue Aug 21 01:28:27 2012
config:

	NAME           STATE     READ WRITE CKSUM
	zroot          ONLINE       0     0     0
	  mirror-0     ONLINE       0     0     0
	    ada1p3     ONLINE       0     0     0
	    ada2p3     ONLINE       0     0     0
	  mirror-1     ONLINE       0     0     0
	    ada3p3     ONLINE       0     0     0
	    ada0p3     ONLINE       0     0     0
	logs
	  mirror-2     ONLINE       0     0     0
	    gpt/zil-a  ONLINE       0     0     0
	    gpt/zil-b  ONLINE       0     0     0
	cache
	  gpt/l2arc-a  ONLINE       0     0     0
	  gpt/l2arc-b  ONLINE       0     0     0

errors: No known data errors

Lost my /dev/gpt entries for the existing mirror slices in the process
though...

> I think I can do the same for
> the log device but don't want to lose my data.
>
> Is there anyone successfully using a log device / ZIL on a root pool
> under FreeBSD?

Mine works, but I've never been able to find any official confirmation
from the -fs folks as to how "proper" or supported the configuration is.
Not too many other options on 1U boxes though, really...

Charles


> Thanks
>
> --
> Olivier Smedts                                                 _
>                                        ASCII ribbon campaign ( )
> e-mail: olivier@gid0.org        - against HTML email & vCards  X
> www: http://www.gid0.org    - against proprietary attachments / \
>
>  "Il y a seulement 10 sortes de gens dans le monde :
>  ceux qui comprennent le binaire,
>  et ceux qui ne le comprennent pas."
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"


From owner-freebsd-fs@FreeBSD.ORG  Mon Sep 17 20:36:22 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id AF4FC106566B
	for <freebsd-fs@freebsd.org>; Mon, 17 Sep 2012 20:36:22 +0000 (UTC)
	(envelope-from fjwcash@gmail.com)
Received: from mail-lpp01m010-f54.google.com (mail-lpp01m010-f54.google.com
	[209.85.215.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 30B0A8FC14
	for <freebsd-fs@freebsd.org>; Mon, 17 Sep 2012 20:36:21 +0000 (UTC)
Received: by lage12 with SMTP id e12so5504761lag.13
	for <freebsd-fs@freebsd.org>; Mon, 17 Sep 2012 13:36:20 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	bh=g9s7g5Kt3InPSjhFv+koR36b8pu+Tr0j6FZYjBpvNaY=;
	b=NibPrDQuuRBSNiwYhelp/o2GcM0mwYxY5Z0x7aXJk73U43Dl+4TfvVkoIB1BmfnZMA
	UNav0me5I77kQB1ZtNqL91rL/ppuDRAmeVhKo5BAlLoHnUY+o4NkKle/pX7Md9VEvDmx
	tLvnU+hAcuvyUxGI6y4jFpJ47LcglPbwM0uyEhfaHPvHGOO06Tkn2TkVrHCPAohbXp89
	0cDmwFreektfpUFvdia9nLSNxuH5fQOGmewCiJrf3hwFKPIQj6PZRv9ExcIL7eG6fr7o
	pSPewPwPPBG5QmPxC+ZAMq3n+xSlRoKn221NOjT7DnCJ3aMEOhsd8pFFoJF0TbtyXdP2
	dffw==
MIME-Version: 1.0
Received: by 10.152.110.9 with SMTP id hw9mr10638390lab.55.1347914180412; Mon,
	17 Sep 2012 13:36:20 -0700 (PDT)
Received: by 10.114.23.230 with HTTP; Mon, 17 Sep 2012 13:36:20 -0700 (PDT)
In-Reply-To: <9A49FDEB-325E-4A25-8FB9-C4FF8F9BAF67@bway.net>
References: <CABzXLYMO3NkNws_8eLE21qMxOiTBK5zq9m_Nb_WrRA_fu=D1_Q@mail.gmail.com>
	<9A49FDEB-325E-4A25-8FB9-C4FF8F9BAF67@bway.net>
Date: Mon, 17 Sep 2012 13:36:20 -0700
Message-ID: <CAOjFWZ4bfgYesMOUr8yNRzmJ7SCB+wkoh0-QLvSmn4rKG==68A@mail.gmail.com>
From: Freddie Cash <fjwcash@gmail.com>
To: Charles Sprickman <spork@bway.net>
Content-Type: text/plain; charset=UTF-8
Cc: FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject: Re: zpool add log to root pool
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 17 Sep 2012 20:36:22 -0000

On Mon, Sep 17, 2012 at 1:02 PM, Charles Sprickman <spork@bway.net> wrote:
> I did that and it seems to work.
>
>   pool: zroot
>  state: ONLINE
>   scan: scrub repaired 0 in 0h0m with 0 errors on Tue Aug 21 01:28:27 2012
> config:
>
>         NAME           STATE     READ WRITE CKSUM
>         zroot          ONLINE       0     0     0
>           mirror-0     ONLINE       0     0     0
>             ada1p3     ONLINE       0     0     0
>             ada2p3     ONLINE       0     0     0
>           mirror-1     ONLINE       0     0     0
>             ada3p3     ONLINE       0     0     0
>             ada0p3     ONLINE       0     0     0
>         logs
>           mirror-2     ONLINE       0     0     0
>             gpt/zil-a  ONLINE       0     0     0
>             gpt/zil-b  ONLINE       0     0     0
>         cache
>           gpt/l2arc-a  ONLINE       0     0     0
>           gpt/l2arc-b  ONLINE       0     0     0
>
> errors: No known data errors
>
> Lost my /dev/gpt entries for the existing mirror slices in the process though...

Have you tried booting from a LiveCD (like Frenzy or the 9.0
installer), doing an import of the pool, then an export of the pool,
and then a "zpool import -d /dev/gpt zroot"?  You may be able to skip
the initial import/export.

-- 
Freddie Cash
fjwcash@gmail.com

From owner-freebsd-fs@FreeBSD.ORG  Mon Sep 17 21:33:55 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0053A1065673
	for <freebsd-fs@freebsd.org>; Mon, 17 Sep 2012 21:33:54 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca
	[131.104.91.36])
	by mx1.freebsd.org (Postfix) with ESMTP id D796E8FC15
	for <freebsd-fs@freebsd.org>; Mon, 17 Sep 2012 21:33:53 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: Ap8EAECWV1CDaFvO/2dsb2JhbAA+BxaFcbchgiABAQUjBFIbDgoCAg0ZAlkGiBMLp1SSc4EhigAhhTWBEgOVYoEUjw2DAoE+Ihs
X-IronPort-AV: E=Sophos;i="4.80,439,1344225600"; d="scan'208";a="182014531"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-annu-pri.mail.uoguelph.ca with ESMTP; 17 Sep 2012 17:32:44 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id C46C3B3EFE;
	Mon, 17 Sep 2012 17:32:44 -0400 (EDT)
Date: Mon, 17 Sep 2012 17:32:44 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Konstantin Belousov <kostikbel@gmail.com>
Message-ID: <1777840817.743780.1347917564789.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <20120917122325.GR37286@deviant.kiev.zoral.com.ua>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.202]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692)
Cc: FS List <freebsd-fs@freebsd.org>
Subject: Re: testing/review of atomic export update patch
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 17 Sep 2012 21:33:55 -0000

Konstantin Belousov wrote:
> On Sun, Sep 16, 2012 at 05:41:25PM -0400, Rick Macklem wrote:
> > Hi,
> >
> > There is a simple patch at:
> >   http://people.freebsd.org/~rmacklem/atomic-export.patch
> > that can be applied to a kernel + mountd, so that the new
> > nfsd can be suspended by mountd while the exports are being
> > reloaded. It adds a new "-S" flag to mountd to enable this.
> > (This avoids the long standing bug where clients receive ESTALE
> >  replies to RPCs while mountd is reloading exports.)
> 
> This looks simple, but also somewhat worrisome. What would happen
> if the mountd crashes after nfsd suspension is requested, but before
> resume was performed ?
> 
> Might be, mountd should check for suspended nfsd on start and
> unsuspend
> it, if some flag is specified ?
Well, I think that happens with the patch as it stands.

suspend is done if the "-S" option is specified, but that is a no op
if it is already suspended. The resume is done no matter what flags
are provided, so mountd will always try and do a "resume".
--> get_exportlist() is always called when mountd is started up and
    it does the resume unconditionally when it completes.
    If mountd repeatedly crashes before completing get_exportlist()
    when it is started up, the exports will be all messed up, so
    having the nfsd threads suspended doesn't seem so bad for this
    case (which hopefully never happens;-).

Both suspend and resume are just no ops for unpatched kernels.
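
To make the crash/restart argument above concrete, here is a toy Python
model. It is only a sketch: the Nfsd class, its methods and the
crash_midway switch are hypothetical names made up for illustration and
are not the kernel or mountd code from the patch. It assumes what is
described above: suspend is a no-op when already suspended, and
get_exportlist() always ends with a resume.

```python
# Toy model only: the class and function bodies are hypothetical, not the
# kernel or mountd code from the patch.
class Nfsd:
    def __init__(self):
        self.suspended = False

    def suspend(self):
        # A no-op if the nfsd threads are already suspended.
        self.suspended = True

    def resume(self):
        # Safe to call whether or not a suspend happened before.
        self.suspended = False


def get_exportlist(nfsd, use_S_flag, crash_midway=False):
    if use_S_flag:
        nfsd.suspend()
    if crash_midway:
        raise RuntimeError("mountd crashed while reloading exports")
    # ... the exports would be reloaded here ...
    nfsd.resume()  # done unconditionally, whatever flags were given


nfsd = Nfsd()
try:
    get_exportlist(nfsd, use_S_flag=True, crash_midway=True)
except RuntimeError:
    pass
stuck_after_crash = nfsd.suspended          # nfsd is left suspended

get_exportlist(nfsd, use_S_flag=False)      # mountd restarted without -S
resumed_after_restart = not nfsd.suspended  # the startup reload resumed it
print(stuck_after_crash, resumed_after_restart)
```

In this model the only way nfsd stays suspended is if every restart of
mountd also dies before get_exportlist() completes.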

Maybe the comment in front of "resume" should explicitly explain
this, instead of saying resume is harmless to do under all conditions?

Thanks for looking at it, rick


From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 18 01:08:12 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: by hub.freebsd.org (Postfix, from userid 821)
	id 66E1A106566C; Tue, 18 Sep 2012 01:08:12 +0000 (UTC)
Date: Tue, 18 Sep 2012 01:08:12 +0000
From: John <jwd@FreeBSD.org>
To: FreeBSD FS <freebsd-fs@freebsd.org>
Message-ID: <20120918010812.GA71005@FreeBSD.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.4.2.1i
Subject: XFS/istgt backed 8TB xfs filesystem configuration?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Sep 2012 01:08:12 -0000

Hi Folks,

   I've been asked to export an 8TB volume (via istgt) and was
curious if anyone has any experience with optimal xfs configurations
in this area.

   Typically, volumes for linux are created similar to:

zfs create -b 32768 -V $lunsize $physname

   On a dual 10g data backbone network, we use mpio with 2 channels
per net:

[PortalGroup4]
   Comment "Two networks - Two ports"
   Portal DA1 10.59.10.10:5000
   Portal DA2 10.60.10.10:5000
   Portal DA3 10.59.10.10:5001
   Portal DA4 10.60.10.10:5001
   Comment "END: PortalGroup4"

   which typically seems to give the best performance. The luns are
being brought together on the linux side (RHEL 6.1) with multipath.
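
As a rough sketch of how the four Portal lines above follow from "two
networks, two ports", the Python below regenerates them. The addresses
and ports come from the PortalGroup4 example; the helper name
portal_lines and the DA label prefix as a parameter are made up for this
illustration, and nothing here is part of istgt itself.

```python
from itertools import product

# Addresses and ports are taken from the PortalGroup4 example above;
# the helper name portal_lines is hypothetical.
def portal_lines(addresses, ports, prefix="DA"):
    lines = []
    # Ports vary slowest, so consecutive labels alternate between networks.
    for n, (port, addr) in enumerate(product(ports, addresses), start=1):
        lines.append(f"   Portal {prefix}{n} {addr}:{port}")
    return lines

lines = portal_lines(["10.59.10.10", "10.60.10.10"], [5000, 5001])
print("\n".join(lines))
```

Each address/port pair becomes one path for multipath on the initiator
side, so two networks with two ports each give four paths.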

   I've googled around a bit and don't see much about xfs filesystems
on top of iscsi exported zfs volumes :-)

   Anyone have any experience in this area? Suggestions? I'm told
the data patterns will be mostly database reads, minimal writes.

Thanks,
John


From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 18 06:10:01 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 33A74106566C
	for <freebsd-fs@freebsd.org>; Tue, 18 Sep 2012 06:10:01 +0000 (UTC)
	(envelope-from c.kworr@gmail.com)
Received: from mail-bk0-f54.google.com (mail-bk0-f54.google.com
	[209.85.214.54])
	by mx1.freebsd.org (Postfix) with ESMTP id A54618FC0A
	for <freebsd-fs@freebsd.org>; Tue, 18 Sep 2012 06:10:00 +0000 (UTC)
Received: by bkcje9 with SMTP id je9so2721386bkc.13
	for <freebsd-fs@freebsd.org>; Mon, 17 Sep 2012 23:09:59 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=message-id:date:from:user-agent:mime-version:to:cc:subject
	:references:in-reply-to:content-type:content-transfer-encoding;
	bh=B2Bp3D2dEOgVe6RlSyz9Ivmpn52RqYb/RdnIJaW9e5U=;
	b=zvAxJh+Fg6W1UCxQrPSyoMDc3r3n0QNNGdiffQIGWbkmji3dIh0Y0hdyKCwYAGs3Bz
	As4zfdn+jLpKSMgpXSLokQdbxvsnRfTNRrgbuE2XK9DWIXnygKLyKjFw49cqybNB6vF0
	Ve1xrper8GjuQ3n4elu90NYPkUPxdosDkk5SMSrdQQEkGDl0dHcHOtxJyzmhssxEg+AD
	fKZdWUillqIgeb3i0uW4LXmrVyOrmGfm0YOmj8bRl7OMdQoZ9JWImvqpsWb5NjUuebct
	ifhiv7nOQQWNFVZUDjc9a5fvwNYXRdPCP9HilQ16LTsr3nRgGf9T9mFPzvBzf3N6Nndl
	i/Fg==
Received: by 10.204.156.18 with SMTP id u18mr1825406bkw.131.1347948599526;
	Mon, 17 Sep 2012 23:09:59 -0700 (PDT)
Received: from limbo.xim.bz ([46.150.100.6])
	by mx.google.com with ESMTPS id t23sm6800049bks.4.2012.09.17.23.09.57
	(version=SSLv3 cipher=OTHER); Mon, 17 Sep 2012 23:09:58 -0700 (PDT)
Message-ID: <50581033.4040102@gmail.com>
Date: Tue, 18 Sep 2012 09:09:55 +0300
From: Volodymyr Kostyrko <c.kworr@gmail.com>
User-Agent: Mozilla/5.0 (X11; FreeBSD i386;
	rv:15.0) Gecko/20120911 Thunderbird/15.0.1
MIME-Version: 1.0
To: =?UTF-8?B?IlRob21hcyBHw7ZsbG5lciAoTmV3c2xldHRlciki?= <Newsletter@goelli.de>
References: <001a01cd900d$bcfcc870$36f65950$@goelli.de>
	<504F282D.8030808@gmail.com>
	<000a01cd90aa$0a277310$1e765930$@goelli.de>
	<5050461A.9050608@gmail.com>
	<000001cd9239$ed734c80$c859e580$@goelli.de>
	<5052EC5D.4060403@gmail.com>
	<000a01cd9274$0aa0bba0$1fe232e0$@goelli.de>
	<505322C9.70200@gmail.com>
	<000001cd9377$e9e9b010$bdbd1030$@goelli.de>
	<50559CD8.1070700@gmail.com>
	<000001cd94f1$a4157030$ec405090$@goelli.de>
In-Reply-To: <000001cd94f1$a4157030$ec405090$@goelli.de>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Cc: freebsd-fs@freebsd.org
Subject: Re: AW: AW: AW: AW: AW: ZFS: Corrupted pool metadata after adding
 vdev to a pool - no opportunity to rescue data from healthy vdevs? Remove
 a vdev? Rewrite metadata?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Sep 2012 06:10:01 -0000

17.09.2012 19:29, Thomas Göllner (Newsletter) wrote:
>> If you can afford putting your drives aside you can try to wait before some tool occasionally emerges. I will not promise anything
>> but I'm slowly making some progress with my script. I'm motivated about that as I have broken pool with photos. Trying to import
>> that pool is causing a core dump on any system I tested like OpenSolaris, Illumos or SystemRescueCD.
>
> It would be great if your script would be able to deal with pools with broken labels. I will put the three 3TB disks aside and use the old 1.5TB disks instead. So if there is some progress in your script or someone else is gonna write some tool for restoring labels or reading data of broken pools, perhaps I can get some data back. I think it would take some time to get this fresh 3TB pool full ;-)
>
> This would also solve the next problem I discovered...
> These 1.5TB disks have 512-byte sectors. I have one spare. If the second disk falls out, first I thought, I will replace it with a 4TB disk and so on until I have replaced all of them. So I can expand the pool. But as I read now, this is not possible, is it? Because the 4TB drives would have 4k sectors.

 From my point of view, all the hype about moving to 4k sectors is
largely irrelevant to ZFS and the current products on the market.

1. ZFS tends to use a big recordsize for storing any data. This means
most files on your drives are already stored in 128k records. Storing
the small tails in 512b or 4k sectors shouldn't make a big difference.

2. For older drives, each drive should be partitioned with respect to 4k
sectors. This is what the -a option of gpart does: it aligns the created
partitions to 4k sector boundaries. But half a year ago I already found
some drives that can auto-shift all disk transactions to optimize read
and write performance. That is courtesy of Microsoft Windows, an OS that
does not care about anything not written in the license terms (same as
its users), so using these drives is more straightforward and spares
IT staff the pain of realigning partitions: it just works.

-- 
Sphinx of black quartz judge my vow.

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 18 08:59:46 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 58BB6106566C
	for <freebsd-fs@freebsd.org>; Tue, 18 Sep 2012 08:59:46 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
	by mx1.freebsd.org (Postfix) with ESMTP id E327F8FC0C
	for <freebsd-fs@freebsd.org>; Tue, 18 Sep 2012 08:59:45 +0000 (UTC)
Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q8I8xrUp084443;
	Tue, 18 Sep 2012 11:59:54 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id
	q8I8xfmZ043766; Tue, 18 Sep 2012 11:59:41 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q8I8xfWK043765; 
	Tue, 18 Sep 2012 11:59:41 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Tue, 18 Sep 2012 11:59:41 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Message-ID: <20120918085941.GZ37286@deviant.kiev.zoral.com.ua>
References: <20120917122325.GR37286@deviant.kiev.zoral.com.ua>
	<1777840817.743780.1347917564789.JavaMail.root@erie.cs.uoguelph.ca>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="hZpDuTGHUtM8eGVR"
Content-Disposition: inline
In-Reply-To: <1777840817.743780.1347917564789.JavaMail.root@erie.cs.uoguelph.ca>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
	autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	skuns.kiev.zoral.com.ua
Cc: FS List <freebsd-fs@freebsd.org>
Subject: Re: testing/review of atomic export update patch
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Sep 2012 08:59:46 -0000


--hZpDuTGHUtM8eGVR
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Sep 17, 2012 at 05:32:44PM -0400, Rick Macklem wrote:
> Konstantin Belousov wrote:
> > On Sun, Sep 16, 2012 at 05:41:25PM -0400, Rick Macklem wrote:
> > > Hi,
> > >
> > > There is a simple patch at:
> > >   http://people.freebsd.org/~rmacklem/atomic-export.patch
> > > that can be applied to a kernel + mountd, so that the new
> > > nfsd can be suspended by mountd while the exports are being
> > > reloaded. It adds a new "-S" flag to mountd to enable this.
> > > (This avoids the long standing bug where clients receive ESTALE
> > >  replies to RPCs while mountd is reloading exports.)
> >
> > This looks simple, but also somewhat worrisome. What would happen
> > if the mountd crashes after nfsd suspension is requested, but before
> > resume was performed ?
> >
> > Might be, mountd should check for suspended nfsd on start and
> > unsuspend
> > it, if some flag is specified ?
> Well, I think that happens with the patch as it stands.
>
> suspend is done if the "-S" option is specified, but that is a no op
> if it is already suspended. The resume is done no matter what flags
> are provided, so mountd will always try and do a "resume".
> --> get_exportlist() is always called when mountd is started up and
>     it does the resume unconditionally when it completes.
>     If mountd repeatedly crashes before completing get_exportlist()
>     when it is started up, the exports will be all messed up, so
>     having the nfsd threads suspended doesn't seem so bad for this
>     case (which hopefully never happens;-).
>
> Both suspend and resume are just no ops for unpatched kernels.
>
> Maybe the comment in front of "resume" should explicitly explain
> this, instead of saying resume is harmless to do under all conditions?
>
> Thanks for looking at it, rick
I see.

My other note is that there is no protection against parallel
instances of suspend/resume happening. For instance, one thread could
set suspend_nfsd = 1 and be descheduled, while another executes the
resume code sequence in the meantime. It would then see
suspend_nfsd != 0 while nfsv4rootfs_lock is not held, and try to unlock
it. It seems that nfsv4_unlock would silently exit. The suspending
thread then resumes and obtains the lock. You end up with
suspend_nfsd == 0 but the lock held.
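
The interleaving can be replayed step by step with a small Python toy
model. This is a sketch under stated assumptions, not the patch code:
the names suspend_nfsd and the lock come from the description above,
while the step functions and the dictionary are hypothetical, and
unlocking a lock that is not held is assumed to do nothing.

```python
# Toy model of the interleaving described above. The field names follow
# the mail; the step functions are hypothetical and are not the patch code.
state = {"suspend_nfsd": 0, "lock_held": False}

def suspend_step1(s):
    # Suspending thread: set the flag ...
    s["suspend_nfsd"] = 1

def suspend_step2(s):
    # ... and only later take nfsv4rootfs_lock.
    s["lock_held"] = True

def resume(s):
    # Resuming thread: runs its whole sequence without interruption.
    if s["suspend_nfsd"] != 0:
        # Unlocking a lock that is not held is assumed to exit silently.
        s["lock_held"] = False
        s["suspend_nfsd"] = 0

suspend_step1(state)   # thread A sets the flag and is descheduled
resume(state)          # thread B runs the complete resume sequence
suspend_step2(state)   # thread A is rescheduled and obtains the lock

print(state)
```

The final state has the flag cleared while the lock is still held, which
is the inconsistent outcome in question.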

--hZpDuTGHUtM8eGVR
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iEYEARECAAYFAlBYN/0ACgkQC3+MBN1Mb4iPGgCeM/a6BN9tZLpmw3fstmO+Gd1Q
mKEAniRaUuIkellq4m3LLYRfLo8MzYvE
=Kqj8
-----END PGP SIGNATURE-----

--hZpDuTGHUtM8eGVR--

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 18 10:37:56 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 356CD1065675
	for <freebsd-fs@freebsd.org>; Tue, 18 Sep 2012 10:37:56 +0000 (UTC)
	(envelope-from daniel@digsys.bg)
Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230])
	by mx1.freebsd.org (Postfix) with ESMTP id A828A8FC14
	for <freebsd-fs@freebsd.org>; Tue, 18 Sep 2012 10:37:55 +0000 (UTC)
Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5])
	(authenticated bits=0)
	by smtp-sofia.digsys.bg (8.14.5/8.14.5) with ESMTP id q8IASHPS013435
	(version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO)
	for <freebsd-fs@freebsd.org>; Tue, 18 Sep 2012 13:28:18 +0300 (EEST)
	(envelope-from daniel@digsys.bg)
Message-ID: <50584CC1.3030300@digsys.bg>
Date: Tue, 18 Sep 2012 13:28:17 +0300
From: Daniel Kalchev <daniel@digsys.bg>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:10.0.6esrpre) Gecko/20120728 Thunderbird/10.0.6
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
References: <001a01cd900d$bcfcc870$36f65950$@goelli.de>
	<504F282D.8030808@gmail.com>
	<000a01cd90aa$0a277310$1e765930$@goelli.de>
	<5050461A.9050608@gmail.com>
	<000001cd9239$ed734c80$c859e580$@goelli.de>
	<5052EC5D.4060403@gmail.com>
	<000a01cd9274$0aa0bba0$1fe232e0$@goelli.de>
	<505322C9.70200@gmail.com>
	<000001cd9377$e9e9b010$bdbd1030$@goelli.de>
	<50559CD8.1070700@gmail.com>
	<000001cd94f1$a4157030$ec405090$@goelli.de>
	<50581033.4040102@gmail.com>
In-Reply-To: <50581033.4040102@gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: AW: AW: AW: AW: AW: ZFS: Corrupted pool metadata after adding
 vdev to a pool - no opportunity to rescue data from healthy vdevs? Remove
 a vdev? Rewrite metadata?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Sep 2012 10:37:56 -0000


On 18.09.12 09:09, Volodymyr Kostyrko wrote:
>
> From my point of view all hype about moving to 4k sectors is highly 
> irrelevant to ZFS and current products on the market.
>
> 1. ZFS tends to use big recordsize for storing any data. This means 
> most files on your drives are already stored in 128k sectors. Storing 
> small tails in 512b or 4k sectors shouldn't give big difference.

Truth is, ZFS will write blocks of size from your media sector size up 
to 128K.

The problem is that ZFS writes these records (even 128K) aligned to the 
sector size. So, once you write some data that is under 4k, your pool 
will become misaligned.
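
A small Python sketch of the arithmetic, for illustration only. It
assumes blocks are simply laid out one after another and rounded up to
the pool's sector size (2**ashift); the real ZFS allocator is more
involved, and the function names are made up here.

```python
def allocated_size(nbytes, ashift):
    """Round a write up to a whole number of 2**ashift-byte sectors."""
    sector = 1 << ashift
    return -(-nbytes // sector) * sector   # ceiling division

def is_4k_aligned(offset):
    return offset % 4096 == 0

# With ashift=9 a 512-byte block occupies one 512-byte sector, so the
# next block starts at an offset that is no longer a multiple of 4k.
next_offset_ashift9 = allocated_size(512, 9)
next_offset_ashift12 = allocated_size(512, 12)

print(next_offset_ashift9, is_4k_aligned(next_offset_ashift9))
print(next_offset_ashift12, is_4k_aligned(next_offset_ashift12))
```

With ashift=12 the same small write costs a full 4k of space, but
everything written after it stays on 4k boundaries.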

There are two problems with the 4k drives:

- many of these drives lie about their sector size. You must instruct
your software to treat them as 4k-sector drives, otherwise the
performance penalty (mostly for writing) is very significant.

- new drives you buy will inevitably come with 4k sectors (or more), and
if you need to replace a drive in a large zpool you will start having
abysmal write performance.

>
> 2. For older drives each drive should be partitioned with respect to 
> 4k sectors. This is what -a option of gpart does: it aligns created 
> partitions to 4k sector bounds. But half a year ago I already found 
> some drives that can auto-shift all disk transactions to optimize read 
> and write performance. Courtesy of Microsoft Windows, OS that does not 
> care about anything not written in license terms, same as the users 
> do, so using this drives would be more straightforward and would not 
> cause decent pain to IT stuff about realigning partitions the way it 
> would just work.
>

This is only hype. There is no way any disk firmware can shift any
transactions. All these drives do when you write 512 bytes into a 4k
sector is read the 4k sector, replace 512 bytes of it and write it back.
The best you can hope for is that the sector is already in the disk
cache, which of course is rare.
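
The read-modify-write cost can be counted with a toy Python model of
such a drive. This is only a sketch: the class name and its counters are
invented for illustration, caching is ignored, and real firmware is of
course not written like this.

```python
PHYS = 4096     # physical sector size in bytes
LOGICAL = 512   # sector size the drive reports to the host

class Emulated512Drive:
    """Toy model of a 4k drive that presents 512-byte sectors."""
    def __init__(self, nsectors):
        self.media = bytearray(nsectors * PHYS)
        self.physical_reads = 0
        self.physical_writes = 0

    def write(self, offset, data):
        assert offset % LOGICAL == 0 and len(data) % LOGICAL == 0
        pos = 0
        while pos < len(data):
            start = (offset + pos) // PHYS * PHYS   # enclosing 4k sector
            lo = offset + pos - start               # offset inside it
            n = min(PHYS - lo, len(data) - pos)
            if n < PHYS:
                self.physical_reads += 1   # partial sector: read it first
            self.media[start + lo:start + lo + n] = data[pos:pos + n]
            self.physical_writes += 1
            pos += n

drive = Emulated512Drive(nsectors=4)
drive.write(512, b"x" * 512)      # 512 bytes inside one 4k sector
small = (drive.physical_reads, drive.physical_writes)
drive.write(4096, b"y" * 4096)    # aligned 4k write
aligned = (drive.physical_reads, drive.physical_writes)
drive.write(8704, b"z" * 4096)    # 4k write shifted by 512 bytes
misaligned = (drive.physical_reads, drive.physical_writes)
print(small, aligned, misaligned)
```

The counters are cumulative: the small write costs one read and one
write, the aligned 4k write adds only a write, and the shifted 4k write
touches two physical sectors and needs a read for each.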

The problem is not Windows itself, but the old MBR convention that the
first partition starts at sector 63.
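
The arithmetic behind the sector 63 problem, as a short Python sketch
(the function name is made up; 512-byte logical and 4096-byte physical
sectors are assumed):

```python
LOGICAL_SECTOR = 512
PHYSICAL_SECTOR = 4096

def partition_is_aligned(start_lba):
    """True if a partition starting at this 512-byte LBA sits on a 4k boundary."""
    return (start_lba * LOGICAL_SECTOR) % PHYSICAL_SECTOR == 0

start_bytes = 63 * LOGICAL_SECTOR
print(start_bytes, start_bytes % PHYSICAL_SECTOR)
print(partition_is_aligned(63), partition_is_aligned(64))
print(1 << 12)   # sector size implied by ashift=12
```

A partition starting at sector 63 begins 3584 bytes into a physical
sector, so every 4k block in it straddles two physical sectors. A start
at any multiple of 8 logical sectors avoids this.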

Today, it is wise to always make sure new zpools are created with ashift=12.

Daniel

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 18 11:20:50 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 63658106564A;
	Tue, 18 Sep 2012 11:20:50 +0000 (UTC) (envelope-from feld@feld.me)
Received: from feld.me (unknown [IPv6:2607:f4e0:100:300::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 2D3B18FC1A;
	Tue, 18 Sep 2012 11:20:50 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=feld.me;
	s=blargle; 
	h=In-Reply-To:Message-Id:From:Mime-Version:Date:References:Subject:To:Content-Type;
	bh=3q/ZkSyRRVX+/nhhCMO2adRsEqVNgkSQzlqQWUL+7Js=; 
	b=dCzH0W+jfnGLBdzlSHtfiUbpWtwbMGKW8zQiKN/Piug2Wyz+Q2I9qWJmQTMRh1SFAcja8sQ4b8dkjWCDiP1Iee8XCcRMGlN6U0zDkIPl1Kz53nkRYJag0XOqzp8PoPww;
Received: from localhost ([127.0.0.1] helo=mwi1.coffeenet.org)
	by feld.me with esmtp (Exim 4.80 (FreeBSD))
	(envelope-from <feld@feld.me>)
	id 1TDvr7-0009oV-C6; Tue, 18 Sep 2012 06:20:47 -0500
Received: from feld@feld.me by mwi1.coffeenet.org (Archiveopteryx 3.1.4)
	with esmtpa id 1347967235-3100-3099/5/74; Tue, 18 Sep 2012 11:20:35
	+0000
Content-Type: text/plain; charset=utf-8; format=flowed; delsp=yes
To: FreeBSD FS <freebsd-fs@freebsd.org>, John <jwd@freebsd.org>
References: <20120918010812.GA71005@FreeBSD.org>
Date: Tue, 18 Sep 2012 06:20:35 -0500
Mime-Version: 1.0
From: Mark Felder <feld@feld.me>
Message-Id: <op.wktwglik34t2sn@tech304>
In-Reply-To: <20120918010812.GA71005@FreeBSD.org>
User-Agent: Opera Mail/12.02 (FreeBSD)
X-SA-Report: ALL_TRUSTED=-1, KHOP_THREADED=-0.5
X-SA-Score: -1.5
Cc: 
Subject: Re: XFS/istgt backed 8TB xfs filesystem configuration?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Sep 2012 11:20:50 -0000

Am I correct that you're proposing the following configuration:

FreeBSD Server -> zpool -> zvol -> istgt -> Linux server -> iscsi  
initiator -> XFS filesystem

If so then yes, I am using a setup like this. It works quite well. :-)

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 18 11:24:04 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 1D01F106566B
	for <freebsd-fs@freebsd.org>; Tue, 18 Sep 2012 11:24:04 +0000 (UTC)
	(envelope-from gpalmer@freebsd.org)
Received: from noop.in-addr.com (mail.in-addr.com [IPv6:2001:470:8:162::1])
	by mx1.freebsd.org (Postfix) with ESMTP id D79348FC0A
	for <freebsd-fs@freebsd.org>; Tue, 18 Sep 2012 11:24:03 +0000 (UTC)
Received: from gjp by noop.in-addr.com with local (Exim 4.80 (FreeBSD))
	(envelope-from <gpalmer@freebsd.org>)
	id 1TDvuF-000PYY-Sa; Tue, 18 Sep 2012 07:23:55 -0400
Date: Tue, 18 Sep 2012 07:23:55 -0400
From: Gary Palmer <gpalmer@freebsd.org>
To: Volodymyr Kostyrko <c.kworr@gmail.com>
Message-ID: <20120918112355.GB77784@in-addr.com>
References: <000a01cd90aa$0a277310$1e765930$@goelli.de>
	<5050461A.9050608@gmail.com>
	<000001cd9239$ed734c80$c859e580$@goelli.de>
	<5052EC5D.4060403@gmail.com>
	<000a01cd9274$0aa0bba0$1fe232e0$@goelli.de>
	<505322C9.70200@gmail.com>
	<000001cd9377$e9e9b010$bdbd1030$@goelli.de>
	<50559CD8.1070700@gmail.com>
	<000001cd94f1$a4157030$ec405090$@goelli.de>
	<50581033.4040102@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <50581033.4040102@gmail.com>
X-SA-Exim-Connect-IP: <locally generated>
X-SA-Exim-Mail-From: gpalmer@freebsd.org
X-SA-Exim-Scanned: No (on noop.in-addr.com); SAEximRunCond expanded to false
Cc: freebsd-fs@freebsd.org,
	"\"Thomas G??llner \(Newsletter\)\"" <Newsletter@goelli.de>
Subject: Re: AW: AW: AW: AW: AW: ZFS: Corrupted pool metadata after adding
 vdev to a pool - no opportunity to rescue data from healthy vdevs? Remove a
 vdev? Rewrite metadata?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Sep 2012 11:24:04 -0000

On Tue, Sep 18, 2012 at 09:09:55AM +0300, Volodymyr Kostyrko wrote:
> 17.09.2012 19:29, Thomas Göllner (Newsletter) wrote:
> >> If you can afford putting your drives aside you can try to wait before some tool occasionally emerges. I will not promise anything
> >> but I'm slowly making some progress with my script. I'm motivated about that as I have broken pool with photos. Trying to import
> >> that pool is causing a core dump on any system I tested like OpenSolaris, Illumos or SystemRescueCD.
> >
> > It would be great if you script would be able to deal with pools with broken labels. I will put the three 3TB disks aside and use the old 1.5TB disks instead. So if there is some progress in your script or someone else is gonna write some tool for restoring labels or reading data of broken pools, perhaps I can get some data back. I think it would take some time to get this fresh 3TB pool full ;-)
> >
> > This would also solve the next problem I discovered...
> > These 1.5TB disks have 512byte sectors. I have one spare. If the second disk falls out, first I thought, I will replace it with a 4TB disk and so on until I have replaced all of them. So I can expand the pool. But as I read now, this is not possible, isn't it? Because the 4TB drives would have 4k sectors.
> 
>  From my point of view all hype about moving to 4k sectors is highly 
> irrelevant to ZFS and current products on the market.
> 
> 1. ZFS tends to use big recordsize for storing any data. This means most 
> files on your drives are already stored in 128k sectors. Storing small 
> tails in 512b or 4k sectors shouldn't give big difference.

Performance testing has shown that running "advanced format" disks (that is,
4 kilobyte sector disks) with 512 byte alignment under ZFS seriously degrades
performance compared to running them with 4 kilobyte alignment.
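
A rough way to see why alignment matters: on a drive with 4096-byte physical
sectors, a write that does not start and end on a 4096-byte boundary forces
the drive to read, modify, and rewrite the sectors it covers only partially.
The sketch below only illustrates that arithmetic; the function name and the
sample offsets are made up for the example and are not taken from any
benchmark.

```python
PHYSICAL_SECTOR = 4096  # bytes; assumed physical sector size of an advanced format drive


def partial_sectors(offset, length, sector=PHYSICAL_SECTOR):
    """Count physical sectors that a write covers only partially.

    Each partially covered sector costs the drive a read-modify-write
    cycle instead of a plain write.
    """
    if length <= 0:
        return 0
    end = offset + length
    first = offset // sector
    last = (end - 1) // sector
    head_partial = offset % sector != 0
    tail_partial = end % sector != 0
    if first == last:
        # The whole write sits inside one physical sector.
        return 1 if (head_partial or tail_partial) else 0
    return int(head_partial) + int(tail_partial)


# A 128K record that starts on a 4K boundary touches no partial sectors.
print(partial_sectors(offset=8192, length=128 * 1024))
# The same record shifted by one 512-byte logical sector touches two.
print(partial_sectors(offset=8192 + 512, length=128 * 1024))
```

This counts sectors only; how much time each read-modify-write costs depends
on the drive and is not modelled here.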

Regards,

Gary

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 18 11:25:03 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9ECA31065673
	for <freebsd-fs@freebsd.org>; Tue, 18 Sep 2012 11:25:03 +0000 (UTC)
	(envelope-from olivier@gid0.org)
Received: from mail-lpp01m010-f54.google.com (mail-lpp01m010-f54.google.com
	[209.85.215.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 16FA38FC19
	for <freebsd-fs@freebsd.org>; Tue, 18 Sep 2012 11:25:02 +0000 (UTC)
Received: by lage12 with SMTP id e12so5987478lag.13
	for <freebsd-fs@freebsd.org>; Tue, 18 Sep 2012 04:24:55 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=google.com; s=20120113;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type:x-gm-message-state;
	bh=r8/56poSFT0wgrIE/N9ERfflmxTP2deA7BPt0thPVIU=;
	b=YMeUpuDxFdMnYO+XF66hH2/5Hc3mWYMYejPa0q0nf5hDQ4G1Bvak+NT2KrQOarZd3d
	EbrOnEc1wom6DI3JQ/jmXWYEwnS2lYy55jH0hcfCj4sLD6tydecDCG+hmPBK+X4hZlsm
	Zx8H2JQ8y+UvzVQao4UzOwMwarNjMwPA7BcAhCdON64eZBhoOsgx74pVgqPuM0NnQNJZ
	oPEkjtJPeU025aB2d6T6xTRFKuipCGelrjN8X59dhbPRPvjjTpH7mBTBWYSr1Lnhg8nx
	7KTX2E6CDpO8sYLXWLQu2UTehbObFjVgxdpRIfHZflFUy73vvJ/IfuK0PDr8xX+MnRp3
	dIjw==
MIME-Version: 1.0
Received: by 10.112.42.103 with SMTP id n7mr123699lbl.69.1347967495873; Tue,
	18 Sep 2012 04:24:55 -0700 (PDT)
Received: by 10.112.2.36 with HTTP; Tue, 18 Sep 2012 04:24:55 -0700 (PDT)
In-Reply-To: <9A49FDEB-325E-4A25-8FB9-C4FF8F9BAF67@bway.net>
References: <CABzXLYMO3NkNws_8eLE21qMxOiTBK5zq9m_Nb_WrRA_fu=D1_Q@mail.gmail.com>
	<9A49FDEB-325E-4A25-8FB9-C4FF8F9BAF67@bway.net>
Date: Tue, 18 Sep 2012 13:24:55 +0200
Message-ID: <CABzXLYNXeTyzN2gdDwTd1pcUNUO6L9BnVcFT=Wv915zFAj9TKg@mail.gmail.com>
From: Olivier Smedts <olivier@gid0.org>
To: Charles Sprickman <spork@bway.net>
Content-Type: text/plain; charset=ISO-8859-1
X-Gm-Message-State: ALoCoQlSOjnxaUu3TNGrbh73sRD6vrLSnayvT6yVNfQ0Jx83a6nCRxInUpf8DAW0uf7Udgzdku8d
Cc: freebsd-fs@freebsd.org
Subject: Re: zpool add log to root pool
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Sep 2012 11:25:03 -0000

2012/9/17 Charles Sprickman <spork@bway.net>:
> On Sep 17, 2012, at 8:34 AM, Olivier Smedts wrote:
>> Is there anyone successfuly using a log device / zil on a root pool
>> under FreeBSD ?
>
> Mine works, but I've never been able to find any official confirmation from the -fs folks as to how "proper" or supported the configuration is.  Not too many other options on 1U boxes though really...

Thanks, I tried it and it works for me:
# zpool set bootfs= tank
# zpool add tank log gpt/zil
# zpool set bootfs=tank/freebsd tank
Rebooted, and it seems to work. Maybe the bootfs property check should be
disabled in the code?


-- 
Olivier Smedts                                                 _
                                        ASCII ribbon campaign ( )
e-mail: olivier@gid0.org        - against HTML email & vCards  X
www: http://www.gid0.org    - against proprietary attachments / \

  "Il y a seulement 10 sortes de gens dans le monde :
  ceux qui comprennent le binaire,
  et ceux qui ne le comprennent pas."

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 18 12:27:16 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 6F3E0106568A
	for <freebsd-fs@freebsd.org>; Tue, 18 Sep 2012 12:27:16 +0000 (UTC)
	(envelope-from c.kworr@gmail.com)
Received: from mail-ey0-f182.google.com (mail-ey0-f182.google.com
	[209.85.215.182])
	by mx1.freebsd.org (Postfix) with ESMTP id F136F8FC08
	for <freebsd-fs@freebsd.org>; Tue, 18 Sep 2012 12:27:15 +0000 (UTC)
Received: by eaak11 with SMTP id k11so2903418eaa.13
	for <freebsd-fs@freebsd.org>; Tue, 18 Sep 2012 05:27:14 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=message-id:date:from:user-agent:mime-version:to:cc:subject
	:references:in-reply-to:content-type:content-transfer-encoding;
	bh=nOrc22YorVIyf8PAEfpLbrbjtX3D/8bG/YNUes1eBVU=;
	b=pG9oOwLOTHOha/I83KnswFH7dufwinDBWExEEneVPNoow/bKuWPytX2Rg4W93zsYOh
	6xbD02qhBpF3rbmzQ+NvIaMqXB/urZX4rMop7lYQdn9FwXpRjSaaSLTFsb9RBGMyV7bC
	4sISl4U/pi8UtRjGuG0uYXLJQiAECtOajoNgHIGYTZVUNHhBo8B7umVhTcNOyAU+TbJO
	eqZo7tBGVOaAymkGlK81M1P+d66lluZIks7/qY/8Ouw7Ds5R5gLLdTXTG7sgEHDP4+rR
	rl3H5PhvUJ+VVkzYf8uEHXeihZVDGdaQi+6yMwlRgcjLra6Sk2b5IN1yT0Pd1WWhxGVZ
	E31w==
Received: by 10.14.198.65 with SMTP id u41mr17451850een.22.1347971234791;
	Tue, 18 Sep 2012 05:27:14 -0700 (PDT)
Received: from green.local (90-224-132-95.pool.ukrtel.net. [95.132.224.90])
	by mx.google.com with ESMTPS id r45sm35929439eem.6.2012.09.18.05.27.11
	(version=SSLv3 cipher=OTHER); Tue, 18 Sep 2012 05:27:13 -0700 (PDT)
Message-ID: <5058689E.5060302@gmail.com>
Date: Tue, 18 Sep 2012 15:27:10 +0300
From: Volodymyr Kostyrko <c.kworr@gmail.com>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:15.0) Gecko/20120911 Thunderbird/15.0.1
MIME-Version: 1.0
To: Olivier Smedts <olivier@gid0.org>
References: <CABzXLYMO3NkNws_8eLE21qMxOiTBK5zq9m_Nb_WrRA_fu=D1_Q@mail.gmail.com>
	<9A49FDEB-325E-4A25-8FB9-C4FF8F9BAF67@bway.net>
	<CABzXLYNXeTyzN2gdDwTd1pcUNUO6L9BnVcFT=Wv915zFAj9TKg@mail.gmail.com>
In-Reply-To: <CABzXLYNXeTyzN2gdDwTd1pcUNUO6L9BnVcFT=Wv915zFAj9TKg@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org
Subject: Re: zpool add log to root pool
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Sep 2012 12:27:16 -0000

18.09.2012 14:24, Olivier Smedts wrote:
> 2012/9/17 Charles Sprickman <spork@bway.net>:
>> On Sep 17, 2012, at 8:34 AM, Olivier Smedts wrote:
>>> Is there anyone successfuly using a log device / zil on a root pool
>>> under FreeBSD ?
>>
>> Mine works, but I've never been able to find any official confirmation from the -fs folks as to how "proper" or supported the configuration is.  Not too many other options on 1U boxes though really...
>
> Thanks, I tried and it works for me :
> # zpool set bootfs= tank
> # zpool add tank log gpt/zil
> # zpool set bootfs=tank/freebsd tank
> Rebooted, seems to work... maybe the bootfs property check should be
> disabled in the code ?

There are still some uncertain areas, such as:

1. Does our boot code support replaying/reconstructing the ZIL before booting?
2. Would the machine boot if the log device is missing?
3. How much data can be trashed when the log device fails?

A UPS answers most of these questions for me, but my machine is a local test
server, not a production one.

-- 
Sphinx of black quartz judge my vow.

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 18 12:40:04 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id CD92C106567D;
	Tue, 18 Sep 2012 12:40:04 +0000 (UTC)
	(envelope-from c.kworr@gmail.com)
Received: from mail-ee0-f54.google.com (mail-ee0-f54.google.com [74.125.83.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 27EE68FC0A;
	Tue, 18 Sep 2012 12:40:03 +0000 (UTC)
Received: by eeke52 with SMTP id e52so4141055eek.13
	for <multiple recipients>; Tue, 18 Sep 2012 05:39:57 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=message-id:date:from:user-agent:mime-version:to:cc:subject
	:references:in-reply-to:content-type:content-transfer-encoding;
	bh=IKyKxpCK8Is+gtF7J8z9yhWBhayYNEYFS/LOsPG4UJI=;
	b=j0TD+egvjteELxTI6V+QcRQIgl2Q+iVxiUfSZ3dsLlpJh54uYlv1XWjBpf1nJHhULH
	HQaSh/02ARsXc8lO5EiYooFhnLK3PMtM67mMQYCYK7P0TgY5P9knkpF3KWkJTk6egod6
	/NJasnVVAGJBj0c5Cm8SaW+GI1k/NNROzKKIDuefDM6qIgAd9VbTVljtrd+ruIIca6xI
	TsgVmkgiD+6SAMkndwNM60tnykbdIK4AZwiC0Q9iDPKCk/D1oQLaYj8qdqEPbdbB49kw
	DXMmjnJTJr7IhBsyhURhgBjQvhi2lLQkMWgcxIC8ZctdyaK/3SSYshYh2AqHmyP2OELO
	GWrg==
Received: by 10.14.213.137 with SMTP id a9mr17191823eep.38.1347971997284;
	Tue, 18 Sep 2012 05:39:57 -0700 (PDT)
Received: from green.local (90-224-132-95.pool.ukrtel.net. [95.132.224.90])
	by mx.google.com with ESMTPS id k49sm36024104een.4.2012.09.18.05.39.55
	(version=SSLv3 cipher=OTHER); Tue, 18 Sep 2012 05:39:55 -0700 (PDT)
Message-ID: <50586B99.40108@gmail.com>
Date: Tue, 18 Sep 2012 15:39:53 +0300
From: Volodymyr Kostyrko <c.kworr@gmail.com>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:15.0) Gecko/20120911 Thunderbird/15.0.1
MIME-Version: 1.0
To: Gary Palmer <gpalmer@freebsd.org>
References: <000a01cd90aa$0a277310$1e765930$@goelli.de>
	<5050461A.9050608@gmail.com>
	<000001cd9239$ed734c80$c859e580$@goelli.de>
	<5052EC5D.4060403@gmail.com>
	<000a01cd9274$0aa0bba0$1fe232e0$@goelli.de>
	<505322C9.70200@gmail.com>
	<000001cd9377$e9e9b010$bdbd1030$@goelli.de>
	<50559CD8.1070700@gmail.com>
	<000001cd94f1$a4157030$ec405090$@goelli.de>
	<50581033.4040102@gmail.com> <20120918112355.GB77784@in-addr.com>
In-Reply-To: <20120918112355.GB77784@in-addr.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org
Subject: Re: AW: AW: AW: AW: AW: ZFS: Corrupted pool metadata after adding
 vdev to a pool - no opportunity to rescue data from healthy vdevs? Remove
 a vdev? Rewrite metadata?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Sep 2012 12:40:05 -0000

18.09.2012 14:23, Gary Palmer wrote:
>>   From my point of view all hype about moving to 4k sectors is highly
>> irrelevant to ZFS and current products on the market.
>>
>> 1. ZFS tends to use big recordsize for storing any data. This means most
>> files on your drives are already stored in 128k sectors. Storing small
>> tails in 512b or 4k sectors shouldn't give big difference.
>
> Performance testing has shown that running "advanced format" (aka 4kilobyte
> sector disks) with 512 byte alignment with ZFS seriously degrades performance
> compared to running with 4 kilobyte alignment.

Please don't misunderstand me: this is only my point of view on the
problem, as I have never seen any tests that show the difference between
correct alignment of _partitions_ and alignment of _records_ on ZFS. This
area is not thoroughly covered with test data.

-- 
Sphinx of black quartz judge my vow.

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 18 12:42:14 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: by hub.freebsd.org (Postfix, from userid 821)
	id 6F83D106566C; Tue, 18 Sep 2012 12:42:14 +0000 (UTC)
Date: Tue, 18 Sep 2012 12:42:14 +0000
From: John <jwd@freebsd.org>
To: FreeBSD FS <freebsd-fs@freebsd.org>
Message-ID: <20120918124214.GA79439@FreeBSD.org>
References: <20120918010812.GA71005@FreeBSD.org> <op.wktwglik34t2sn@tech304>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <op.wktwglik34t2sn@tech304>
User-Agent: Mutt/1.4.2.1i
Cc: Mark Felder <feld@feld.me>
Subject: Re: XFS/istgt backed 8TB xfs filesystem configuration?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Sep 2012 12:42:14 -0000

----- Mark Felder's Original Message -----
> Am I correct that you're proposing the following configuration:
> 
> FreeBSD Server -> zpool -> zvol -> istgt -> Linux server -> iscsi  
> initiator -> XFS filesystem
> 
> If so then yes, I am using a setup like this. It works quite well. :-)

Yes. Since I'm using multipath, I might add:

FreeBSD Server -> zpool -> zvol -> istgt -> Linux server -> iscsi(/dev/sd[bcde]) -> multipath(/dev/dm-X)

Have you found any blocksizes, cache sizes, etc, that seem optimal?

Thanks,
John

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 18 12:45:45 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 2FC9C106564A;
	Tue, 18 Sep 2012 12:45:45 +0000 (UTC) (envelope-from feld@feld.me)
Received: from feld.me (unknown [IPv6:2607:f4e0:100:300::2])
	by mx1.freebsd.org (Postfix) with ESMTP id D91BC8FC14;
	Tue, 18 Sep 2012 12:45:44 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=feld.me;
	s=blargle; 
	h=In-Reply-To:Message-Id:From:Mime-Version:Date:References:Subject:To:Content-Type;
	bh=BqCoZjhFi+dRO74BJUy/YLZbbB5ylu7v44fCy/3g9SY=; 
	b=WOEJmZSSS6/EvmBwmPccIxBbYhkiYB9tC/P+E4xhJzF/1cWK1btJzOcSSxzJ1K/F3pbTZOlobnDKrEdZy39zK0PRaI/159EjnUT+pndzytvZFvIs70ZcQQFB6uxtGI4V;
Received: from localhost ([127.0.0.1] helo=mwi1.coffeenet.org)
	by feld.me with esmtp (Exim 4.80 (FreeBSD))
	(envelope-from <feld@feld.me>)
	id 1TDxBK-000Crf-Eo; Tue, 18 Sep 2012 07:45:44 -0500
Received: from feld@feld.me by mwi1.coffeenet.org (Archiveopteryx 3.1.4)
	with esmtpa id 1347972332-3100-3099/5/75; Tue, 18 Sep 2012 12:45:32
	+0000
Content-Type: text/plain; charset=utf-8; format=flowed; delsp=yes
To: FreeBSD FS <freebsd-fs@freebsd.org>, John <jwd@freebsd.org>
References: <20120918010812.GA71005@FreeBSD.org> <op.wktwglik34t2sn@tech304>
	<20120918124214.GA79439@FreeBSD.org>
Date: Tue, 18 Sep 2012 07:45:31 -0500
Mime-Version: 1.0
From: Mark Felder <feld@feld.me>
Message-Id: <op.wkt0d5k034t2sn@tech304>
In-Reply-To: <20120918124214.GA79439@FreeBSD.org>
User-Agent: Opera Mail/12.02 (FreeBSD)
X-SA-Report: ALL_TRUSTED=-1, KHOP_THREADED=-0.5
X-SA-Score: -1.5
Cc: 
Subject: Re: XFS/istgt backed 8TB xfs filesystem configuration?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Sep 2012 12:45:45 -0000

On Tue, 18 Sep 2012 07:42:14 -0500, John <jwd@freebsd.org> wrote:

>
> Have you found any blocksizes, cache sizes, etc, that seem optimal?
>

I'm providing iSCSI for Xen and VMware servers and have not found any
optimal tunings so far. ZFS's variable block size seems to handle the load
just fine, and I have plenty of SSDs doing work for me. Just make sure you
carefully calculate the number of Connections and Sessions for istgt: your
multipath setup and the number of LUNs being shared will affect these numbers.
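
As a back-of-the-envelope sketch of that calculation: assuming every path
from every initiator logs in to every target as its own session with a single
connection, and assuming each shared LUN is exported as a separate target, the
number of sessions grows as the product of the three. Both assumptions and all
the numbers below are hypothetical; check them against your own initiator and
istgt configuration.

```python
def sessions_needed(initiators, paths_per_initiator, targets):
    """Estimate concurrent iSCSI sessions: one per initiator, path and target."""
    return initiators * paths_per_initiator * targets


# Hypothetical numbers: 2 initiator hosts, 4 paths each, 3 exported targets.
needed = sessions_needed(initiators=2, paths_per_initiator=4, targets=3)
print("sessions needed:", needed)
```

Whatever session limit is configured for istgt should be at least this
large, with some headroom for reconnects.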

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 18 13:14:48 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 6B1C0106566B
	for <freebsd-fs@freebsd.org>; Tue, 18 Sep 2012 13:14:48 +0000 (UTC)
	(envelope-from gpalmer@freebsd.org)
Received: from noop.in-addr.com (mail.in-addr.com [IPv6:2001:470:8:162::1])
	by mx1.freebsd.org (Postfix) with ESMTP id 24FDE8FC14
	for <freebsd-fs@freebsd.org>; Tue, 18 Sep 2012 13:14:48 +0000 (UTC)
Received: from gjp by noop.in-addr.com with local (Exim 4.80 (FreeBSD))
	(envelope-from <gpalmer@freebsd.org>)
	id 1TDxdR-000PhC-GT; Tue, 18 Sep 2012 09:14:41 -0400
Date: Tue, 18 Sep 2012 09:14:41 -0400
From: Gary Palmer <gpalmer@freebsd.org>
To: Volodymyr Kostyrko <c.kworr@gmail.com>
Message-ID: <20120918131441.GC77784@in-addr.com>
References: <000001cd9239$ed734c80$c859e580$@goelli.de>
	<5052EC5D.4060403@gmail.com>
	<000a01cd9274$0aa0bba0$1fe232e0$@goelli.de>
	<505322C9.70200@gmail.com>
	<000001cd9377$e9e9b010$bdbd1030$@goelli.de>
	<50559CD8.1070700@gmail.com>
	<000001cd94f1$a4157030$ec405090$@goelli.de>
	<50581033.4040102@gmail.com> <20120918112355.GB77784@in-addr.com>
	<50586B99.40108@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <50586B99.40108@gmail.com>
X-SA-Exim-Connect-IP: <locally generated>
X-SA-Exim-Mail-From: gpalmer@freebsd.org
X-SA-Exim-Scanned: No (on noop.in-addr.com); SAEximRunCond expanded to false
Cc: freebsd-fs@freebsd.org
Subject: Re: AW: AW: AW: AW: AW: ZFS: Corrupted pool metadata after adding
 vdev to a pool - no opportunity to rescue data from healthy vdevs? Remove a
 vdev? Rewrite metadata?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Sep 2012 13:14:48 -0000

On Tue, Sep 18, 2012 at 03:39:53PM +0300, Volodymyr Kostyrko wrote:
> 18.09.2012 14:23, Gary Palmer wrote:
> >>   From my point of view all hype about moving to 4k sectors is highly
> >> irrelevant to ZFS and current products on the market.
> >>
> >> 1. ZFS tends to use big recordsize for storing any data. This means most
> >> files on your drives are already stored in 128k sectors. Storing small
> >> tails in 512b or 4k sectors shouldn't give big difference.
> >
> > Performance testing has shown that running "advanced format" (aka 4kilobyte
> > sector disks) with 512 byte alignment with ZFS seriously degrades performance
> > compared to running with 4 kilobyte alignment.
> 
> Please understand me correctly, this is only my point of view on the 
> problem as I never saw any tests that show difference between correct 
> alignment of _partitions_ and alignment on _records_ on ZFS. This area 
> is not thoroughly covered with test data.

I seem to recall that people made 4 kilobyte aligned partitions on 
advanced format drives without doing the gnop trick and still
suffered worse performance than when they did the gnop trick to make
ashift=12.  Check the list archives.
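
For reference, ashift is the base-2 logarithm of the sector size the pool
assumes for a vdev, so 512-byte sectors correspond to ashift=9 and 4096-byte
sectors to ashift=12. A small sketch of that relationship (the helper name is
made up for illustration):

```python
def ashift_for(sector_size):
    """Return log2(sector_size), the ashift value matching that sector size."""
    if sector_size <= 0 or sector_size & (sector_size - 1):
        raise ValueError("sector size must be a power of two")
    return sector_size.bit_length() - 1


for size in (512, 4096):
    print(size, "->", ashift_for(size))
```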

If you believe there is insufficient testing here and are saying that
conventional wisdom regarding this is wrong, it is reasonable to request
that you prove your position.

Gary

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 18 13:19:40 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E3444106566B
	for <freebsd-fs@freebsd.org>; Tue, 18 Sep 2012 13:19:39 +0000 (UTC)
	(envelope-from c.kworr@gmail.com)
Received: from mail-ee0-f54.google.com (mail-ee0-f54.google.com [74.125.83.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 6791A8FC08
	for <freebsd-fs@freebsd.org>; Tue, 18 Sep 2012 13:19:39 +0000 (UTC)
Received: by eeke52 with SMTP id e52so4166540eek.13
	for <freebsd-fs@freebsd.org>; Tue, 18 Sep 2012 06:19:38 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=message-id:date:from:user-agent:mime-version:to:cc:subject
	:references:in-reply-to:content-type:content-transfer-encoding;
	bh=HQcrnhn5FbgWTPGKFJRDpujRWO14AEy0VhnD0ssuKb8=;
	b=D/xmSwUgtuP+qFM6slwzZhNLxL4vDw14leMCp5LZZRBdNAvOoLkA1ait6DdAIcK0b7
	xoGpfzdRPB9N07p9SvF0E7e0n+1oXlm1ZHYdD/CYGa0ygBOhaXxJ0c+hqaoAzIHqTOmi
	2TkN8uwK42aePMphLe2bwg/CMsMFIuusZjRkgzIlG4oCkZtJdko9G0EsYWDTfDleVbvD
	JYQJbgMViyV3FnBkfv1p+P4lq1wE/YxsQ6HY+TPHl6HbDckr/9y5Xz63AdiEHrr3YnNA
	RtNeqccKLwPIkOkAjr3KtU+io+PY5DYOnTknrNBFyJUlnmOyvdIzwToA8P2DCtueG2IJ
	AnLA==
Received: by 10.14.4.198 with SMTP id 46mr215206eej.11.1347974378028;
	Tue, 18 Sep 2012 06:19:38 -0700 (PDT)
Received: from green.local (90-224-132-95.pool.ukrtel.net. [95.132.224.90])
	by mx.google.com with ESMTPS id r45sm36290476eem.6.2012.09.18.06.19.35
	(version=SSLv3 cipher=OTHER); Tue, 18 Sep 2012 06:19:36 -0700 (PDT)
Message-ID: <505874E6.2050109@gmail.com>
Date: Tue, 18 Sep 2012 16:19:34 +0300
From: Volodymyr Kostyrko <c.kworr@gmail.com>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:15.0) Gecko/20120911 Thunderbird/15.0.1
MIME-Version: 1.0
To: Daniel Kalchev <daniel@digsys.bg>
References: <001a01cd900d$bcfcc870$36f65950$@goelli.de>
	<504F282D.8030808@gmail.com>
	<000a01cd90aa$0a277310$1e765930$@goelli.de>
	<5050461A.9050608@gmail.com>
	<000001cd9239$ed734c80$c859e580$@goelli.de>
	<5052EC5D.4060403@gmail.com>
	<000a01cd9274$0aa0bba0$1fe232e0$@goelli.de>
	<505322C9.70200@gmail.com>
	<000001cd9377$e9e9b010$bdbd1030$@goelli.de>
	<50559CD8.1070700@gmail.com>
	<000001cd94f1$a4157030$ec405090$@goelli.de>
	<50581033.4040102@gmail.com> <50584CC1.3030300@digsys.bg>
In-Reply-To: <50584CC1.3030300@digsys.bg>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org
Subject: Re: AW: AW: AW: AW: AW: ZFS: Corrupted pool metadata after adding
 vdev to a pool - no opportunity to rescue data from healthy vdevs? Remove
 a vdev? Rewrite metadata?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Sep 2012 13:19:40 -0000

18.09.2012 13:28, Daniel Kalchev wrote:
>> From my point of view all hype about moving to 4k sectors is highly
>> irrelevant to ZFS and current products on the market.
>>
>> 1. ZFS tends to use big recordsize for storing any data. This means
>> most files on your drives are already stored in 128k sectors. Storing
>> small tails in 512b or 4k sectors shouldn't give big difference.
>
> Truth is, ZFS will write blocks of size from your media sector size up
> to 128K.
>
> The problem is that ZFS writes these records (even 128K) aligned to the
> sector size. So, once you write some data that is under 4k, your pool
> will become misaligned.

Not exactly. https://blogs.oracle.com/bonwick/entry/space_maps

1. ZFS divides the space on each virtual device into a few hundred
metaslabs.
2. Metaslabs are quite big, so it would be logical to align them to a high
ashift value. (I have no documentation on whether this is true, but they
should at least be divisible by 128k, as this is the default recordsize.)
3. Within each metaslab all space allocation is done through space maps. I
have no documentation on this either, but given the presence of gang blocks
in the ZFS specification, every new allocation should be naturally aligned:
to 128k when allocating a 128k block, to 64k when allocating a 64k block,
and so on. (Again, I have no documentation on whether this is true, but as
far as I understand the Solaris way, it is more practical to keep data
aligned than to deal with misalignment later.)

I'm bad at reading code, so I can't really say how allocations are aligned
within ZFS metaslabs, but the function dealing with metaslab allocation
takes an 'align' variable.
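
To make the case under discussion concrete, here is a toy model in which
blocks are placed back to back and each size is rounded up to 1 << ashift.
It is a deliberately naive bump allocator, not the real metaslab code;
whether the real allocator behaves like this or aligns larger blocks
naturally is exactly the open question. All names are made up for the
example.

```python
def roundup(value, align):
    """Round value up to the next multiple of align (align is a power of two)."""
    return (value + align - 1) & ~(align - 1)


def allocate_sequentially(sizes, ashift):
    """Place blocks back to back, rounding each size up to 1 << ashift.

    This is a toy bump allocator, not the real metaslab allocator.
    Returns the starting offset of every block.
    """
    unit = 1 << ashift
    offsets, cursor = [], 0
    for size in sizes:
        offsets.append(cursor)
        cursor += roundup(size, unit)
    return offsets


sizes = [512, 128 * 1024]  # a small block followed by a full 128K record
for ashift in (9, 12):
    offsets = allocate_sequentially(sizes, ashift)
    aligned = all(off % 4096 == 0 for off in offsets)
    print("ashift", ashift, "offsets", offsets, "4K aligned:", aligned)
```

In this model the 128K record lands at offset 512 with ashift=9 and at
offset 4096 with ashift=12, which is the misalignment described in the quoted
text above.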

>> 2. For older drives each drive should be partitioned with respect to
>> 4k sectors. This is what -a option of gpart does: it aligns created
>> partitions to 4k sector bounds. But half a year ago I already found
>> some drives that can auto-shift all disk transactions to optimize read
>> and write performance. Courtesy of Microsoft Windows, OS that does not
>> care about anything not written in license terms, same as the users
>> do, so using this drives would be more straightforward and would not
>> cause decent pain to IT stuff about realigning partitions the way it
>> would just work.
>>
>
> This is only hype. There is no way any disk firmware can shift any
> transactions.

How about Seagate Smart Align? It's documented to do so. I haven't 
touched any Seagate drives as I don't like them anyway...

-- 
Sphinx of black quartz judge my vow.

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 18 13:25:38 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 8BF6F1065674;
	Tue, 18 Sep 2012 13:25:38 +0000 (UTC)
	(envelope-from c.kworr@gmail.com)
Received: from mail-ee0-f54.google.com (mail-ee0-f54.google.com [74.125.83.54])
	by mx1.freebsd.org (Postfix) with ESMTP id E020D8FC1C;
	Tue, 18 Sep 2012 13:25:37 +0000 (UTC)
Received: by eeke52 with SMTP id e52so4170581eek.13
	for <multiple recipients>; Tue, 18 Sep 2012 06:25:37 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=message-id:date:from:user-agent:mime-version:to:cc:subject
	:references:in-reply-to:content-type:content-transfer-encoding;
	bh=gApzxRvsz0CRRU3haEdzzrw0frra1yMDOAJsEquEGeo=;
	b=qFwGWTwYsvS5QKhV7ZFkfRgfQoAjCAlKmPTr3KwN5bmiQKtSJMKBRDQgOPGAqIem7q
	stJkdOqtpnrgpFhtdGrQy5ihG4vZUqpM6NR2qrSxQtEvHgOYPQlUJUVt+IjlC6IpaSlA
	2s23N3SvzhgjSlAb9hRzICqgnsH43kcv0KoDIKsEJ/2Qcw5BIA3s/gKmFaMyr0cb1LWO
	bjPZ1kMrwQnQvly394RcBz3dn9VwMEq2oPCpQX3NnLL6ODb/0HWfBNOgeTyiMixWFOkv
	btlHnoiTDB1/RcphrM4lpshmwDHVVIsQGvwKcFNQwf/CDGXxQ8qBVOCLD+By37F5LsDM
	OfgQ==
Received: by 10.14.224.4 with SMTP id w4mr199891eep.21.1347974736902;
	Tue, 18 Sep 2012 06:25:36 -0700 (PDT)
Received: from green.local (90-224-132-95.pool.ukrtel.net. [95.132.224.90])
	by mx.google.com with ESMTPS id k49sm36338835een.4.2012.09.18.06.25.34
	(version=SSLv3 cipher=OTHER); Tue, 18 Sep 2012 06:25:35 -0700 (PDT)
Message-ID: <5058764D.1010403@gmail.com>
Date: Tue, 18 Sep 2012 16:25:33 +0300
From: Volodymyr Kostyrko <c.kworr@gmail.com>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:15.0) Gecko/20120911 Thunderbird/15.0.1
MIME-Version: 1.0
To: Gary Palmer <gpalmer@freebsd.org>
References: <000001cd9239$ed734c80$c859e580$@goelli.de>
	<5052EC5D.4060403@gmail.com>
	<000a01cd9274$0aa0bba0$1fe232e0$@goelli.de>
	<505322C9.70200@gmail.com>
	<000001cd9377$e9e9b010$bdbd1030$@goelli.de>
	<50559CD8.1070700@gmail.com>
	<000001cd94f1$a4157030$ec405090$@goelli.de>
	<50581033.4040102@gmail.com>
	<20120918112355.GB77784@in-addr.com> <50586B99.40108@gmail.com>
	<20120918131441.GC77784@in-addr.com>
In-Reply-To: <20120918131441.GC77784@in-addr.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org
Subject: Re: AW: AW: AW: AW: AW: ZFS: Corrupted pool metadata after adding
 vdev to a pool - no opportunity to rescue data from healthy vdevs? Remove
 a vdev? Rewrite metadata?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Sep 2012 13:25:38 -0000

18.09.2012 16:14, Gary Palmer wrote:
>> Please understand me correctly, this is only my point of view on the
>> problem as I never saw any tests that show difference between correct
>> alignment of _partitions_ and alignment on _records_ on ZFS. This area
>> is not thoroughly covered with test data.
>
> I seem to recall that people made 4 kilobyte aligned partitions on
> advanced format drives without doing the gnop trick and still
> suffered worse performance than when they did the gnop trick to make
> ashift=12.  Check the list archives.
>
> If you believe there is insufficient testing here and are saying that
> conventional wisdom regarding this is wrong, it is reasonable to request
> that you prove your position.

I have one of the first 4k drives, but it is not yet available for
testing. I plan to rerun the tests on it once it becomes available.

-- 
Sphinx of black quartz judge my vow.

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 18 13:34:56 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id EDE3D106566B
	for <freebsd-fs@freebsd.org>; Tue, 18 Sep 2012 13:34:55 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca
	[131.104.91.36])
	by mx1.freebsd.org (Postfix) with ESMTP id AB9BA8FC0C
	for <freebsd-fs@freebsd.org>; Tue, 18 Sep 2012 13:34:55 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: Ap8EAEp3WFCDaFvO/2dsb2JhbAA+BxaFcbc0giABAQUjBFIbDgoCAg0ZAlkGiBMLpxuTFIEhigAhhTWBEgOVYoEUjw2DAoE+Ihs
X-IronPort-AV: E=Sophos;i="4.80,443,1344225600"; d="scan'208";a="182105131"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-annu-pri.mail.uoguelph.ca with ESMTP; 18 Sep 2012 09:34:54 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 5C44DB4017;
	Tue, 18 Sep 2012 09:34:54 -0400 (EDT)
Date: Tue, 18 Sep 2012 09:34:54 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Konstantin Belousov <kostikbel@gmail.com>
Message-ID: <21418398.765673.1347975294365.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <20120918085941.GZ37286@deviant.kiev.zoral.com.ua>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.202]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692)
Cc: FS List <freebsd-fs@freebsd.org>
Subject: Re: testing/review of atomic export update patch
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Sep 2012 13:34:56 -0000

Konstantin Belousov wrote:
> On Mon, Sep 17, 2012 at 05:32:44PM -0400, Rick Macklem wrote:
> > Konstantin Belousov wrote:
> > > On Sun, Sep 16, 2012 at 05:41:25PM -0400, Rick Macklem wrote:
> > > > Hi,
> > > >
> > > > There is a simple patch at:
> > > >   http://people.freebsd.org/~rmacklem/atomic-export.patch
> > > > that can be applied to a kernel + mountd, so that the new
> > > > nfsd can be suspended by mountd while the exports are being
> > > > reloaded. It adds a new "-S" flag to mountd to enable this.
> > > > (This avoids the long standing bug where clients receive ESTALE
> > > >  replies to RPCs while mountd is reloading exports.)
> > >
> > > This looks simple, but also somewhat worrisome. What would happen
> > > if mountd crashes after nfsd suspension is requested, but
> > > before the resume is performed?
> > >
> > > Maybe mountd should check for a suspended nfsd on start and
> > > unsuspend it, if some flag is specified?
> > Well, I think that happens with the patch as it stands.
> >
> > suspend is done if the "-S" option is specified, but that is a no op
> > if it is already suspended. The resume is done no matter what flags
> > are provided, so mountd will always try and do a "resume".
> > --> get_exportlist() is always called when mountd is started up and
> >     it does the resume unconditionally when it completes.
> >     If mountd repeatedly crashes before completing get_exportlist()
> >     when it is started up, the exports will be all messed up, so
> >     having the nfsd threads suspended doesn't seem so bad for this
> >     case (which hopefully never happens;-).
> >
> > Both suspend and resume are just no ops for unpatched kernels.
> >
> > Maybe the comment in front of "resume" should explicitly explain
> > this, instead of saying resume is harmless to do under all
> > conditions?
> >
> > Thanks for looking at it, rick
> I see.
> 
> My other note is that there is no protection against parallel
> instances of suspend/resume happening. For instance, one thread could set
> suspend_nfsd = 1 and be descheduled, while another executes the resume
> code sequence in the meantime. It would then see suspend_nfsd != 0 while
> nfsv4rootfs_lock is not held, and try to unlock it. It seems that
> nfsv4_unlock would silently exit. The suspending thread then resumes
> and obtains the lock. You end up with suspend_nfsd == 0 but the lock held.
Yes. I had assumed that mountd would be the only thing using these syscalls
and it is single threaded. (The syscalls can only be done by root for the
obvious reasons.;-)

Maybe the following untested version of the syscalls would be better, since
they would allow multiple concurrent calls to either suspend or resume.
(There would still be an indeterminate case if one thread called resume
 concurrently with another few calling suspend, but that is unavoidable,
 I think?)

Again, thanks for the comments, rick
--- untested version of syscalls ---
	} else if ((uap->flag & NFSSVC_SUSPENDNFSD) != 0) {
		NFSLOCKV4ROOTMUTEX();
		if (suspend_nfsd == 0) {
			/* Lock out all nfsd threads */
			igotlock = 0;
			while (igotlock == 0 && suspend_nfsd == 0) {
				igotlock = nfsv4_lock(&nfsv4rootfs_lock, 1,
				    NULL, NFSV4ROOTLOCKMUTEXPTR, NULL);
			}
			suspend_nfsd = 1;
		}
		NFSUNLOCKV4ROOTMUTEX();
		error = 0;
	} else if ((uap->flag & NFSSVC_RESUMENFSD) != 0) {
		NFSLOCKV4ROOTMUTEX();
		if (suspend_nfsd != 0) {
			nfsv4_unlock(&nfsv4rootfs_lock, 0);
			suspend_nfsd = 0;
		}
		NFSUNLOCKV4ROOTMUTEX();
		error = 0;
	}
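
For illustration only, the interleaving Konstantin describes can be
replayed in a small single-threaded C model. This is a sketch, not the
kernel code: struct nfsd_state and every function name below are
hypothetical, lock_held merely stands in for nfsv4rootfs_lock, and the
split of the unprotected suspend into "set flag" followed by "take lock"
is an assumption taken from his description. The *_atomic() variants
model the mutex-protected version above, where each call runs as one
indivisible step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the state the two syscalls manipulate. */
struct nfsd_state {
	int suspend_nfsd;	/* models the suspend_nfsd flag */
	int lock_held;		/* models nfsv4rootfs_lock being held */
};

/* Unprotected suspend, split into two steps so they can interleave. */
void
suspend_step_set_flag(struct nfsd_state *s)
{
	s->suspend_nfsd = 1;
}

void
suspend_step_take_lock(struct nfsd_state *s)
{
	s->lock_held = 1;
}

/* Unprotected resume: the unlock is a silent no-op if nothing is held. */
void
resume_unprotected(struct nfsd_state *s)
{
	if (s->suspend_nfsd != 0) {
		s->lock_held = 0;
		s->suspend_nfsd = 0;
	}
}

/* Replays the race; returns 1 if the flag is 0 while the lock is held. */
int
replay_race(void)
{
	struct nfsd_state s = { 0, 0 };

	suspend_step_set_flag(&s);	/* thread A, then descheduled */
	resume_unprotected(&s);		/* thread B runs the resume path */
	suspend_step_take_lock(&s);	/* thread A continues */
	return (s.suspend_nfsd == 0 && s.lock_held == 1);
}

/* Protected variants: flag and lock always change together. */
void
suspend_atomic(struct nfsd_state *s)
{
	if (s->suspend_nfsd == 0) {
		s->lock_held = 1;
		s->suspend_nfsd = 1;
	}
}

void
resume_atomic(struct nfsd_state *s)
{
	if (s->suspend_nfsd != 0) {
		s->lock_held = 0;
		s->suspend_nfsd = 0;
	}
}

/* The invariant the protected version keeps: flag set iff lock held. */
int
state_consistent(const struct nfsd_state *s)
{
	assert(s != NULL);
	return (s->suspend_nfsd == s->lock_held);
}
```

In this model replay_race() ends in the inconsistent state (flag clear,
lock held), while any sequence of suspend_atomic() and resume_atomic()
calls keeps the flag and the lock in step. The real nfsv4_lock() can
sleep, which the model does not try to capture.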

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 18 13:40:22 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 6B638106564A
	for <freebsd-fs@freebsd.org>; Tue, 18 Sep 2012 13:40:22 +0000 (UTC)
	(envelope-from daniel@digsys.bg)
Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230])
	by mx1.freebsd.org (Postfix) with ESMTP id DCAF68FC08
	for <freebsd-fs@freebsd.org>; Tue, 18 Sep 2012 13:40:21 +0000 (UTC)
Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5])
	(authenticated bits=0)
	by smtp-sofia.digsys.bg (8.14.5/8.14.5) with ESMTP id q8IDeBn6022485
	(version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO);
	Tue, 18 Sep 2012 16:40:12 +0300 (EEST)
	(envelope-from daniel@digsys.bg)
Message-ID: <505879BB.3000806@digsys.bg>
Date: Tue, 18 Sep 2012 16:40:11 +0300
From: Daniel Kalchev <daniel@digsys.bg>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:10.0.7) Gecko/20120918 Thunderbird/10.0.7
MIME-Version: 1.0
To: Volodymyr Kostyrko <c.kworr@gmail.com>
References: <001a01cd900d$bcfcc870$36f65950$@goelli.de>
	<504F282D.8030808@gmail.com>
	<000a01cd90aa$0a277310$1e765930$@goelli.de>
	<5050461A.9050608@gmail.com>
	<000001cd9239$ed734c80$c859e580$@goelli.de>
	<5052EC5D.4060403@gmail.com>
	<000a01cd9274$0aa0bba0$1fe232e0$@goelli.de>
	<505322C9.70200@gmail.com>
	<000001cd9377$e9e9b010$bdbd1030$@goelli.de>
	<50559CD8.1070700@gmail.com>
	<000001cd94f1$a4157030$ec405090$@goelli.de>
	<50581033.4040102@gmail.com> <50584CC1.3030300@digsys.bg>
	<505874E6.2050109@gmail.com>
In-Reply-To: <505874E6.2050109@gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org
Subject: Re: AW: AW: AW: AW: AW: ZFS: Corrupted pool metadata after adding
 vdev to a pool - no opportunity to rescue data from healthy vdevs? Remove
 a vdev? Rewrite metadata?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Sep 2012 13:40:22 -0000


On 18.09.12 16:19, Volodymyr Kostyrko wrote:
> 18.09.2012 13:28, Daniel Kalchev wrote:
>>
>> The problem is that ZFS writes these records (even 128K) aligned to the
>> sector size. So, once you write some data that is under 4k, your pool
>> will become misaligned.
>
> Not exactly. https://blogs.oracle.com/bonwick/entry/space_maps

There is no statement in that post that contradicts what I wrote
earlier. I may not have been precise enough -- the misalignment might
happen within the metaslab, not across the whole zpool. ZFS clearly does
not write larger blocks than necessary, the smallest being the sector
size.

The sector size is represented by the ashift value: the sector size is
2^ashift. The ashift value is set per vdev and is derived from the
largest sector size among the vdev members. So if you create a mirror
vdev of two drives that report 512-byte sectors to the OS, the resulting
vdev will have ashift=9. If you create a mirror vdev from one drive that
reports 512-byte sectors and another that reports 4096-byte sectors, you
will have ashift=12.
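
To make the arithmetic concrete, here is a minimal userland sketch in C.
It is not ZFS code: the helper names sector_size_to_ashift(),
vdev_ashift() and is_aligned() are made up for illustration, and it
assumes the drives report power-of-two sector sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest shift such that (1 << shift) >= sector_size. */
unsigned
sector_size_to_ashift(uint64_t sector_size)
{
	unsigned shift = 0;

	assert(sector_size != 0);
	while (((uint64_t)1 << shift) < sector_size)
		shift++;
	return (shift);
}

/* A vdev takes the largest ashift found among its member drives. */
unsigned
vdev_ashift(const uint64_t *member_sector_sizes, size_t nmembers)
{
	unsigned ashift = 0;
	size_t i;

	for (i = 0; i < nmembers; i++) {
		unsigned a = sector_size_to_ashift(member_sector_sizes[i]);

		if (a > ashift)
			ashift = a;
	}
	return (ashift);
}

/* Nonzero if offset is a multiple of the 2^ashift sector size. */
int
is_aligned(uint64_t offset, unsigned ashift)
{
	return ((offset & (((uint64_t)1 << ashift) - 1)) == 0);
}
```

With these helpers, a mirror of two 512-byte drives gives ashift 9 and a
mixed 512/4096 mirror gives ashift 12. An offset of 512 bytes is aligned
for ashift 9 but not for ashift 12.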

You do not need all vdevs in a zpool to have the same ashift value (and
thus the same sector size).

>
>>> 2. For older drives each drive should be partitioned with respect to
>>> 4k sectors. This is what -a option of gpart does: it aligns created
>>> partitions to 4k sector bounds. But half a year ago I already found
>>> some drives that can auto-shift all disk transactions to optimize read
>>> and write performance. Courtesy of Microsoft Windows, OS that does not
>>> care about anything not written in license terms, same as the users
>>> do, so using this drives would be more straightforward and would not
>>> cause decent pain to IT stuff about realigning partitions the way it
>>> would just work.
>>>
>>
>> This is only hype. There is no way any disk firmware can shift any
>> transactions.
>
> How about Seagate Smart Align? It's documented to do so. I haven't 
> touched any Seagate drives as I don't like them anyway...
>

I have a lot of Seagate drives with 4k sectors in use with ZFS. Despite
these claims, performance is far worse if writes are not aligned to 4k.
It is also awful with UFS if you don't take care to align partitions.
This is just marketing. Their rewrite implementation might be better
than others, but it is still better avoided.

Daniel

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 18 15:05:59 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9CFEE106564A
	for <freebsd-fs@FreeBSD.org>; Tue, 18 Sep 2012 15:05:59 +0000 (UTC)
	(envelope-from gibbs@FreeBSD.org)
Received: from aslan.scsiguy.com (mail.scsiguy.com [70.89.174.89])
	by mx1.freebsd.org (Postfix) with ESMTP id 6B24B8FC0C
	for <freebsd-fs@FreeBSD.org>; Tue, 18 Sep 2012 15:05:58 +0000 (UTC)
Received: from [192.168.6.100] (207-225-98-3.dia.static.qwest.net
	[207.225.98.3]) (authenticated bits=0)
	by aslan.scsiguy.com (8.14.5/8.14.5) with ESMTP id q8IF5vUf036899
	(version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO);
	Tue, 18 Sep 2012 09:05:58 -0600 (MDT)
	(envelope-from gibbs@FreeBSD.org)
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 6.0 \(1486\))
From: "Justin T. Gibbs" <gibbs@FreeBSD.org>
In-Reply-To: <1531430179.669311.1347831685957.JavaMail.root@erie.cs.uoguelph.ca>
Date: Tue, 18 Sep 2012 09:06:03 -0600
Content-Transfer-Encoding: 7bit
Message-Id: <C94CA78C-CDA9-4472-8BB7-CFD46FA0B3D9@FreeBSD.org>
References: <1531430179.669311.1347831685957.JavaMail.root@erie.cs.uoguelph.ca>
To: Rick Macklem <rmacklem@uoguelph.ca>
X-Mailer: Apple Mail (2.1486)
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
	(aslan.scsiguy.com [70.89.174.89]);
	Tue, 18 Sep 2012 09:05:58 -0600 (MDT)
Cc: FS List <freebsd-fs@FreeBSD.org>, Will Andrews <willa@spectralogic.com>
Subject: Re: testing/review of atomic export update patch
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Sep 2012 15:05:59 -0000

On Sep 16, 2012, at 3:41 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:

> Hi,
> 
> There is a simple patch at:
>  http://people.freebsd.org/~rmacklem/atomic-export.patch
> that can be applied to a kernel + mountd, so that the new
> nfsd can be suspended by mountd while the exports are being
> reloaded. It adds a new "-S" flag to mountd to enable this.
> (This avoids the long standing bug where clients receive ESTALE
> replies to RPCs while mountd is reloading exports.)

At Spectra, we are successfully using the NFSE patch set from 
nfse.sourceforge.net (FreeBSD PR 136865).  It addresses
the ESTALE problem in addition to cleaning up several aspects
of exports processing.

Have you reviewed the NFSE work?  Do you have any issues
or concerns with it?  What is the right path for getting NFSE
integrated into FreeBSD?

--
Justin


From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 18 15:14:20 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6D936106564A
	for <fs@FreeBSD.org>; Tue, 18 Sep 2012 15:14:20 +0000 (UTC)
	(envelope-from gibbs@scsiguy.com)
Received: from aslan.scsiguy.com (www.scsiguy.com [70.89.174.89])
	by mx1.freebsd.org (Postfix) with ESMTP id 427018FC12
	for <fs@FreeBSD.org>; Tue, 18 Sep 2012 15:14:16 +0000 (UTC)
Received: from [192.168.6.100] (207-225-98-3.dia.static.qwest.net
	[207.225.98.3]) (authenticated bits=0)
	by aslan.scsiguy.com (8.14.5/8.14.5) with ESMTP id q8IFEGXn036944
	(version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO)
	for <fs@FreeBSD.org>; Tue, 18 Sep 2012 09:14:16 -0600 (MDT)
	(envelope-from gibbs@scsiguy.com)
From: "Justin T. Gibbs" <gibbs@scsiguy.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
Message-Id: <76CBA055-021F-458D-8978-E9A973D9B783@scsiguy.com>
Date: Tue, 18 Sep 2012 09:14:22 -0600
To: fs@FreeBSD.org
Mime-Version: 1.0 (Mac OS X Mail 6.0 \(1486\))
X-Mailer: Apple Mail (2.1486)
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
	(aslan.scsiguy.com [70.89.174.89]);
	Tue, 18 Sep 2012 09:14:16 -0600 (MDT)
Cc: 
Subject: ZFS: Deadlock during vnode recycling
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Sep 2012 15:14:20 -0000

One of our systems became unresponsive due to an inability to recycle
vnodes.  We tracked this down to a deadlock in zfs_zget().  I've attached
the stack trace from the vnlru process to the end of this email.

We are currently testing the following patch. Since this issue is hard to
replicate I would appreciate review and feedback before I commit it to
FreeBSD.

Thanks,
Justin

Patch
===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<
Change 635310 by justing@justing_ns1_spectrabsd on 2012/09/17 15:30:14

	For most vnode consumers of ZFS, the appropriate behavior
	when encountering a vnode that is in the process of being
	reclaimed is to wait for that process to complete and then
	allocate a new vnode.  This behavior is enforced in zfs_zget()
	by checking for the VI_DOOMED vnode flag.  In the case of
	the thread actually reclaiming the vnode, zfs_zget() must
	return the current vnode, otherwise a deadlock will occur.

	sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_znode.h:
		Create a virtual znode field, z_reclaim_td, which is
		implemented as a macro that redirects to z_task.ta_context.

		z_task is only used by the reclaim code to perform the
		final cleanup of a znode in a secondary thread.  Since
		this can only occur after any calls to zfs_zget(), it
		is safe to reuse the ta_context field.

	sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:
		In zfs_freebsd_reclaim(), record curthread in the
		znode being reclaimed.

	sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:
		o Null out z_reclaim_td when znode_ts are constructed.

		o In zfs_zget(), return a "doomed vnode" if the current
		  thread is actively reclaiming this object.

Affected files ...

... //SpectraBSD/stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_znode.h#2 edit
... //SpectraBSD/stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c#3 edit
... //SpectraBSD/stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c#2 edit

Differences ...

==== //SpectraBSD/stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_znode.h#2 (text) ====

@@ -241,6 +241,7 @@
 	struct task	z_task;
 } znode_t;
 
+#define	z_reclaim_td z_task.ta_context
 
 /*
  * Convert between znode pointers and vnode pointers

==== //SpectraBSD/stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c#3 (text) ====

@@ -6083,6 +6083,13 @@
 
 	ASSERT(zp != NULL);
 
+ 	/*
+	 * Mark the znode so that operations that typically block
+	 * waiting for reclamation to complete will return the current,
+	 * "doomed vnode", for this thread.
+	 */
+	zp->z_reclaim_td = curthread;
+
 	/*
 	 * Destroy the vm object and flush associated pages.
 	 */

==== //SpectraBSD/stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c#2 (text) ====

@@ -158,6 +158,7 @@
 	zp->z_dirlocks = NULL;
 	zp->z_acl_cached = NULL;
 	zp->z_moved = 0;
+	zp->z_reclaim_td = NULL;
 	return (0);
 }
 
@@ -1192,7 +1193,8 @@
 				dying = 1;
 			else {
 				VN_HOLD(vp);
-				if ((vp->v_iflag & VI_DOOMED) != 0) {
+				if ((vp->v_iflag & VI_DOOMED) != 0 &&
+				    zp->z_reclaim_td != curthread) {
 					dying = 1;
 					/*
 					 * Don't VN_RELE() vnode here, because

vnlru_proc debug session
===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<
#0  sched_switch (td=0xfffffe000f87b470, newtd=0xfffffe000d36c8e0, flags=Variable "flags" is not available.
) at /usr/src/sys/kern/sched_ule.c:1927
#1  0xffffffff8057f2b6 in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:485
#2  0xffffffff805b8982 in sleepq_timedwait (wchan=0xfffffe05c7515640, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:658
#3  0xffffffff8057f89f in _sleep (ident=0xfffffe05c7515640, lock=0x0, priority=Variable "priority" is not available.
) at /usr/src/sys/kern/kern_synch.c:246
#4  0xffffffff81093035 in zfs_zget (zfsvfs=0xfffffe001de4c000, obj_num=81963, zpp=0xffffff8c60dc51b0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1224
#5  0xffffffff810bec9a in zfs_get_data (arg=0xfffffe001de4c000, lr=0xffffff820f5330b8, buf=0x0, zio=0xfffffe0584625000) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1142
#6  0xffffffff81096891 in zil_commit (zilog=0xfffffe001c382800, foid=Variable "foid" is not available.
) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:1048
#7  0xffffffff810bceb0 in zfs_freebsd_write (ap=Variable "ap" is not available.
) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1083
#8  0xffffffff8081f112 in VOP_WRITE_APV (vop=0xffffffff8112cf40, a=0xffffff8c60dc5680) at vnode_if.c:951
#9  0xffffffff807b1a6b in vnode_pager_generic_putpages (vp=0xfffffe05c76171e0, ma=0xffffff8c60dc5890, bytecount=Variable "bytecount" is not available.
) at vnode_if.h:413
#10 0xffffffff807b1749 in vnode_pager_putpages (object=0xfffffe05e9ee9bc8, m=0xffffff8c60dc5890, count=61440, sync=1, rtvals=0xffffff8c60dc57a0) at vnode_if.h:1189
#11 0xffffffff807aaee0 in vm_pageout_flush (mc=0xffffff8c60dc5890, count=15, flags=1, mreq=0, prunlen=0xffffff8c60dc594c, eio=0xffffff8c60dc59c0) at vm_pager.h:145
#12 0xffffffff807a3da3 in vm_object_page_collect_flush (object=Variable "object" is not available.
) at /usr/src/sys/vm/vm_object.c:936
#13 0xffffffff807a3f23 in vm_object_page_clean (object=0xfffffe05e9ee9bc8, start=Variable "start" is not available.
) at /usr/src/sys/vm/vm_object.c:861
#14 0xffffffff807a42d4 in vm_object_terminate (object=0xfffffe05e9ee9bc8) at /usr/src/sys/vm/vm_object.c:706
#15 0xffffffff807b241e in vnode_destroy_vobject (vp=0xfffffe05c76171e0) at /usr/src/sys/vm/vnode_pager.c:167
#16 0xffffffff810beec7 in zfs_freebsd_reclaim (ap=Variable "ap" is not available.
) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:6146
#17 0xffffffff806101e1 in vgonel (vp=0xfffffe05c76171e0) at vnode_if.h:830
#18 0xffffffff80616379 in vnlru_proc () at /usr/src/sys/kern/vfs_subr.c:734

(kgdb) frame 4
#4  0xffffffff81093035 in zfs_zget (zfsvfs=0xfffffe001de4c000, obj_num=81963, zpp=0xffffff8c60dc51b0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1224
1224                                    tsleep(zp, 0, "zcollide", 1);
(kgdb) l
1219                                    sa_buf_rele(db, NULL);
1220                                    mutex_exit(&zp->z_lock);
1221                                    ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num);
1222                                    if (vp != NULL)
1223                                            VN_RELE(vp);
1224                                    tsleep(zp, 0, "zcollide", 1);
1225                                    goto again;
1226                            }
1227                            *zpp = zp;
1228                            err = 0;
(kgdb)
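
For readers who want to see the decision in isolation, here is a small
userland model of the check the patch changes in zfs_zget(). It is a
sketch under stated assumptions, not the kernel code: the model_vnode
and model_znode types, the zget_decide() function and its "patched"
switch are all hypothetical, "doomed" stands in for VI_DOOMED, and
"reclaim_td" stands in for the z_reclaim_td field described above.

```c
#include <assert.h>
#include <stddef.h>

/* What the caller of the modelled zfs_zget() would do next. */
enum zget_action {
	ZGET_RETURN_ZNODE,	/* hand back the existing znode */
	ZGET_WAIT_AND_RETRY	/* sleep ("zcollide") and go around again */
};

/* Hypothetical stand-ins for the vnode and znode. */
struct model_vnode {
	int doomed;		/* models VI_DOOMED being set */
};

struct model_znode {
	struct model_vnode *vp;
	const void *reclaim_td;	/* thread doing the reclaim, or NULL */
};

/*
 * Models only the doomed-vnode check.  With patched == 0 every caller
 * that sees a doomed vnode waits; with patched != 0 the thread that is
 * itself reclaiming the vnode gets the znode back instead of waiting
 * for its own reclaim to finish.
 */
enum zget_action
zget_decide(const struct model_znode *zp, const void *curthread,
    int patched)
{
	int dying = 0;

	assert(zp != NULL && zp->vp != NULL);
	if (zp->vp->doomed &&
	    (!patched || zp->reclaim_td != curthread))
		dying = 1;
	return (dying ? ZGET_WAIT_AND_RETRY : ZGET_RETURN_ZNODE);
}
```

In the trace above the caller is the reclaiming thread itself, so the
unpatched model returns ZGET_WAIT_AND_RETRY forever, while the patched
model returns ZGET_RETURN_ZNODE for that thread and still makes every
other thread wait.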


From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 18 16:30:42 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 7A532106566C
	for <freebsd-fs@FreeBSD.org>; Tue, 18 Sep 2012 16:30:42 +0000 (UTC)
	(envelope-from bf1783@googlemail.com)
Received: from mail-vc0-f182.google.com (mail-vc0-f182.google.com
	[209.85.220.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 2C39B8FC08
	for <freebsd-fs@FreeBSD.org>; Tue, 18 Sep 2012 16:30:41 +0000 (UTC)
Received: by vcbfw7 with SMTP id fw7so61033vcb.13
	for <freebsd-fs@FreeBSD.org>; Tue, 18 Sep 2012 09:30:41 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=googlemail.com; s=20120113;
	h=mime-version:reply-to:in-reply-to:references:date:message-id
	:subject:from:to:cc:content-type;
	bh=++XF6Fh2ZJnlFRBdZFpe6N9NzXUBMM+TSOzpafg2JRk=;
	b=qN5WcYYHUWktZRXAwx50/LRbYvp8RmZgyBA8Vd5r+JpK8mfEdrZWdD4uT/qcQ80f/y
	P9aWxT/E7iIT22ML+MbMp828JCvLfL9l5V3j+/KGO6yBM2+fXTtorcNLyUZ1iJwwQJD+
	KaCz4x0Ju9Wk1863xy1BIjCwiYIemPNdRy4e5zlE0RwhsKjODec4Wb6EnHtUkQGeBkr/
	t28FY2bjbfmFxtvsuxPbfViXi56V+GEas3S/SIg7ELrhC+FqtVSl0tFqsd4D0dZwT5ow
	dJZ7zh9EJw3tWbJ6GMnqwNMMX1T/M2hMwvj6/3Kb86JOhlKonfVlN0eCNAlA4kv0qbwv
	f4iA==
MIME-Version: 1.0
Received: by 10.220.119.204 with SMTP id a12mr218109vcr.66.1347985840878; Tue,
	18 Sep 2012 09:30:40 -0700 (PDT)
Received: by 10.58.4.166 with HTTP; Tue, 18 Sep 2012 09:30:40 -0700 (PDT)
In-Reply-To: <20120918084924.GY37286@deviant.kiev.zoral.com.ua>
References: <CAGFTUwMVAmoN49u1bT_8LuxKo6JKrB6shbw773MpmJ5i7q=Qeg@mail.gmail.com>
	<20120917121925.GQ37286@deviant.kiev.zoral.com.ua>
	<20120917183654.GA13273@x2.osted.lan>
	<20120918084924.GY37286@deviant.kiev.zoral.com.ua>
Date: Tue, 18 Sep 2012 12:30:40 -0400
Message-ID: <CAGFTUwNP9qB1D+1E6Pw_ud1ESomcDs2bAOY9c_VRYLZ5AiF+5g@mail.gmail.com>
From: "b. f." <bf1783@googlemail.com>
To: freebsd-fs@FreeBSD.org
Content-Type: text/plain; charset=ISO-8859-1
Cc: 
Subject: Re: Problems after recent nullfs,vfs changes in 10.0-CURRENT
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: bf1783@gmail.com
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Sep 2012 16:30:42 -0000

The following deals with some problems exposed by r240283-5,
particularly (but not only) when used with changes to tmpfs that were
first proposed by kib@ on 21 June 2010 on this list, in a thread
entitled "Tmpfs elimination of double copy":

http://docs.freebsd.org/cgi/getmsg.cgi?fetch=20463+0+archive/2010/freebsd-fs/20100627.freebsd-fs

On 9/18/12, Konstantin Belousov <kostikbel@gmail.com> wrote:
> On Mon, Sep 17, 2012 at 08:36:54PM +0200, Peter Holm wrote:
>> On Mon, Sep 17, 2012 at 03:19:25PM +0300, Konstantin Belousov wrote:
>> > Please mail fs@, possibly Cc:-ing me.
>> >
>> > On Mon, Sep 17, 2012 at 03:04:46AM -0400, b. f. wrote:
>> > > The recent nullfs or vfs changes (r240283-5) have exposed some
>> > > problems with my tinderbox.  In this tinderbox, I've been using
>> > > recent versions of -CURRENT with Gleb's tmpfs rbtree patch:
>> > >
>> > > http://people.freebsd.org/~gleb/tmpfs-nrbtree.1.patch
>> > >
>> > > and a merged version of your tmpfs single-buffer patch:
>> > >
>> > > http://people.freebsd.org/~kib/misc/tmpfs.12.patch
>> > >
>> > > The tinderbox performs builds in a tmpfs filesystem that is nullfs
>> > > grafted to a ufs filesystem.  After r240283-5, builds of
>> > > ports/lang/ocaml failed when a cp(1) of an executable failed with
>> > > ETXTBSY. After reverting r240285, the builds of ocaml succeeded.
>> > >
>> > > I've attached logs of the failed and successful builds.  Can you
>> > > guess whether the problem is solely due to the recent nullfs and
>> > > vfs changes, or to some defect in Gleb's proposed changes, or to a
>> > > problem with your proposed tmpfs change, or my merging of it?  What
>> > > further changes or tests would you suggest to help find the source
>> > > of the problem?
>> > >
>> > > I've attached a diff of the relevant changes to the system sources
>> > > used in the tinderbox, and logs of the successful (*.log) and
>> > > unsuccessful (*.log.error) ocaml builds.
>> >
>> > Please show me the mount -v output, and specify which filesystems
>> > are used where.

The following is a typical layout for one run of the tinderbox (which
is in /home/shared/freebsd/tinderbox):

/dev/ufs/d1root on / (ufs, local, noatime, writes: sync 13 async 25,
reads: sync 553 async 42, fsid 8aabfa4d68614a9f)
devfs on /dev (devfs, local, fsid 00ff007171000000)
tmpfs on /tmp (tmpfs, local, nosuid, fsid 01ff008787000000)
/dev/ufs/d1var on /var (ufs, local, noatime, journaled soft-updates,
writes: sync 15 async 269, reads: sync 664 async 12, fsid
a5abfa4d331091c9)
/dev/ufs/d1usr on /usr (ufs, local, noatime, journaled soft-updates,
writes: sync 2 async 0, reads: sync 765 async 12, fsid
b4abfa4d94c0f782)
/dev/ufs/d1usrlocal on /usr/local (ufs, local, noatime, journaled
soft-updates, writes: sync 32 async 298, reads: sync 2867 async 106,
fsid c4abfa4d96ab4351)
/dev/ufs/d1home on /home (ufs, local, noatime, journaled soft-updates,
writes: sync 16 async 123, reads: sync 2065 async 268, fsid
ceabfa4d9bb85870)

the filesystem used for the port builds:

/tmp/tinderbox/7.4-amd64-u1 on
/home/shared/freebsd/tinderbox/7.4-amd64-u1 (nullfs, local, fsid
03ff002929000000)
/home/shared/freebsd/ports/head on
/home/shared/freebsd/tinderbox/7.4-amd64-u1/a/ports (nullfs, local,
read-only, fsid 04ff002929000000)
/home/shared/freebsd/tinderbox/jails/7.4-amd64/src on
/home/shared/freebsd/tinderbox/7.4-amd64-u1/usr/src (nullfs, local,
read-only, fsid 05ff002929000000)
devfs on /home/shared/freebsd/tinderbox/7.4-amd64-u1/dev (devfs,
local, fsid 06ff007171000000)
/home/shared/freebsd/distfiles on
/home/shared/freebsd/tinderbox/7.4-amd64-u1/distcache (nullfs, local,
fsid 07ff002929000000)
linprocfs on /home/shared/freebsd/tinderbox/7.4-amd64-u1/compat/linux/proc
(linprocfs, local, fsid 08ff00b5b5000000)
procfs on /home/shared/freebsd/tinderbox/7.4-amd64-u1/proc (procfs,
local, fsid 09ff000202000000)

>> >
>> > The issue is almost certainly the held reference on the vm object.
>> > Let's remove Gleb's patches from the picture entirely.
>> >
>> > After rethinking VV_TEXT handling both for nullfs and tmpfs (patched),
>> > I see two issues ATM:
>> >
>> > 1. VV_TEXT may be set either on the lower vnode or on the nullfs
>> > vnode. So if you execute a file through a nullfs alias, the lower
>> > vnode does not get VV_TEXT set, and the executable can still be
>> > opened for write.
>> >
>> > 2. For tmpfs, the hack I added to clear VV_TEXT when the swap vm
>> > object reference count == 1 is not called often enough. This allows
>> > VV_TEXT to leak, especially because nullfs after r240283 is not
>> > eager to reclaim its vnodes.
>> >
>> > I updated my branch with tmpfs patches with the following changes:
>> >
>> > 1. nullfs now bypasses the VV_TEXT set and clear operations to the
>> > lower vnode.
>> >
>> > 2. The tmpfs_clear_text() hack is removed; instead,
>> > vm_object_deallocate() clears VV_TEXT on the tmpfs vnode when the
>> > reference count goes to 1.
>> >
>> > Updated patch is at
>> > http://people.freebsd.org/~kib/misc/tmpfs.13.patch
>> > I tested it very lightly, so to say.
>>
>> I see the problem on a pristine r240611. Test scenario included.
>>
>> + mdconfig -a -t swap -s 1g -u 5
>> + bsdlabel -w md5 auto
>> + newfs -U md5a
>> + mount /dev/md5a /mnt2
>> + chmod 777 /mnt2
>> + mount
>> + grep /mnt
>> + grep -q tmpfs
>> + mount -t tmpfs tmpfs /mnt
>> + chmod 777 /mnt
>> + mkdir /mnt2/mp
>> + mount -t nullfs /mnt /mnt2/mp
>> + cp /usr/bin/true /mnt2/mp/true
>> + /mnt/true
>> +
>> + rm -f /mnt/true
>> + cp /usr/bin/true /mnt2/mp/true
>> + /mnt2/mp/true
>> +
>> ./nullfs12.sh: cannot create /mnt2/mp/true: Text file busy
>> + echo FAIL 2
>> FAIL 2
>> + mount
>> + egrep 'tmpfs|nullfs|/mnt |/mnt2 '
>> /dev/md5a on /mnt2 (ufs, local, soft-updates)
>> tmpfs on /mnt (tmpfs, NFS exported, local)
>> /mnt on /mnt2/mp (nullfs, local)
>> + rm -f /mnt2/mp/true
>
> Yes, this is very close if not identical to the only test which I performed
> with the tmpfs.13.patch.
>

I can no longer reproduce the port build failures on r240651 amd64
after applying your tmpfs.13.patch, and I haven't encountered any
other obvious problems in the short time that I've been using it.  I
did not rerun Peter Holm's nullfs12.sh test, since you had already
subjected your patch to a similar test.
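
For readers unfamiliar with the symptom, the sketch below shows only the
ordinary "Text file busy" behaviour that VV_TEXT implements: while a
binary is running, opening it for write is refused. It does not
reproduce the nullfs-over-tmpfs leak, which needs root and the mounts
from nullfs12.sh. The function name text_busy_demo and the use of a
private copy of sleep(1) are illustrative assumptions, not part of
Peter's test.

```shell
#!/bin/sh
# Sketch: try to open a running executable for writing.
# Prints "busy" if the open is refused (ETXTBSY is the expected cause),
# "writable" if it succeeds, or "skipped" if the copy could not be run.
text_busy_demo() {
    dir=$(mktemp -d) || { echo skipped; return 0; }
    copy="$dir/sleep-copy"
    src=$(command -v sleep) || { rm -rf "$dir"; echo skipped; return 0; }
    cp "$src" "$copy" 2>/dev/null && chmod +x "$copy" ||
        { rm -rf "$dir"; echo skipped; return 0; }

    "$copy" 3 >/dev/null 2>&1 &     # run the copy in the background
    pid=$!
    sleep 1                         # give exec a moment to happen
    if ! kill -0 "$pid" 2>/dev/null; then
        result=skipped              # the copy never started (e.g. noexec)
    elif ( : >> "$copy" ) 2>/dev/null; then
        result=writable             # open for write succeeded
    else
        result=busy                 # open for write was refused
    fi
    kill "$pid" 2>/dev/null
    wait "$pid" 2>/dev/null
    rm -rf "$dir"
    echo "$result"
}
```

On a system that enforces the text-busy rule one would expect "busy"
while the copy is running; the failures discussed in this thread are the
opposite case, where the flag stays set after the process has exited.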

Regards,
                b.

From owner-freebsd-fs@FreeBSD.ORG  Tue Sep 18 23:14:05 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 485D9106564A;
	Tue, 18 Sep 2012 23:14:05 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca
	[131.104.91.44])
	by mx1.freebsd.org (Postfix) with ESMTP id B52898FC0C;
	Tue, 18 Sep 2012 23:14:04 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: Ap8EACz/WFCDaFvO/2dsb2JhbAA+BxaFc7dCgiABAQUjBFIbGAICDRkCWQYTiAALp0WTDIEhiXohhT2BEgOVY4EUjw2DAoE+Ihs
X-IronPort-AV: E=Sophos;i="4.80,445,1344225600"; d="scan'208";a="179618068"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 18 Sep 2012 19:12:54 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id EAB29B4012;
	Tue, 18 Sep 2012 19:12:54 -0400 (EDT)
Date: Tue, 18 Sep 2012 19:12:54 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: "Justin T. Gibbs" <gibbs@FreeBSD.org>
Message-ID: <2050472507.821722.1348009974939.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <C94CA78C-CDA9-4472-8BB7-CFD46FA0B3D9@FreeBSD.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.203]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692)
Cc: FS List <freebsd-fs@FreeBSD.org>, Will Andrews <willa@spectralogic.com>
Subject: Re: testing/review of atomic export update patch
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Sep 2012 23:14:05 -0000

Justin T. Gibbs wrote:
> On Sep 16, 2012, at 3:41 PM, Rick Macklem <rmacklem@uoguelph.ca>
> wrote:
> 
> > Hi,
> >
> > There is a simple patch at:
> >  http://people.freebsd.org/~rmacklem/atomic-export.patch
> > that can be applied to a kernel + mountd, so that the new
> > nfsd can be suspended by mountd while the exports are being
> > reloaded. It adds a new "-S" flag to mountd to enable this.
> > (This avoids the long standing bug where clients receive ESTALE
> > replies to RPCs while mountd is reloading exports.)
> 
> At Spectra, we are successfully using the NFSE patch set from
> nfse.sourceforge.net (FreeBSD PR 136865). It addresses
> the ESTALE problem in addition to cleaning up several aspects
> of exports processing.
> 
> Have you reviewed the NFSE work? Do you have any issues
> or concerns with it? What is the right path for getting NFSE
> integrated into FreeBSD?
> 
I, personally, have not found the time to review it. As such,
I can't state specifics; however, there have been concerns w.r.t.
a switch from mountd->nfse resulting in different behaviour when
used with the same /etc/exports file used for mountd.

Some questions that need to be answered w.r.t. nfse, which I
haven't had the time to look into:
- Are the differences listed here significant enough for a
  change to be considered a POLA violation?
    http://nfse.sourceforge.net/COMPATIBILITY

- If the server mount point is /sub1 and the only line
  referring to this server volume in /etc/exports looks like:

  /sub1/sub2 client.net

  Does the following mount command work on client.net
  # mount -t nfs -o nfsv3 server.net:/sub1 /mnt
  when nfse is run with -C using the /etc/exports file?
  (If this mount works, many would consider this a POLA
   violation.)

  This is typically referred to as an "administrative control",
  since it is only enforced by mountd for the Mount protocol,
  but is considered an important feature by some (rwatson@
  expressed a desire/need for it).

- Does the nfse patch handle exporting of all file system types
  and, in particular, the `zfs share` case?

Beyond that, it needs someone with the time to shepherd it into head
as a mountd replacement. (I'll admit I'm mainly interested in NFSv4.1
these days and proposed the simple patch because I do not have
the time to look at nfse seriously and figured it might be
sufficient to keep people happy.)

rick

> --
> Justin

From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 19 00:37:57 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E4671106566B;
	Wed, 19 Sep 2012 00:37:57 +0000 (UTC)
	(envelope-from thomas@gibfest.dk)
Received: from mail.tyknet.dk (mail.tyknet.dk [IPv6:2a01:4f8:141:52a3:186::])
	by mx1.freebsd.org (Postfix) with ESMTP id 72B7B8FC0C;
	Wed, 19 Sep 2012 00:37:57 +0000 (UTC)
Received: from [10.10.1.100] (unknown [217.71.4.82])
	(using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
	(No client certificate requested)
	by mail.tyknet.dk (Postfix) with ESMTPSA id 8A63410287D;
	Wed, 19 Sep 2012 02:37:48 +0200 (CEST)
X-DKIM: OpenDKIM Filter v2.5.2 mail.tyknet.dk 8A63410287D
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=gibfest.dk; s=default;
	t=1348015068; bh=eeTVbCgeGbGPrHP5HH2Fvyd496gfMCEq+iyJGlaX0FM=;
	h=Date:From:To:CC:Subject:References:In-Reply-To;
	b=bbxPfpKTvBFbIqknmBOi5Nyjz+eJsjY49g8mIc6OjnKGPHyGd/lkR5VTx4Rec9PCB
	5GOyPbndoGh12bN/o3bx44LS4WEveYGKnYmuXK9ZmhIfby0REqOPfc677X9cuwXR6m
	KzY6z/KeLwHtnNPkse+38amM3nLUcVFKpbHyBG/A=
Message-ID: <505913DB.1060200@gibfest.dk>
Date: Wed, 19 Sep 2012 02:37:47 +0200
From: Thomas Steen Rasmussen <thomas@gibfest.dk>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:12.0) Gecko/20120604 Thunderbird/12.0.1
MIME-Version: 1.0
To: Glen Barber <gjb@FreeBSD.org>
References: <50438BF5.8030004@gibfest.dk>
	<CAPS9+SvR5APTZof99RZ=0_zqTFWLLLzefaTBG94R464wLdVe8Q@mail.gmail.com>
	<5043B0CB.8040907@gibfest.dk> <20120902193100.GG1266@glenbarber.us>
	<5043C9A5.5070409@FreeBSD.org>
	<20120902213425.GA1507@glenbarber.us>
	<5043D6E7.8090308@FreeBSD.org> <5043DA77.6000304@gibfest.dk>
	<20120903031502.GH1507@glenbarber.us>
In-Reply-To: <20120903031502.GH1507@glenbarber.us>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org
Subject: Re: zfs send -r missing - but documented in zfs(8)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 19 Sep 2012 00:37:58 -0000

On 03.09.2012 05:15, Glen Barber wrote:
>
> Right now, I'd say a FreeBSD PR is not necessary since this is an
> upstream bug as well.

Hello Glen and list,

A couple of weeks have passed with no news from upstream,
and since these things tend to take a bit of time, I believe it
would be best to remove references to -r until we actually
have code to support it.

I've opened http://www.freebsd.org/cgi/query-pr.cgi?pr=171761
(misc/171761: "Small patch to (temporarily) remove -r from zfs send
usage and zfs(8)") with a patch to do just that.

Just an FYI :)


Best regards,

Thomas Steen Rasmussen

From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 19 02:48:06 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id CD2BC106564A;
	Wed, 19 Sep 2012 02:48:06 +0000 (UTC)
	(envelope-from asmrookie@gmail.com)
Received: from mail-lb0-f182.google.com (mail-lb0-f182.google.com
	[209.85.217.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 3B5ED8FC0C;
	Wed, 19 Sep 2012 02:48:05 +0000 (UTC)
Received: by lbbgg13 with SMTP id gg13so584307lbb.13
	for <multiple recipients>; Tue, 18 Sep 2012 19:48:04 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:reply-to:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:content-type;
	bh=FomcIhMz1egk12lxRcHB3pA+3pDeqlG3uJoGLlPwzkc=;
	b=XPOPKlTtbjzDdE/LMGiLiPVZW2Bx7+SlWwm2T/VyAa5Ph9iGiYLVBVtu/Iad4BHSkf
	bpxQeFx0WwvyRh4MkTA6IwPX+TIGCZrkphZXdLuWmRGYDRIypxlobzSrCIJGHLbM5fOb
	iM0CG1Et1RhDU/YmP8SQcM1cq58wpQk/Myb1ip8c9BD5sirZWuSBj6fgH/n17QenZU4q
	2PWvi/UXX4Yhco98LfEf8HTdvWgokph5rMyBwfOAvUAkB+VG7n09woiD8q8D6fPsvx+2
	rN/jPbCIZMzSX4kmQCJ9yJ4hpD6+vFQgBOYJeSz04cR0d9I+bY/eFU4W3S6nX9Js3bfj
	KFAA==
MIME-Version: 1.0
Received: by 10.152.48.70 with SMTP id j6mr1342704lan.57.1348022883954; Tue,
	18 Sep 2012 19:48:03 -0700 (PDT)
Sender: asmrookie@gmail.com
Received: by 10.112.102.39 with HTTP; Tue, 18 Sep 2012 19:48:03 -0700 (PDT)
In-Reply-To: <20120917140055.GA9037@x2.osted.lan>
References: <CAJ-FndCQ0YEo9_6x3g-12XEs8QmtyecwkLBX9z_sptnOUNTHrw@mail.gmail.com>
	<20120829060158.GA38721@x2.osted.lan>
	<CAJ-FndAaFv2o05MZZceT8Qr4mhPxuzrnmOZ30c3gy8=pnjjZvw@mail.gmail.com>
	<20120831052003.GA91340@x2.osted.lan>
	<CAJ-FndAaxQA8NYCFSN629XXi9zMVNyu2TuHjZLvmn3jhzRJb4w@mail.gmail.com>
	<CAJ-FndDdDVuwc=NgDeG7XiWW59-+Ls5wc2GBqbjLOLDUdUb9SA@mail.gmail.com>
	<20120905201531.GA54452@x2.osted.lan>
	<CAJ-FndCHSroZFfVgHAW8SUVZhDSaX9qix=aZnHVC_BN_fW6sgg@mail.gmail.com>
	<CAJ-FndDr5WmeKXCwSCucQ4w3hPHRBuu36YH1xiW_wKXOkKEdZg@mail.gmail.com>
	<CAJ-FndCvc+phY_g4CeGfzsj017roxs_C5adjuLuszpEPWO2+1g@mail.gmail.com>
	<20120917140055.GA9037@x2.osted.lan>
Date: Wed, 19 Sep 2012 03:48:03 +0100
X-Google-Sender-Auth: 9xVWoK3IL0t4skUVBDmQF81AApc
Message-ID: <CAJ-FndAP9Ua6tRcbrfYY1+56O-YbJvmyaUco9K42-0hmchKD6g@mail.gmail.com>
From: Attilio Rao <attilio@freebsd.org>
To: FreeBSD FS <freebsd-fs@freebsd.org>, freebsd-current@freebsd.org, 
	Peter Holm <pho@freebsd.org>,
	=?UTF-8?Q?Gustau_P=C3=A9rez?= <gperez@entel.upc.edu>, 
	George Neville-Neil <gnn@freebsd.org>, Florian Smeets <flo@freebsd.org>,
	bdrewery@freebsd.org
Content-Type: text/plain; charset=UTF-8
Cc: 
Subject: Re: MPSAFE VFS -- List of upcoming actions
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: attilio@FreeBSD.org
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 19 Sep 2012 02:48:06 -0000

On Fri, Jul 13, 2012 at 12:18 AM, Attilio Rao <attilio@freebsd.org> wrote:
> 2012/7/4 Attilio Rao <attilio@freebsd.org>:
>> 2012/6/29 Attilio Rao <attilio@freebsd.org>:
>>> As already published several times, according to the following plan:
>>> http://wiki.freebsd.org/NONMPSAFE_DEORBIT_VFS
>>>
>>
>> I still haven't heard from Vivien or Edward, anyway as NTFS is
>> basically only used RO these days (also the mount_ntfs code just
>> permits RO mounting) I stripped all the incomplete/bogus write support
>> with the following patch:
>> http://www.freebsd.org/~attilio/ntfs_remove_write.patch
>>
>> This is an attempt to make the code smaller and possibly just focus on
>> the locking that really matters (as a read-only filesystem).
>> On some points of the patch I'm a bit less sure as we could easily
>> take into account also write for things like vaccess() arguments, and
>> make easier to re-add correct write support at some point in the
>> future, but still force RO, even if the approach used in the patch is
>> more correct IMHO.
>> As an added bonus this patch cleans some dirty code in the mount
>> operation and fixes a bug as vfs_mountedfrom() is called before real
>> mounting is completed and can still fail.
>
> A quick update on this.
> It looks like NTFS won't be completed for this GSoC thus I seriously
> need to find an alternative to not lose the NTFS support entirely.
>
> I tried to look into the NTFS implementation right now and it is
> really a poor support. As Peter has also verified, it can deadlock in
> no-time, it completely violates VFS rules, etc. IMHO it deserves a
> complete rewrite if we would still support in-kernel NTFS. I also
> tried to look at the NetBSD implementation. Their code is somewhat
> similar to ours, but they used very complicated (and very dirty) code
> to do the locking. Even if I don't know well enough NetBSD VFS, I have
> the impression not all the races are correctly handled. Definitively,
> not something I would like to port.
>
> Considering all that, the only viable option would be a
> userland filesystem implementation. My preferred choice would be to
> import PUFFS and librefuse on top of it but honestly it requires a lot
> of time to be completed, time which I don't currently have as in 2
> months Giant must be gone by the VFS.
>
> I then decided to switch to gnn's revamp of FUSE patches. You can find
> his initial e-mail here:
> http://lists.freebsd.org/pipermail/freebsd-fs/2012-March/013876.html
>
> I've precisely got the second version of George's patch and created
> this dolphin branch:
> svn://svn.freebsd.org/base/projects/fuse
>
> I'm fixing low hanging fruit for the moment (see r238411 for example)
> and I still have to make a thorough review.
> However my idea is to commit the support once:
> - ntfs-3g is well stress-tested and proves to be bug-free
> - there is no major/big technical issue pending after the reviews

In the last weeks Peter, Florian, Gustau and I have been working on
stabilizing fuse support. Specifically, Peter has worked hard on
producing several utilities to stress-test fuse, and ntfs in
particular; Florian has improved the fuse-related ports (as explained
later); and Gustau has done sparse testing. I am now moderately
satisfied with the stability of fuse and would like to propose it for
wider usage, in particular given the huge amount of complaints I'm
hearing from occasional fuse users.

The final target of the project is to completely import into base the
content of fusefs-kmod, starting from the patches posted earlier by
George. So far we have imported only the kernel part into the fuse
branch, so the fusefs-kmod userland part still needs to be installed
from ports; I am studying the mount_fusefs licensing before proceeding
with the import of the userland bits.

The fixing has been happening here:
svn://svn.freebsd.org/base/projects/fuse/

which is essentially a HEAD branch + fuse kernel components. To get
fuse, please compile a kernel from this branch with the FUSE option,
or simply build and load the fuse module.
Alternatively, a kernel patch that should work with HEAD@240684 is here:
http://www.freebsd.org/~attilio/fuse_import/fuse_240684.patch

I guess the patch can easily be applied to all FreeBSD branches, but
it has not been tested on anything other than -CURRENT.

As said, you currently still need to build the fusefs-kmod port.
However, you need these further patches, to be put in the
fusefs-kmod/files/ directory:
http://www.freebsd.org/~attilio/fuse_import/patch-Makefile
http://www.freebsd.org/~attilio/fuse_import/patch-mount_fusefs__mount_fusefs2.c

They both disable the old kernel building/linking and import new
functionality to let the new kernel support work well in the presence
of many consumers.
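
The patch placement described above could be scripted roughly as
follows. This is only a dry-run sketch: the function prints the
commands instead of running them, the function name
plan_fusefs_kmod_patches is hypothetical, and the default port
location (/usr/ports/sysutils/fusefs-kmod) is an assumption about the
local ports tree rather than something stated in this mail.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): print, but do not execute, the commands
# that would drop the two patches into the port's files/ directory and
# then build the port. $1 overrides the assumed port directory.
plan_fusefs_kmod_patches() {
    portdir=${1:-/usr/ports/sysutils/fusefs-kmod}
    base='http://www.freebsd.org/~attilio/fuse_import'
    for patch in patch-Makefile patch-mount_fusefs__mount_fusefs2.c; do
        printf 'fetch -o %s/files/%s %s/%s\n' \
            "$portdir" "$patch" "$base" "$patch"
    done
    printf 'cd %s && make install clean\n' "$portdir"
}
```

Reviewing the printed commands before running them by hand keeps the
step easy to undo if the port layout differs from the assumed one.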

In addition to fusefs-kmod, Bryan and Florian have also updated the
fusefs-lib and fusefs-ntfs ports. For instance, please refer to this
e-mail:
http://lists.freebsd.org/pipermail/freebsd-ports/2012-August/077950.html

Even if this work is somewhat independent of the fusefs-kmod import, I
warmly suggest that all of you use their patches (this is what we
have been testing so far, too).

At this point what I'm looking for are reviews and further testing.
I would like to spend some words on what you should expect from this work:
*Fuse is far from being perfect*.
I cannot stress this enough. Peter's stress tests could generally also
break Fuse on Linux, and by the Fuse authors' own admission the modules
can never be guaranteed to be completely starvation-free. However, they
tend to be designed so that sleeps can at least be interrupted easily,
which makes it easy to recover from deadlocks. As far as I can tell,
this is mostly retained in FreeBSD as well. Also, fuse sometimes seems
to leave a small number of hidden files behind when it finds references
on files it wants to delete. This also happens under Linux and is part
of the FUSE design, so there is not much we can do.
However, while deadlocks can be somewhat tolerated, the things you
should really pay attention to are core dumps of fuse modules (like
the ntfs-3g binary) and kernel panics. They must not happen, and if
they do they need to be fixed promptly.
The good news is that ntfs seems to be doing exceptionally well.
Florian could use ntfs as a backend for a postgresql test. I think
this is a big improvement compared to the current in-kernel ntfs,
which is completely broken.

So far we have tested almost exclusively ntfs-3g. I know Gustau also
used other modules like sshfs, and George used GlusterFS with his older
patches, but I encourage you to test as many modules as you want, as
they may expose different bugs. I don't plan to spend much more time
on FUSE, but I can occasionally look at bugs, as they fall in the
filesystems category and I'm always interested in keeping an eye on
such issues.

Some operational information:
- In the next days I will import the userland bits of fusefs-kmod into
the fuse project branch, making the port obsolete. When this happens I
will make this clear to the readers of this thread.
- If no major bug remains by early October, I will commit this
to -CURRENT.
- I expect Bryan and Florian to commit the libfuse and ntfs updates soon.
They can do so independently of the fusefs-kmod retirement, but I would
prefer their patches to go in first.
- After that I will hand over fusefs maintainership to gnn, as
previously agreed, but I will be around to help with analysis and
fixing, depending on time availability.

Finally, I have 2 minor questions:
- One is about importing the mount_fusefs userland bits. I don't think
we need a vendor import at all because they were developed by a
FreeBSD GSoC student and kept in his git repo (or someone else's).
I'd just commit them as new files once I have done a good sweep. I hope
nobody objects to that.
- The other is: fusefs-kmod is currently amd64/i386 only. I have no
idea why, as it has no MD-specific code. However, I'm sure it has not
been tested on other arches so far. I have left it usable on all
arches anyway, which I think is the correct choice. If someone objects
with a valid argument I can restrict it to i386 and amd64 again.

That's all. If you have any questions, please don't hesitate to contact
me or the other people involved in this work.

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein

From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 19 03:47:38 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B5770106566C;
	Wed, 19 Sep 2012 03:47:38 +0000 (UTC)
	(envelope-from kob6558@gmail.com)
Received: from mail-wi0-f172.google.com (mail-wi0-f172.google.com
	[209.85.212.172])
	by mx1.freebsd.org (Postfix) with ESMTP id 24C498FC12;
	Wed, 19 Sep 2012 03:47:36 +0000 (UTC)
Received: by wibhi8 with SMTP id hi8so3896450wib.13
	for <multiple recipients>; Tue, 18 Sep 2012 20:47:35 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	bh=AIwLTTx0DDhUOJDKhVusSQJ4ffJAKBEZjAkeOSnIBJU=;
	b=j/9jxSo/I5kO8V5zsWPs2IyQuzOdklv+QXl4xEGqRHB/63lfuL3MW8KC31hxamVFHu
	yrCtyB8divYT3Oxo2qcHHDnofFnWUxbnl7qWzqqRbOpXWwN7auM4quP492WoWYtbb3LD
	/hH4CjmYW6FbqR9K1RqI1/FphJLjc75uytVSnJJAfabUuvjr7UWJr0khXKcyZb5wI162
	u9CkO0llpDwhN6medh8a9+GQxuxMjNRbF1BFq6vSyWK5beJIIdRs7oLK1uvbMO2S4UT+
	aBVqOqz8pGucNtLKs1/QC3L61aMhI94M+/CDjclQXtcuHcK/119w5pHDaKF0Qjndkg5Z
	Sefw==
MIME-Version: 1.0
Received: by 10.180.83.66 with SMTP id o2mr3680771wiy.14.1348026455765; Tue,
	18 Sep 2012 20:47:35 -0700 (PDT)
Received: by 10.223.151.130 with HTTP; Tue, 18 Sep 2012 20:47:35 -0700 (PDT)
In-Reply-To: <CAJ-FndAP9Ua6tRcbrfYY1+56O-YbJvmyaUco9K42-0hmchKD6g@mail.gmail.com>
References: <CAJ-FndCQ0YEo9_6x3g-12XEs8QmtyecwkLBX9z_sptnOUNTHrw@mail.gmail.com>
	<20120829060158.GA38721@x2.osted.lan>
	<CAJ-FndAaFv2o05MZZceT8Qr4mhPxuzrnmOZ30c3gy8=pnjjZvw@mail.gmail.com>
	<20120831052003.GA91340@x2.osted.lan>
	<CAJ-FndAaxQA8NYCFSN629XXi9zMVNyu2TuHjZLvmn3jhzRJb4w@mail.gmail.com>
	<CAJ-FndDdDVuwc=NgDeG7XiWW59-+Ls5wc2GBqbjLOLDUdUb9SA@mail.gmail.com>
	<20120905201531.GA54452@x2.osted.lan>
	<CAJ-FndCHSroZFfVgHAW8SUVZhDSaX9qix=aZnHVC_BN_fW6sgg@mail.gmail.com>
	<CAJ-FndDr5WmeKXCwSCucQ4w3hPHRBuu36YH1xiW_wKXOkKEdZg@mail.gmail.com>
	<CAJ-FndCvc+phY_g4CeGfzsj017roxs_C5adjuLuszpEPWO2+1g@mail.gmail.com>
	<20120917140055.GA9037@x2.osted.lan>
	<CAJ-FndAP9Ua6tRcbrfYY1+56O-YbJvmyaUco9K42-0hmchKD6g@mail.gmail.com>
Date: Tue, 18 Sep 2012 20:47:35 -0700
Message-ID: <CAN6yY1tjHFEopgJ+cAfQ5ES5Q4NnOg1tYA81==9uzPQgpQVDzA@mail.gmail.com>
From: Kevin Oberman <kob6558@gmail.com>
To: attilio@freebsd.org
Content-Type: text/plain; charset=UTF-8
Cc: Peter Holm <pho@freebsd.org>, bdrewery@freebsd.org,
	FreeBSD FS <freebsd-fs@freebsd.org>, freebsd-current@freebsd.org
Subject: Re: MPSAFE VFS -- List of upcoming actions
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 19 Sep 2012 03:47:38 -0000

On Tue, Sep 18, 2012 at 7:48 PM, Attilio Rao <attilio@freebsd.org> wrote:
> On Fri, Jul 13, 2012 at 12:18 AM, Attilio Rao <attilio@freebsd.org> wrote:
>> 2012/7/4 Attilio Rao <attilio@freebsd.org>:
>>> 2012/6/29 Attilio Rao <attilio@freebsd.org>:
>>>> As already published several times, according to the following plan:
>>>> http://wiki.freebsd.org/NONMPSAFE_DEORBIT_VFS
>>>>
>>>
>>> I still haven't heard from Vivien or Edward, anyway as NTFS is
>>> basically only used RO these days (also the mount_ntfs code just
>>> permits RO mounting) I stripped all the incomplete/bogus write support
>>> with the following patch:
>>> http://www.freebsd.org/~attilio/ntfs_remove_write.patch
>>>
>>> This is an attempt to make the code smaller and possibly just focus on
>>> the locking that really matter (as read-only filesystem).
>>> On some points of the patch I'm a bit less sure as we could easily
>>> take into account also write for things like vaccess() arguments, and
>>> make easier to re-add correct write support at some point in the
>>> future, but still force RO, even if the approach used in the patch is
>>> more correct IMHO.
>>> As an added bonus this patch cleans some dirty code in the mount
>>> operation and fixes a bug as vfs_mountedfrom() is called before real
>>> mounting is completed and can still fail.
>>
>> A quick update on this.
>> It looks like NTFS won't be completed for this GSoC thus I seriously
>> need to find an alternative to not lose the NTFS support entirely.
>>
>> I tried to look into the NTFS implementation right now and it is
>> really a poor support. As Peter has also verified, it can deadlock in
>> no-time, it completely violates VFS rules, etc. IMHO it deserves a
>> complete rewrite if we would still support in-kernel NTFS. I also
>> tried to look at the NetBSD implementation. Their code is someway
>> similar to our, but they used very complicated (and very dirty) code
>> to do the locking. Even if I don't know well enough NetBSD VFS, I have
>> the impression not all the races are correctly handled. Definitively,
>> not something I would like to port.
>>
>> Considering all that the only viable option would be meaning an
>> userland filesystem implementation. My preferred choice would be to
>> import PUFFS and librefuse on top of it but honestly it requires a lot
>> of time to be completed, time which I don't currently have as in 2
>> months Giant must be gone by the VFS.
>>
>> I then decided to switch to gnn's revamp of FUSE patches. You can find
>> his initial e-mail here:
>> http://lists.freebsd.org/pipermail/freebsd-fs/2012-March/013876.html
>>
>> I've precisely got the second version of George's patch and created
>> this dolphin branch:
>> svn://svn.freebsd.org/base/projects/fuse
>>
>> I'm fixing low hanging fruit for the moment (see r238411 for example)
>> and I still have to make a thorough review.
>> However my idea is to commit the support once:
>> - ntfs-3g is well stress-tested and proves to be bug-free
>> - there is no major/big technical issue pending after the reviews
>
> In the last weeks Peter, Florian, Gustau and I have been working in
> stabilizing fuse support. In the specific, Peter has worked hard on
> producing several utilities to nit stress-test fuse and in particular
> ntfs, Florian has improved fuse related ports (as explained later) and
> Gustau has done sparse testing. I feel moderately satisfied by the
> level of stability of fuse now to propose to wider usage, in
> particular given the huge amount of complaints I'm hearing around
> about occasional fuse users.
>
> The final target of the project is to completely import into base the
> content of fusefs-kmod starting from earlier posted patches by George.
> So far, we took care only of importing in the fuse branch the kernel
> part, so that fusefs-kmod userland part is still needed to be
> installed from ports, but I was studying the mount_fusefs licensing
> before to process with the import for the userland bits of it.
>
> The fixing has been happening here:
> svn://svn.freebsd.org/base/projects/fuse/
>
> which is essentially an HEAD branch + fuse kernel components. In order
> to get fuse, please compile a kernel from this branch with FUSE option
> or simply build and load fuse module.
> Alternatively, a kernel patch that should work with HEAD@240684 is here:
> http://www.freebsd.org/~attilio/fuse_import/fuse_240684.patch
>
> I guess the patch can easily be applied to all FreeBSD branches, but
> it has not been tested on anything other than -CURRENT.
>
> As said you still need currently to build fusefs-kmod port. However
> you need these further patches, to be put in the fusefs-kmod/files/
> directory:
> http://www.freebsd.org/~attilio/fuse_import/patch-Makefile
> http://www.freebsd.org/~attilio/fuse_import/patch-mount_fusefs__mount_fusefs2.c
>
> They both disable the old kernel building/linking and import new
> functionality to let the new kernel support work well in presence of
> many consumers.
>
> In addition to fusefs-kmod, Bryan and Florian have also updated
> fusefs-lib and fusefs-ntfs ports. For instance, please refer to this
> e-mail:
> http://lists.freebsd.org/pipermail/freebsd-ports/2012-August/077950.html
>
> Even if this work is someway independent by the fusefs-kmod import, I
> warmly suggest to all of you to use their patches (and this what we
> have been testing so far too).
>
> At this point what I'm looking for are reviews and further testing.
> I would like to spend some words on what you should expect from this work:
> *Fuse is far from being perfect*.
> I cannot stress this enough. Peter stress-tests could break also Fuse
> on Linux generally and by Fuse authors admissions the modules can
> never guarantee to be completely starvation-free. However, they tend
> to be designed in a way that sleeps can be at least interrupted
> easily, making at least easy to recover from deadlocks. This is mostly
> retained also in FreeBSD, for what I can tell. Also, sometimes fuse
> seems to leave a small amount of hidden files, when it find references
> on files it wants to delete. This happens also under Linux and it is
> part of FUSE design, not much we can do.
> However, while deadlocks can be tolerated to some extent, the things you
> should really pay attention to are core dumps of fuse modules (like the
> ntfs-3g binary) and kernel panics. They must not happen, and if they do
> they need to be fixed promptly.
> The good news is that ntfs seems to be doing exceptionally well.
> Florian could use ntfs as a backend for a postgresql test. I think this
> is a big improvement over the current in-kernel ntfs, which is
> completely broken.
>
> So far we have almost entirely tested only ntfs-3g. I know Gustau also
> used other modules like sshfs and George used GlusterFS with his older
> patches, but I encourage you to test as many modules as you want, as
> they may expose different bugs. Of course, I don't plan to spend much
> more time on FUSE, but I can occasionally look at bugs as they fall in
> the filesystems category and I'm always interested in keeping a good
> open eye on such issues.
>
> A few operational notes:
> - In the next days I will import the userland bits of fusefs-kmod to
> the fuse project branch making the port obsolete. When this happens I
> will make this clear to the user of this thread.
> - If no major bug is remained by the early October, I will commit this
> to -CURRENT
> - I expect Bryan and Florian to commit libfuse and ntfs updates soon.
> They can do so independently of the fusefs-kmod retirement, but I would
> prefer their patches to go in first.
> - After that I will handover fusefs maintainership to gnn as agreed in
> precedence but I will be around helping with analysis and fixing,
> depending on time availability
>
> In the end I have really 2 minor questions:
> - One is about importing the mount_fusefs userland bits. I don't think
> we need a vendor import at all because they were developed by a
> FreeBSD GSoC student and kept in his git repo (or someone else's).
> Anyway, i'd just commit as new files once I do a good sweep. I hope
> nobody objects to that.
> - Another one is: fusefs-kmod right now is only amd64/i386 specific. I
> have no idea why as it has not any MD specific code. However I'm sure
> it has not been tested on other arches so far. Anyway I left it usable
> by all the arches. I think this is the correct choice. If someone
> objects with valid argument I can bring it back to be usable only on
> i386 and amd64.
>
> That's all, for any question please don't hesitate to contact me and
> the other people involved in this work.

Attilio (and the crew),

Thanks for working on fusefs-ntfs. It's been increasingly worrying to
me that we might lose it and I really depend on it. I really hope to
be able to use rsync to update files without killing my system some
day.

I tried the new fusefs-libs and fusefs-ntfs ports from Florian and
Bryan, but ran into trouble as I could no longer build the kmod after
installing the updated fusefs-libs. The build failed with a compile error:
cc -O2 -pipe -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE
-nostdinc  -I../include -I. -I@ -I@/contrib/altq -finline-limit=8000
--param inline-unit-growth=100 --param large-function-growth=1000
-fno-common  -fno-omit-frame-pointer  -mcmodel=kernel -mno-red-zone
-mno-mmx -mno-sse -msoft-float  -fno-asynchronous-unwind-tables
-ffreestanding -fstack-protector -std=iso9899:1999 -fstack-protector
-Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes
-Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual  -Wundef
-Wno-pointer-sign -fformat-extensions  -Wmissing-include-dirs
-fdiagnostics-show-option   -c fuse_vnops.c
fuse_vnops.c: In function 'create_filehandle':
fuse_vnops.c:1586: error: 'struct fuse_open_in' has no member named 'mode'
*** [fuse_vnops.o] Error code 1

This was on amd64 9-STABLE r239879. Until/unless this issue is
resolved, please keep the existing port available and/or mark the new
one to not install on pre-10 systems.
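
For illustration, here is a rough sketch of how this kind of breakage
arises and how code can guard against it. It assumes (I have not checked
the headers) that the updated header no longer has a 'mode' member in the
open request structure. The struct names, the OPEN_IN_HAS_MODE macro and
fill_open_in() are hypothetical and are not taken from fusefs-kmod or
fusefs-libs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layouts, for illustration only (not the real FUSE headers). */
struct open_in_old {
	uint32_t flags;
	uint32_t mode;		/* present in the older layout */
};

struct open_in_new {
	uint32_t flags;
	uint32_t unused;	/* same size, but the member was renamed */
};

/* Hypothetical feature macro selecting which layout the build sees. */
#ifdef OPEN_IN_HAS_MODE
typedef struct open_in_old open_in_t;
#else
typedef struct open_in_new open_in_t;
#endif

/*
 * Fill an open request. Code that assigns in->mode unconditionally stops
 * compiling as soon as the header switches to the newer layout; the guard
 * below keeps both variants building.
 */
void
fill_open_in(open_in_t *in, uint32_t flags, uint32_t mode)
{
	memset(in, 0, sizeof(*in));
	in->flags = flags;
#ifdef OPEN_IN_HAS_MODE
	in->mode = mode;
#else
	(void)mode;		/* the newer layout has nowhere to put it */
#endif
}
```

Whether a guard like this is the right fix for the port is a separate
question; it only shows why a header update alone can break the build.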
-- 
R. Kevin Oberman, Network Engineer
E-mail: kob6558@gmail.com

From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 19 06:17:05 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 63988106566C
	for <freebsd-fs@freebsd.org>; Wed, 19 Sep 2012 06:17:05 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
	by mx1.freebsd.org (Postfix) with ESMTP id D676F8FC15
	for <freebsd-fs@freebsd.org>; Wed, 19 Sep 2012 06:17:04 +0000 (UTC)
Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q8J6HCUA003288;
	Wed, 19 Sep 2012 09:17:12 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id
	q8J6H0sA042206; Wed, 19 Sep 2012 09:17:00 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q8J6Gxrf042205; 
	Wed, 19 Sep 2012 09:16:59 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Wed, 19 Sep 2012 09:16:59 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Message-ID: <20120919061659.GS37286@deviant.kiev.zoral.com.ua>
References: <20120918085941.GZ37286@deviant.kiev.zoral.com.ua>
	<21418398.765673.1347975294365.JavaMail.root@erie.cs.uoguelph.ca>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="2CmXnuBWhlSJqbcw"
Content-Disposition: inline
In-Reply-To: <21418398.765673.1347975294365.JavaMail.root@erie.cs.uoguelph.ca>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
	autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	skuns.kiev.zoral.com.ua
Cc: FS List <freebsd-fs@freebsd.org>
Subject: Re: testing/review of atomic export update patch
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 19 Sep 2012 06:17:05 -0000


--2CmXnuBWhlSJqbcw
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Sep 18, 2012 at 09:34:54AM -0400, Rick Macklem wrote:
> Konstantin Belousov wrote:
> > On Mon, Sep 17, 2012 at 05:32:44PM -0400, Rick Macklem wrote:
> > > Konstantin Belousov wrote:
> > > > On Sun, Sep 16, 2012 at 05:41:25PM -0400, Rick Macklem wrote:
> > > > > Hi,
> > > > >
> > > > > There is a simple patch at:
> > > > >   http://people.freebsd.org/~rmacklem/atomic-export.patch
> > > > > that can be applied to a kernel + mountd, so that the new
> > > > > nfsd can be suspended by mountd while the exports are being
> > > > > reloaded. It adds a new "-S" flag to mountd to enable this.
> > > > > (This avoids the long standing bug where clients receive ESTALE
> > > > >  replies to RPCs while mountd is reloading exports.)
> > > >
> > > > This looks simple, but also somewhat worrisome. What would happen
> > > > if the mountd crashes after nfsd suspension is requested, but
> > > > before
> > > > resume was performed ?
> > > >
> > > > Might be, mountd should check for suspended nfsd on start and
> > > > unsuspend
> > > > it, if some flag is specified ?
> > > Well, I think that happens with the patch as it stands.
> > >
> > > suspend is done if the "-S" option is specified, but that is a no op
> > > if it is already suspended. The resume is done no matter what flags
> > > are provided, so mountd will always try and do a "resume".
> > > --> get_exportlist() is always called when mountd is started up and
> > >     it does the resume unconditionally when it completes.
> > >     If mountd repeatedly crashes before completing get_exportlist()
> > >     when it is started up, the exports will be all messed up, so
> > >     having the nfsd threads suspended doesn't seem so bad for this
> > >     case (which hopefully never happens;-).
> > >
> > > Both suspend and resume are just no ops for unpatched kernels.
> > >
> > > Maybe the comment in front of "resume" should explicitly explain
> > > this, instead of saying resume is harmless to do under all
> > > conditions?
> > >
> > > Thanks for looking at it, rick
> > I see.
> >
> > My another note is that there is no any protection against parallel
> > instances of suspend/resume happen. For instance, one thread could set
> > suspend_nfsd = 1 and be descheduled, while another executes resume
> > code sequence meantime. Then it would see suspend_nfsd != 0, while
> > nfsv4rootfs_lock not held, and tries to unlock it. It seems that
> > nfsv4_unlock would silently exit. The suspending thread resumes,
> > and obtains the lock. You end up with suspend_nfsd == 0 but lock held.
> Yes. I had assumed that mountd would be the only thing using these syscalls
> and it is single threaded. (The syscalls can only be done by root for the
> obvious reasons.;-)
>
> Maybe the following untested version of the syscalls would be better, since
> they would allow multiple concurrent calls to either suspend or resume.
> (There would still be an indeterminate case if one thread called resume
>  concurrently with another few calling suspend, but that is unavoidable,
>  I think?)
>
> Again, thanks for the comments, rick
> --- untested version of syscalls ---
> 	} else if ((uap->flag & NFSSVC_SUSPENDNFSD) != 0) {
> 		NFSLOCKV4ROOTMUTEX();
> 		if (suspend_nfsd == 0) {
> 			/* Lock out all nfsd threads */
> 			igotlock = 0;
> 			while (igotlock == 0 && suspend_nfsd == 0) {
> 				igotlock = nfsv4_lock(&nfsv4rootfs_lock, 1,
> 				    NULL, NFSV4ROOTLOCKMUTEXPTR, NULL);
> 			}
> 			suspend_nfsd = 1;
> 		}
> 		NFSUNLOCKV4ROOTMUTEX();
> 		error = 0;
> 	} else if ((uap->flag & NFSSVC_RESUMENFSD) != 0) {
> 		NFSLOCKV4ROOTMUTEX();
> 		if (suspend_nfsd != 0) {
> 			nfsv4_unlock(&nfsv4rootfs_lock, 0);
> 			suspend_nfsd = 0;
> 		}
> 		NFSUNLOCKV4ROOTMUTEX();
> 		error = 0;
> 	}

From a cursory look, this variant is an improvement, mostly because it
takes the interlock before testing suspend_nfsd and uses the while loop.

Is it possible to also make the sleep for the lock interruptible, so
that a blocked mountd could be killed by a signal?
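
To make the pattern easier to reason about, here is a rough userland
model of the suspend/resume logic discussed above: one mutex guards the
flag, the flag is tested only with the mutex held, and the wait re-tests
its condition in a loop after every wakeup. It is only a sketch under
stated assumptions. All names (gate_mtx, active_workers, suspend_waiters,
worker_try_enter() and so on) are hypothetical stand-ins, it uses pthreads
rather than the kernel primitives, and it does not model an interruptible
sleep.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Hypothetical userland stand-ins (not the kernel names):
 *   gate_mtx        plays the role of the mutex taken before testing the flag
 *   active_workers  plays the role of service threads busy with a request
 *   suspended       plays the role of suspend_nfsd
 */
static pthread_mutex_t gate_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t gate_cv = PTHREAD_COND_INITIALIZER;
static int active_workers;
static int suspend_waiters;
static bool suspended;

/* A worker asks to start a request; refused while a suspend is active or pending. */
bool
worker_try_enter(void)
{
	bool ok;

	pthread_mutex_lock(&gate_mtx);
	ok = !suspended && suspend_waiters == 0;
	if (ok)
		active_workers++;
	pthread_mutex_unlock(&gate_mtx);
	return (ok);
}

/* A worker finished its request; wake suspenders once the last one leaves. */
void
worker_exit(void)
{
	pthread_mutex_lock(&gate_mtx);
	active_workers--;
	if (active_workers == 0)
		pthread_cond_broadcast(&gate_cv);
	pthread_mutex_unlock(&gate_mtx);
}

/* Idempotent: calling it while already suspended is a no-op. */
void
suspend_service(void)
{
	pthread_mutex_lock(&gate_mtx);
	if (!suspended) {
		suspend_waiters++;
		/* Re-test both conditions after every wakeup. */
		while (!suspended && active_workers > 0)
			pthread_cond_wait(&gate_cv, &gate_mtx);
		suspend_waiters--;
		suspended = true;
	}
	pthread_mutex_unlock(&gate_mtx);
}

/* Idempotent: calling it while not suspended is a no-op. */
void
resume_service(void)
{
	pthread_mutex_lock(&gate_mtx);
	if (suspended) {
		suspended = false;
		pthread_cond_broadcast(&gate_cv);
	}
	pthread_mutex_unlock(&gate_mtx);
}

bool
is_suspended(void)
{
	bool s;

	pthread_mutex_lock(&gate_mtx);
	s = suspended;
	pthread_mutex_unlock(&gate_mtx);
	return (s);
}
```

Because the flag is only read and written under the mutex, repeated or
concurrent suspend calls collapse into one, and a resume with nothing
suspended does nothing. A resume racing with several suspends still has
an indeterminate outcome, as noted in the quoted text.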

--2CmXnuBWhlSJqbcw
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iEYEARECAAYFAlBZY1sACgkQC3+MBN1Mb4jzjgCfVE5TsFuaN7NItix9xLNCMjam
eKkAn2IsumdW+ckxb4xAGXZorptD5njG
=J1Hs
-----END PGP SIGNATURE-----

--2CmXnuBWhlSJqbcw--

From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 19 07:30:14 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 1CFF1106564A;
	Wed, 19 Sep 2012 07:30:14 +0000 (UTC)
	(envelope-from asmrookie@gmail.com)
Received: from mail-lpp01m010-f54.google.com (mail-lpp01m010-f54.google.com
	[209.85.215.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 755DD8FC0C;
	Wed, 19 Sep 2012 07:30:12 +0000 (UTC)
Received: by lahe6 with SMTP id e6so410484lah.13
	for <multiple recipients>; Wed, 19 Sep 2012 00:30:11 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:reply-to:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type;
	bh=/L2gqn9cpJ29eGUEFoYpM+7cUjSo/qtgHi48XJNYoIw=;
	b=ybHEeBhUGiL9VFpmOGdxlL5HLbf0nTQbDflRirEJJ+2ZdB29FRS2YgKYUDW2m1PGvX
	L4E+jWQ0cn19DdrXzwu0wLW3bXM+KbCP0XNYIHWFDFtMNd1GrvemWeptSs86B0MrObzG
	JnWzfi/1VyNx7Hkm5e7v3SCpSa5g2iNfPCW86+7E4Dkktitt/FyJDcSqQiD3YD/OSr6w
	hRlnVs+rAZs38RZ7ZCR3tsM5PrI6UZtOMTVcCBOJ8ao55io8/ecujpb8aQ1zQQtG4SYt
	nN4q1Kr/G6EBWgaDuwufEHC99VFaRq4zCZtsoDODfXSQxNI2lVM2EV70PYef+BvcvpfP
	fcDg==
MIME-Version: 1.0
Received: by 10.152.131.68 with SMTP id ok4mr1888249lab.47.1348039811094; Wed,
	19 Sep 2012 00:30:11 -0700 (PDT)
Sender: asmrookie@gmail.com
Received: by 10.112.102.39 with HTTP; Wed, 19 Sep 2012 00:30:11 -0700 (PDT)
In-Reply-To: <CAN6yY1tjHFEopgJ+cAfQ5ES5Q4NnOg1tYA81==9uzPQgpQVDzA@mail.gmail.com>
References: <CAJ-FndCQ0YEo9_6x3g-12XEs8QmtyecwkLBX9z_sptnOUNTHrw@mail.gmail.com>
	<20120829060158.GA38721@x2.osted.lan>
	<CAJ-FndAaFv2o05MZZceT8Qr4mhPxuzrnmOZ30c3gy8=pnjjZvw@mail.gmail.com>
	<20120831052003.GA91340@x2.osted.lan>
	<CAJ-FndAaxQA8NYCFSN629XXi9zMVNyu2TuHjZLvmn3jhzRJb4w@mail.gmail.com>
	<CAJ-FndDdDVuwc=NgDeG7XiWW59-+Ls5wc2GBqbjLOLDUdUb9SA@mail.gmail.com>
	<20120905201531.GA54452@x2.osted.lan>
	<CAJ-FndCHSroZFfVgHAW8SUVZhDSaX9qix=aZnHVC_BN_fW6sgg@mail.gmail.com>
	<CAJ-FndDr5WmeKXCwSCucQ4w3hPHRBuu36YH1xiW_wKXOkKEdZg@mail.gmail.com>
	<CAJ-FndCvc+phY_g4CeGfzsj017roxs_C5adjuLuszpEPWO2+1g@mail.gmail.com>
	<20120917140055.GA9037@x2.osted.lan>
	<CAJ-FndAP9Ua6tRcbrfYY1+56O-YbJvmyaUco9K42-0hmchKD6g@mail.gmail.com>
	<CAN6yY1tjHFEopgJ+cAfQ5ES5Q4NnOg1tYA81==9uzPQgpQVDzA@mail.gmail.com>
Date: Wed, 19 Sep 2012 08:30:11 +0100
X-Google-Sender-Auth: E7omFsgrPjQO55RBiG5X8WqrrcY
Message-ID: <CAJ-FndBDb_OdcdkW5YYs_X_+YdZc8N+krvK2pPzdnYa+3nYEDg@mail.gmail.com>
From: Attilio Rao <attilio@freebsd.org>
To: Kevin Oberman <kob6558@gmail.com>
Content-Type: text/plain; charset=UTF-8
Cc: Peter Holm <pho@freebsd.org>, bdrewery@freebsd.org,
	FreeBSD FS <freebsd-fs@freebsd.org>, freebsd-current@freebsd.org
Subject: Re: MPSAFE VFS -- List of upcoming actions
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: attilio@FreeBSD.org
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 19 Sep 2012 07:30:14 -0000

On Wed, Sep 19, 2012 at 4:47 AM, Kevin Oberman <kob6558@gmail.com> wrote:
> On Tue, Sep 18, 2012 at 7:48 PM, Attilio Rao <attilio@freebsd.org> wrote:
>> On Fri, Jul 13, 2012 at 12:18 AM, Attilio Rao <attilio@freebsd.org> wrote:
>>> 2012/7/4 Attilio Rao <attilio@freebsd.org>:
>>>> 2012/6/29 Attilio Rao <attilio@freebsd.org>:
>>>>> As already published several times, according to the following plan:
>>>>> http://wiki.freebsd.org/NONMPSAFE_DEORBIT_VFS
>>>>>
>>>>
>>>> I still haven't heard from Vivien or Edward, anyway as NTFS is
>>>> basically only used RO these days (also the mount_ntfs code just
>>>> permits RO mounting) I stripped all the incomplete/bogus write support
>>>> with the following patch:
>>>> http://www.freebsd.org/~attilio/ntfs_remove_write.patch
>>>>
>>>> This is an attempt to make the code smaller and possibly just focus on
>>>> the locking that really matter (as read-only filesystem).
>>>> On some points of the patch I'm a bit less sure as we could easily
>>>> take into account also write for things like vaccess() arguments, and
>>>> make easier to re-add correct write support at some point in the
>>>> future, but still force RO, even if the approach used in the patch is
>>>> more correct IMHO.
>>>> As an added bonus this patch cleans some dirty code in the mount
>>>> operation and fixes a bug as vfs_mountedfrom() is called before real
>>>> mounting is completed and can still fail.
>>>
>>> A quick update on this.
>>> It looks like NTFS won't be completed for this GSoC, thus I seriously
>>> need to find an alternative so as not to lose NTFS support entirely.
>>>
>>> I looked into the current NTFS implementation and it is really
>>> poor. As Peter has also verified, it can deadlock in
>>> no time, it completely violates VFS rules, etc. IMHO it deserves a
>>> complete rewrite if we would still support in-kernel NTFS. I also
>>> tried to look at the NetBSD implementation. Their code is someway
>>> similar to our, but they used very complicated (and very dirty) code
>>> to do the locking. Even if I don't know well enough NetBSD VFS, I have
>>> the impression not all the races are correctly handled. Definitively,
>>> not something I would like to port.
>>>
>>> Considering all that, the only viable option would be a
>>> userland filesystem implementation. My preferred choice would be to
>>> import PUFFS and librefuse on top of it but honestly it requires a lot
>>> of time to be completed, time which I don't currently have as in 2
>>> months Giant must be gone by the VFS.
>>>
>>> I then decided to switch to gnn's revamp of the FUSE patches. You can find
>>> his initial e-mail here:
>>> http://lists.freebsd.org/pipermail/freebsd-fs/2012-March/013876.html
>>>
>>> I've precisely got the second version of George's patch and created
>>> this dolphin branch:
>>> svn://svn.freebsd.org/base/projects/fuse
>>>
>>> I'm fixing low-hanging fruit for the moment (see r238411 for example)
>>> and I still have to do a thorough review.
>>> However my idea is to commit the support once:
>>> - ntfs-3g is well stress-tested and proves to be bug-free
>>> - there is no major/big technical issue pending after the reviews
>>
>> In the last weeks Peter, Florian, Gustau and I have been working in
>> stabilizing fuse support. In the specific, Peter has worked hard on
>> producing several utilities to nit stress-test fuse and in particular
>> ntfs, Florian has improved fuse related ports (as explained later) and
>> Gustau has done sparse testing. I feel moderately satisfied by the
>> level of stability of fuse now to propose to wider usage, in
>> particular given the huge amount of complaints I'm hearing around
>> about occasional fuse users.
>>
>> The final target of the project is to completely import into base the
>> content of fusefs-kmod starting from earlier posted patches by George.
>> So far, we took care only of importing in the fuse branch the kernel
>> part, so that fusefs-kmod userland part is still needed to be
>> installed from ports, but I was studying the mount_fusefs licensing
>> before to process with the import for the userland bits of it.
>>
>> The fixing has been happening here:
>> svn://svn.freebsd.org/base/projects/fuse/
>>
>> which is essentially an HEAD branch + fuse kernel components. In order
>> to get fuse, please compile a kernel from this branch with FUSE option
>> or simply build and load fuse module.
>> Alternatively, a kernel patch that should work with HEAD@240684 is here:
>> http://www.freebsd.org/~attilio/fuse_import/fuse_240684.patch
>>
>> I guess the patch can easily be applied to all FreeBSD branches, but it
>> has not been tested on anything other than -CURRENT.
>>
>> As said you still need currently to build fusefs-kmod port. However
>> you need these further patches, to be put in the fusefs-kmod/files/
>> directory::
>> http://www.freebsd.org/~attilio/fuse_import/patch-Makefile
>> http://www.freebsd.org/~attilio/fuse_import/patch-mount_fusefs__mount_fusefs2.c
>>
>> They both disable the old kernel building/linking and import new
>> functionality to let the new kernel support work well in presence of
>> many consumers.
>>
>> In addition to fusefs-kmod, Bryan and Florian have also updated
>> fusefs-lib and fusefs-ntfs ports. For instance, please refer to this
>> e-mail:
>> http://lists.freebsd.org/pipermail/freebsd-ports/2012-August/077950.html
>>
>> Even if this work is someway independent by the fusefs-kmod import, I
>> warmly suggest to all of you to use their patches (and this what we
>> have been testing so far too).
>>
>> At this point what I'm looking for are reviews and further testing.
>> I would like to spend some words on what you should expect from this work:
>> *Fuse is far from being perfect*.
>> I cannot stress this enough. Peter stress-tests could break also Fuse
>> on Linux generally and by Fuse authors admissions the modules can
>> never guarantee to be completely starvation-free. However, they tend
>> to be designed in a way that sleeps can be at least interrupted
>> easily, making at least easy to recover from deadlocks. This is mostly
>> retained also in FreeBSD, for what I can tell. Also, sometimes fuse
>> seems to leave a small amount of hidden files, when it find references
>> on files it wants to delete. This happens also under Linux and it is
>> part of FUSE design, not much we can do.
>> However, while deadlocks can be tolerated to some extent, the things you
>> should really pay attention to are core dumps of fuse modules (like the
>> ntfs-3g binary) and kernel panics. They must not happen, and if they do
>> they need to be fixed promptly.
>> The good news is that ntfs seems to be doing exceptionally well.
>> Florian could use ntfs as a backend for a postgresql test. I think this
>> is a big improvement over the current in-kernel ntfs, which is
>> completely broken.
>>
>> So far we have almost entirely tested only ntfs-3g. I know Gustau also
>> used other modules like sshfs and George used GlusterFS with his older
>> patches, but I encourage you to test as many modules as you want, as
>> they may expose different bugs. Of course, I don't plan to spend much
>> more time on FUSE, but I can occasionally look at bugs as they fall in
>> the filesystems category and I'm always interested in keeping a good
>> open eye on such issues.
>>
>> A few operational notes:
>> - In the next days I will import the userland bits of fusefs-kmod to
>> the fuse project branch making the port obsolete. When this happens I
>> will make this clear to the user of this thread.
>> - If no major bug is remained by the early October, I will commit this
>> to -CURRENT
>> - I expect Bryan and Florian to commit libfuse and ntfs updates soon.
>> They can do so independently of the fusefs-kmod retirement, but I would
>> prefer their patches to go in first.
>> - After that I will handover fusefs maintainership to gnn as agreed in
>> precedence but I will be around helping with analysis and fixing,
>> depending on time availability
>>
>> In the end I have really 2 minor questions:
>> - One is about importing the mount_fusefs userland bits. I don't think
>> we need a vendor import at all because they were developed by a
>> FreeBSD GSoC student and kept in his git repo (or someone else's).
>> Anyway, i'd just commit as new files once I do a good sweep. I hope
>> nobody objects to that.
>> - Another one is: fusefs-kmod right now is only amd64/i386 specific. I
>> have no idea why as it has not any MD specific code. However I'm sure
>> it has not been tested on other arches so far. Anyway I left it usable
>> by all the arches. I think this is the correct choice. If someone
>> objects with valid argument I can bring it back to be usable only on
>> i386 and amd64.
>>
>> That's all, for any question please don't hesitate to contact me and
>> the other people involved in this work.
>
> Attilio (and the crew),
>
> Thanks for working on fusefs-ntfs. It's been increasingly worrying to
> me that we might lose it and I really depend on it. I really hope to
> be able to use rsync to update files without killing my system some
> day.
>
> I tried the new fusefs-libs and fusefs-ntfs ports from Florian and
> Bryan, but ran into trouble as I could no longer build the kmod after
> installing the updated fusefs-libs. The build failed with a compile error:
> cc -O2 -pipe -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE
> -nostdinc  -I../include -I. -I@ -I@/contrib/altq -finline-limit=8000
> --param inline-unit-growth=100 --param large-function-growth=1000
> -fno-common  -fno-omit-frame-pointer  -mcmodel=kernel -mno-red-zone
> -mno-mmx -mno-sse -msoft-float  -fno-asynchronous-unwind-tables
> -ffreestanding -fstack-protector -std=iso9899:1999 -fstack-protector
> -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes
> -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual  -Wundef
> -Wno-pointer-sign -fformat-extensions  -Wmissing-include-dirs
> -fdiagnostics-show-option   -c fuse_vnops.c
> fuse_vnops.c: In function 'create_filehandle':
> fuse_vnops.c:1586: error: 'struct fuse_open_in' has no member named 'mode'
> *** [fuse_vnops.o] Error code 1
>
> This was on amd64 9-STABLE r239879. Until/unless this issue is
> resolved, please keep the existing port available and/or mark the new
> one to not install on pre-10 systems.

If you follow the instructions I described in this e-mail, the fusefs-kmod
kernel part won't be built anymore, so you won't run into this.
If it is still being built, please let me know, because that means there
is a bug in the 2 patches I posted for the fusefs-kmod port.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein

From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 19 18:35:23 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 584BD106566B
	for <fs@freebsd.org>; Wed, 19 Sep 2012 18:35:23 +0000 (UTC)
	(envelope-from dg@pki2.com)
Received: from btw.pki2.com (btw.pki2.com [IPv6:2001:470:a:6fd::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 0F0A18FC18
	for <fs@freebsd.org>; Wed, 19 Sep 2012 18:35:23 +0000 (UTC)
Received: from btw.pki2.com (btw.pki2.com [192.168.23.1])
	by btw.pki2.com (8.14.5/8.14.5) with ESMTP id q8JIZFBX046148
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT)
	for <fs@freebsd.org>; Wed, 19 Sep 2012 11:35:17 -0700 (PDT)
	(envelope-from dg@pki2.com)
Date: Wed, 19 Sep 2012 11:35:15 -0700 (PDT)
From: Dennis Glatting <dg@pki2.com>
X-X-Sender: dennisg@btw.pki2.com
To: fs@freebsd.org
Message-ID: <alpine.BSF.2.00.1209191133270.44545@btw.pki2.com>
User-Agent: Alpine 2.00 (BSF 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII
X-yoursite-MailScanner-Information: Dennis Glatting
X-yoursite-MailScanner-ID: q8JIZFBX046148
X-yoursite-MailScanner: Found to be clean
X-MailScanner-From: dg@pki2.com
Cc: 
Subject: How to recover from theis ZFS error?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 19 Sep 2012 18:35:23 -0000


One of my pools (disk-1) with 12T of data is reporting this error after a 
scrub. Is there a way to fix this error without backing up and restoring 
12T of data?


errors: Permanent errors have been detected in the following files:

         <metadata>:<0x0>
         disk-1:<0x0>


From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 19 19:03:55 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 56F90106564A;
	Wed, 19 Sep 2012 19:03:55 +0000 (UTC)
	(envelope-from kob6558@gmail.com)
Received: from mail-we0-f182.google.com (mail-we0-f182.google.com
	[74.125.82.182])
	by mx1.freebsd.org (Postfix) with ESMTP id D1F218FC15;
	Wed, 19 Sep 2012 19:03:53 +0000 (UTC)
Received: by weyx56 with SMTP id x56so932319wey.13
	for <multiple recipients>; Wed, 19 Sep 2012 12:03:52 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	bh=od0OjYD8RVH6/r4g3UQYYDtTt7moal/DORSpoeU95UQ=;
	b=E45NE5FYbL6rIJOavSzeiYeyxoRNCUKtkRKC6katc8e2DMRUfBAXatvI/SPXp04D6o
	lZ/5NUWhcKpZPuk75S1QjfYgwwWVzcsQEiee6HlY1DrhcUbaqb+QsAZQjgdmIQh7KlWZ
	IuWwbZ0FZXt8Mw1vgFc23RDp2qnIV8MwdRvRbWm008lnZ3sw/zJduEKylQDBxz3LFzue
	P7onxJriYJzpS1afXCizCJFmABO1pyxTtH1g0+rU5jkNRIVsLLj/JWSx2+EWdCbEfXKE
	T7r9e5bchIXeOyWHasYikM61U8c4YzUnspUJnwwJHXphQM1TSSoA1RDG8QCuU2KS8ghi
	S1ug==
MIME-Version: 1.0
Received: by 10.180.107.103 with SMTP id hb7mr8553096wib.3.1348081432502; Wed,
	19 Sep 2012 12:03:52 -0700 (PDT)
Received: by 10.223.66.194 with HTTP; Wed, 19 Sep 2012 12:03:52 -0700 (PDT)
In-Reply-To: <CAJ-FndBDb_OdcdkW5YYs_X_+YdZc8N+krvK2pPzdnYa+3nYEDg@mail.gmail.com>
References: <CAJ-FndCQ0YEo9_6x3g-12XEs8QmtyecwkLBX9z_sptnOUNTHrw@mail.gmail.com>
	<20120829060158.GA38721@x2.osted.lan>
	<CAJ-FndAaFv2o05MZZceT8Qr4mhPxuzrnmOZ30c3gy8=pnjjZvw@mail.gmail.com>
	<20120831052003.GA91340@x2.osted.lan>
	<CAJ-FndAaxQA8NYCFSN629XXi9zMVNyu2TuHjZLvmn3jhzRJb4w@mail.gmail.com>
	<CAJ-FndDdDVuwc=NgDeG7XiWW59-+Ls5wc2GBqbjLOLDUdUb9SA@mail.gmail.com>
	<20120905201531.GA54452@x2.osted.lan>
	<CAJ-FndCHSroZFfVgHAW8SUVZhDSaX9qix=aZnHVC_BN_fW6sgg@mail.gmail.com>
	<CAJ-FndDr5WmeKXCwSCucQ4w3hPHRBuu36YH1xiW_wKXOkKEdZg@mail.gmail.com>
	<CAJ-FndCvc+phY_g4CeGfzsj017roxs_C5adjuLuszpEPWO2+1g@mail.gmail.com>
	<20120917140055.GA9037@x2.osted.lan>
	<CAJ-FndAP9Ua6tRcbrfYY1+56O-YbJvmyaUco9K42-0hmchKD6g@mail.gmail.com>
	<CAN6yY1tjHFEopgJ+cAfQ5ES5Q4NnOg1tYA81==9uzPQgpQVDzA@mail.gmail.com>
	<CAJ-FndBDb_OdcdkW5YYs_X_+YdZc8N+krvK2pPzdnYa+3nYEDg@mail.gmail.com>
Date: Wed, 19 Sep 2012 12:03:52 -0700
Message-ID: <CAN6yY1urtKmTZ3Sk5H+b5SSZrYKqkpMZ_nvQPVLiN+c51-i_ng@mail.gmail.com>
From: Kevin Oberman <kob6558@gmail.com>
To: attilio@freebsd.org
Content-Type: text/plain; charset=UTF-8
Cc: Peter Holm <pho@freebsd.org>, bdrewery@freebsd.org,
	FreeBSD FS <freebsd-fs@freebsd.org>, freebsd-current@freebsd.org
Subject: Re: MPSAFE VFS -- List of upcoming actions
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 19 Sep 2012 19:03:55 -0000

On Wed, Sep 19, 2012 at 12:30 AM, Attilio Rao <attilio@freebsd.org> wrote:
> On Wed, Sep 19, 2012 at 4:47 AM, Kevin Oberman <kob6558@gmail.com> wrote:
>> On Tue, Sep 18, 2012 at 7:48 PM, Attilio Rao <attilio@freebsd.org> wrote:
>>> On Fri, Jul 13, 2012 at 12:18 AM, Attilio Rao <attilio@freebsd.org> wrote:
>>>> 2012/7/4 Attilio Rao <attilio@freebsd.org>:
>>>>> 2012/6/29 Attilio Rao <attilio@freebsd.org>:
>>>>>> As already published several times, according to the following plan:
>>>>>> http://wiki.freebsd.org/NONMPSAFE_DEORBIT_VFS
>>>>>>
>>>>>
>>>>> I still haven't heard from Vivien or Edward, anyway as NTFS is
>>>>> basically only used RO these days (also the mount_ntfs code just
>>>>> permits RO mounting) I stripped all the incomplete/bogus write support
>>>>> with the following patch:
>>>>> http://www.freebsd.org/~attilio/ntfs_remove_write.patch
>>>>>
>>>>> This is an attempt to make the code smaller and possibly just focus on
>>>>> the locking that really matter (as read-only filesystem).
>>>>> On some points of the patch I'm a bit less sure as we could easily
>>>>> take into account also write for things like vaccess() arguments, and
>>>>> make easier to re-add correct write support at some point in the
>>>>> future, but still force RO, even if the approach used in the patch is
>>>>> more correct IMHO.
>>>>> As an added bonus this patch cleans some dirty code in the mount
>>>>> operation and fixes a bug as vfs_mountedfrom() is called before real
>>>>> mounting is completed and can still fail.
>>>>
>>>> A quick update on this.
>>>> It looks like NTFS won't be completed for this GSoC thus I seriously
>>>> need to find an alternative to not loose the NTFS support entirely.
>>>>
>>>> I tried to look into the NTFS implementation right now and it is
>>>> really a poor support. As Peter has also verified, it can deadlock in
>>>> no-time, it compeltely violates VFS rules, etc. IMHO it deserves a
>>>> complete rewrite if we would still support in-kernel NTFS. I also
>>>> tried to look at the NetBSD implementation. Their code is someway
>>>> similar to our, but they used very complicated (and very dirty) code
>>>> to do the locking. Even if I don't know well enough NetBSD VFS, I have
>>>> the impression not all the races are correctly handled. Definitively,
>>>> not something I would like to port.
>>>>
>>>> Considering all that the only viable option would be meaning an
>>>> userland filesystem implementation. My preferred choice would be to
>>>> import PUFFS and librefuse on top of it but honestly it requires a lot
>>>> of time to be completed, time which I don't currently have as in 2
>>>> months Giant must be gone by the VFS.
>>>>
>>>> I then decided to switch to gnn's rewamp of FUSE patches. You can find
>>>> his initial e-mail here:
>>>> http://lists.freebsd.org/pipermail/freebsd-fs/2012-March/013876.html
>>>>
>>>> I've precisely got the second version of George's patch and created
>>>> this dolphin branch:
>>>> svn://svn.freebsd.org/base/projects/fuse
>>>>
>>>> I'm fixing low hanging fruit for the moment (see r238411 for example)
>>>> and I still have to make a throughful review.
>>>> However my idea is to commit the support once:
>>>> - ntfs-3g is well stress-tested and proves to be bug-free
>>>> - there is no major/big technical issue pending after the reviews
>>>
>>> In the last weeks Peter, Florian, Gustau and I have been working in
>>> stabilizing fuse support. In the specific, Peter has worked hard on
>>> producing several utilities to nit stress-test fuse and in particular
>>> ntfs, Florian has improved fuse related ports (as explained later) and
>>> Gustau has done sparse testing. I feel moderately satisfied by the
>>> level of stability of fuse now to propose to wider usage, in
>>> particular given the huge amount of complaints I'm hearing around
>>> about occasional fuse users.
>>>
>>> The final target of the project is to completely import into base the
>>> content of fusefs-kmod starting from earlier posted patches by George.
>>> So far, we took care only of importing in the fuse branch the kernel
>>> part, so that fusefs-kmod userland part is still needed to be
>>> installed from ports, but I was studying the mount_fusefs licensing
>>> before to process with the import for the userland bits of it.
>>>
>>> The fixing has been happening here:
>>> svn://svn.freebsd.org/base/projects/fuse/
>>>
>>> which is essentially an HEAD branch + fuse kernel components. In order
>>> to get fuse, please compile a kernel from this branch with FUSE option
>>> or simply build and load fuse module.
>>> Alternatively, a kernel patch that should work with HEAD@240684 is here:
>>> http://www.freebsd.org/~attilio/fuse_import/fuse_240684.patch
>>>
>>> I guess the patch can easilly apply to all FreeBSD branches, really,
>>> but it is not tested to anything else different then -CURRENT.
>>>
>>> As said you still need currently to build fusefs-kmod port. However
>>> you need these further patches, to be put in the fusefs-kmod/files/
>>> directory::
>>> http://www.freebsd.org/~attilio/fuse_import/patch-Makefile
>>> http://www.freebsd.org/~attilio/fuse_import/patch-mount_fusefs__mount_fusefs2.c
>>>
>>> They both disable the old kernel building/linking and import new
>>> functionality to let the new kernel support work well in presence of
>>> many consumers.
>>>
>>> In addition to fusefs-kmod, Bryan and Florian have also updated
>>> fusefs-lib and fusefs-ntfs ports. For instance, please refer to this
>>> e-mail:
>>> http://lists.freebsd.org/pipermail/freebsd-ports/2012-August/077950.html
>>>
>>> Even if this work is someway independent by the fusefs-kmod import, I
>>> warmly suggest to all of you to use their patches (and this what we
>>> have been testing so far too).
>>>
>>> At this point what I'm looking for are reviews and further testing.
>>> I would like to spend some words on what you should expect from this work:
>>> *Fuse is far from being perfect*.
>>> I cannot stress this enough. Peter stress-tests could break also Fuse
>>> on Linux generally and by Fuse authors admissions the modules can
>>> never guarantee to be completely starvation-free. However, they tend
>>> to be designed in a way that sleeps can be at least interrupted
>>> easily, making at least easy to recover from deadlocks. This is mostly
>>> retained also in FreeBSD, for what I can tell. Also, sometimes fuse
>>> seems to leave a small amount of hidden files, when it find references
>>> on files it wants to delete. This happens also under Linux and it is
>>> part of FUSE design, not much we can do.
>>> However, if deadlocks can be someway tollerated, things you should
>>> really pay attention are dumps of fuse modules (like ntfs-3g binary)
>>> and kernel panics. They must not happen and if they do they need to be
>>> fixed promptly.
>>> However, the good new is that ntfs seems doing exceptionally good.
>>> Florian could use ntfs as a backend for postgresql test. I think this
>>> is by far a big improvement if compared to current in-kernel ntfs
>>> which is completely torned.
>>>
>>> So far we have almost entirely tested only ntfs-3g. I know Gustau also
>>> used other modules like sshfs and George used GlusterFS with his older
>>> patches, but I encourage you to test as many modules as you want, as
>>> they may expose different bugs. Of course, I don't plan to spend much
>>> more time on FUSE, but I can occasionally look at bugs as they fall in
>>> the filesystems category and I'm always interested in keeping a good
>>> open eye on such issues.
>>>
>>> A few operational informations:
>>> - In the next days I will import the userland bits of fusefs-kmod to
>>> the fuse project branch making the port obsolete. When this happens I
>>> will make this clear to the user of this thread.
>>> - If no major bug is remained by the early October, I will commit this
>>> to -CURRENT
>>> - I expect Bryan and Florian to commit libfuse and ntfs updates soon.
>>> They can do independently from the fusefs-kmod retiral, but I would
>>> prefer their patches to go on first.
>>> - After that I will handover fusefs maintainership to gnn as agreed in
>>> precedence but I will be around helping with analysis and fixing,
>>> depending on time availability
>>>
>>> In the end I have really 2 minor questions:
>>> - One is about importing the mount_fusefs userland bits. I don't think
>>> we need a vendor import at all because they were developed by a
>>> FreeBSD GSoC student and kept in his git repo (or someone else's).
>>> Anyway, i'd just commit as new files once I do a good sweep. I hope
>>> nobody objects to that.
>>> - Another one is: fusefs-kmod right now is only amd64/i386 specific. I
>>> have no idea why as it has not any MD specific code. However I'm sure
>>> it has not been tested on other arches so far. Anyway I left it usable
>>> by all the arches. I think this is the correct choice. If someone
>>> objects with valid argument I can bring it back to be usable only on
>>> i386 and amd64.
>>>
>>> That's all, for any question please don't hesitate to contact me and
>>> the other people involved in this work.
>>
>> Attilio (and the crew),
>>
>> Thanks for working on fusefs-ntfs. It's been increasingly worrying to
>> me that we might lose it and I really depend on it. I really hope to
>> be able to use rsync to update files without killing my system some
>> day.
>>
>> I tried the new fusefs-libs and fusefs-ntfs ports from Florian and
>> Bryan, but ran into trouble as I could no longer build the kmod after
>> installing the updated fusefs-libs. It had an unresolved symbol:
>> cc -O2 -pipe -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE
>> -nostdinc  -I../include -I. -I@ -I@/contrib/altq -finline-limit=8000
>> --param inline-unit-growth=100 --param large-function-growth=1000
>> -fno-common  -fno-omit-frame-pointer  -mcmodel=kernel -mno-red-zone
>> -mno-mmx -mno-sse -msoft-float  -fno-asynchronous-unwind-tables
>> -ffreestanding -fstack-protector -std=iso9899:1999 -fstack-protector
>> -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes
>> -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual  -Wundef
>> -Wno-pointer-sign -fformat-extensions  -Wmissing-include-dirs
>> -fdiagnostics-show-option   -c fuse_vnops.c
>> fuse_vnops.c: In function 'create_filehandle':
>> fuse_vnops.c:1586: error: 'struct fuse_open_in' has no member named 'mode'
>> *** [fuse_vnops.o] Error code 1
>>
>> This was on amd64 9-Stable r239879 until/unless this issue is
>> resolved, please keep the existing port available and/or mark the new
>> one to not install on pre-10 systems.
>
> If you follow the rule I described in this e-mail, the fusefs-kmod
> kernel part won't be build anymore, so you won't run into this.
> If it is build yet, please let me know because there is a bug in the 2
> patches I posted for fusefs-kmod port.

Attilio,

I assumed that your new kernel module was only tested/working with
current, so I did not try to use it. I was only referring to the use
of the updated fusefs-libs and fusefs-ntfs ports that Florian and Bryan
provided. I had tested these on 9-stable and found that after
installing the updated fusefs-libs, the old fusefs-kmod port would no
longer compile.

Today Florian sent me a one-line patch to fuse_module/fuse_vnops.c in
the current fusefs-kmod port which appears to have fixed the problem.
It compiled fine and it is currently running on the system on which I
am typing this. I have done a bit of light testing and it works to
this point. I'll do some heavier testing later today. So it looks like
there is probably no issue with Florian committing the new
fusefs-libs and fusefs-ntfs ports for those of us not running current.
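
For anyone hitting the same "has no member named 'mode'" error, here is
a minimal sketch of the kind of mismatch that produces it. It rests on
an assumption that is not confirmed anywhere in this thread: that the
header shipped with the newer fusefs-libs renames the second member of
struct fuse_open_in. The struct names, the helper fill_open_request()
and the member name 'unused' are hypothetical stand-ins, and this is
not Florian's actual patch.

```c
/*
 * Illustrative sketch only. These structs are simplified stand-ins, not
 * the real fuse_kernel.h definitions.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical older layout: code written against it may touch 'mode'. */
struct old_fuse_open_in {
	uint32_t flags;
	uint32_t mode;
};

/* Hypothetical newer layout: same size, second member renamed. */
struct new_fuse_open_in {
	uint32_t flags;
	uint32_t unused;
};

/* The rename keeps the size, so only source code is affected. */
_Static_assert(sizeof(struct old_fuse_open_in) ==
    sizeof(struct new_fuse_open_in), "layouts differ in size");

/*
 * Hypothetical helper: fill an open request against the newer layout.
 * Writing 'foi->mode = ...' here would fail to compile with the same
 * "has no member named 'mode'" diagnostic.
 */
static void
fill_open_request(struct new_fuse_open_in *foi, uint32_t flags)
{
	assert(foi != NULL);
	foi->flags = flags;
	foi->unused = 0;
}
```

A caller would declare a struct new_fuse_open_in and pass its address
together with the open flags. If the assumption holds, a one-line change
at the failing assignment would be enough to fix the build.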

If I get enough time, I'll look into applying the patches to the
kernel module on 9-stable and see how that does, but I have my day job
and contractors working in the house, so I won't make any promises.

Thanks again to you and all of the others who contributed to this. It
may not be perfect, but it is a huge win over the kernel NTFS code,
especially for those of us who need to actually write a file now and
then.
-- 
R. Kevin Oberman, Network Engineer
E-mail: kob6558@gmail.com

From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 19 19:09:00 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A0EFF1065670;
	Wed, 19 Sep 2012 19:09:00 +0000 (UTC)
	(envelope-from asmrookie@gmail.com)
Received: from mail-lb0-f182.google.com (mail-lb0-f182.google.com
	[209.85.217.182])
	by mx1.freebsd.org (Postfix) with ESMTP id E95698FC12;
	Wed, 19 Sep 2012 19:08:58 +0000 (UTC)
Received: by lbbgg13 with SMTP id gg13so1714225lbb.13
	for <multiple recipients>; Wed, 19 Sep 2012 12:08:57 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:reply-to:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type;
	bh=12+s5RgA3unmVGZDyI1LX38jPqNdroOc1I3M/BgBDD8=;
	b=eJad25XU58ZMWnfiHy2MHybC9htlZLKB3HeELkkHPMOFuoDTIyFtDAG/gEqe+n3owi
	vPEQJ6dQnXse0SboLvHA1fKe/TiA2KO1TqETHsL0oX63prN4pXNR/QQbGBJ/B9jrbsZL
	W80Kq62jt1alrKkOCTDQqxl8ka0u8cZ54/GbD4JYCyMsxGA0IZRJmSFMIRQ0RhSMq38u
	RwXAsiKlyeMs46p7lpO9lGroImFX26qt20ljFoueKEBiWdrQFlqI0+gjMNrMxBLpBhNU
	FjZAR+BWsglxq7oe9HdvM+BTL4qw4ACqMyVjQ9JiXMDBoQD6fSVc7X0T6Ci3NqFy6hdf
	BjYg==
MIME-Version: 1.0
Received: by 10.152.131.68 with SMTP id ok4mr3414683lab.47.1348081736990; Wed,
	19 Sep 2012 12:08:56 -0700 (PDT)
Sender: asmrookie@gmail.com
Received: by 10.112.102.39 with HTTP; Wed, 19 Sep 2012 12:08:56 -0700 (PDT)
In-Reply-To: <CAN6yY1urtKmTZ3Sk5H+b5SSZrYKqkpMZ_nvQPVLiN+c51-i_ng@mail.gmail.com>
References: <CAJ-FndCQ0YEo9_6x3g-12XEs8QmtyecwkLBX9z_sptnOUNTHrw@mail.gmail.com>
	<20120829060158.GA38721@x2.osted.lan>
	<CAJ-FndAaFv2o05MZZceT8Qr4mhPxuzrnmOZ30c3gy8=pnjjZvw@mail.gmail.com>
	<20120831052003.GA91340@x2.osted.lan>
	<CAJ-FndAaxQA8NYCFSN629XXi9zMVNyu2TuHjZLvmn3jhzRJb4w@mail.gmail.com>
	<CAJ-FndDdDVuwc=NgDeG7XiWW59-+Ls5wc2GBqbjLOLDUdUb9SA@mail.gmail.com>
	<20120905201531.GA54452@x2.osted.lan>
	<CAJ-FndCHSroZFfVgHAW8SUVZhDSaX9qix=aZnHVC_BN_fW6sgg@mail.gmail.com>
	<CAJ-FndDr5WmeKXCwSCucQ4w3hPHRBuu36YH1xiW_wKXOkKEdZg@mail.gmail.com>
	<CAJ-FndCvc+phY_g4CeGfzsj017roxs_C5adjuLuszpEPWO2+1g@mail.gmail.com>
	<20120917140055.GA9037@x2.osted.lan>
	<CAJ-FndAP9Ua6tRcbrfYY1+56O-YbJvmyaUco9K42-0hmchKD6g@mail.gmail.com>
	<CAN6yY1tjHFEopgJ+cAfQ5ES5Q4NnOg1tYA81==9uzPQgpQVDzA@mail.gmail.com>
	<CAJ-FndBDb_OdcdkW5YYs_X_+YdZc8N+krvK2pPzdnYa+3nYEDg@mail.gmail.com>
	<CAN6yY1urtKmTZ3Sk5H+b5SSZrYKqkpMZ_nvQPVLiN+c51-i_ng@mail.gmail.com>
Date: Wed, 19 Sep 2012 20:08:56 +0100
X-Google-Sender-Auth: rSaECWf7wPkM6KW-iZebcef82T8
Message-ID: <CAJ-FndD_acVWQeRpAJp5nyxSy2MS-1=astA5d-9cATsg7eXxRQ@mail.gmail.com>
From: Attilio Rao <attilio@freebsd.org>
To: Kevin Oberman <kob6558@gmail.com>
Content-Type: text/plain; charset=UTF-8
Cc: Peter Holm <pho@freebsd.org>, bdrewery@freebsd.org,
	FreeBSD FS <freebsd-fs@freebsd.org>, freebsd-current@freebsd.org
Subject: Re: MPSAFE VFS -- List of upcoming actions
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: attilio@FreeBSD.org
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 19 Sep 2012 19:09:00 -0000

On 9/19/12, Kevin Oberman <kob6558@gmail.com> wrote:
> On Wed, Sep 19, 2012 at 12:30 AM, Attilio Rao <attilio@freebsd.org> wrote:
>> On Wed, Sep 19, 2012 at 4:47 AM, Kevin Oberman <kob6558@gmail.com> wrote:
>>> On Tue, Sep 18, 2012 at 7:48 PM, Attilio Rao <attilio@freebsd.org>
>>> wrote:
>>>> On Fri, Jul 13, 2012 at 12:18 AM, Attilio Rao <attilio@freebsd.org>
>>>> wrote:
>>>>> 2012/7/4 Attilio Rao <attilio@freebsd.org>:
>>>>>> 2012/6/29 Attilio Rao <attilio@freebsd.org>:
>>>>>>> As already published several times, according to the following plan:
>>>>>>> http://wiki.freebsd.org/NONMPSAFE_DEORBIT_VFS
>>>>>>>
>>>>>>
>>>>>> I still haven't heard from Vivien or Edward, anyway as NTFS is
>>>>>> basically only used RO these days (also the mount_ntfs code just
>>>>>> permits RO mounting) I stripped all the uncomplete/bogus write
>>>>>> support
>>>>>> with the following patch:
>>>>>> http://www.freebsd.org/~attilio/ntfs_remove_write.patch
>>>>>>
>>>>>> This is an attempt to make the code smaller and possibly just focus
>>>>>> on
>>>>>> the locking that really matter (as read-only filesystem).
>>>>>> On some points of the patch I'm a bit less sure as we could easily
>>>>>> take into account also write for things like vaccess() arguments, and
>>>>>> make easier to re-add correct write support at some point in the
>>>>>> future, but still force RO, even if the approach used in the patch is
>>>>>> more correct IMHO.
>>>>>> As an added bonus this patch cleans some dirty code in the mount
>>>>>> operation and fixes a bug as vfs_mountedfrom() is called before real
>>>>>> mounting is completed and can still fail.
>>>>>
>>>>> A quick update on this.
>>>>> It looks like NTFS won't be completed for this GSoC thus I seriously
>>>>> need to find an alternative to not loose the NTFS support entirely.
>>>>>
>>>>> I tried to look into the NTFS implementation right now and it is
>>>>> really a poor support. As Peter has also verified, it can deadlock in
>>>>> no-time, it compeltely violates VFS rules, etc. IMHO it deserves a
>>>>> complete rewrite if we would still support in-kernel NTFS. I also
>>>>> tried to look at the NetBSD implementation. Their code is someway
>>>>> similar to our, but they used very complicated (and very dirty) code
>>>>> to do the locking. Even if I don't know well enough NetBSD VFS, I have
>>>>> the impression not all the races are correctly handled. Definitively,
>>>>> not something I would like to port.
>>>>>
>>>>> Considering all that the only viable option would be meaning an
>>>>> userland filesystem implementation. My preferred choice would be to
>>>>> import PUFFS and librefuse on top of it but honestly it requires a lot
>>>>> of time to be completed, time which I don't currently have as in 2
>>>>> months Giant must be gone by the VFS.
>>>>>
>>>>> I then decided to switch to gnn's rewamp of FUSE patches. You can find
>>>>> his initial e-mail here:
>>>>> http://lists.freebsd.org/pipermail/freebsd-fs/2012-March/013876.html
>>>>>
>>>>> I've precisely got the second version of George's patch and created
>>>>> this dolphin branch:
>>>>> svn://svn.freebsd.org/base/projects/fuse
>>>>>
>>>>> I'm fixing low hanging fruit for the moment (see r238411 for example)
>>>>> and I still have to make a throughful review.
>>>>> However my idea is to commit the support once:
>>>>> - ntfs-3g is well stress-tested and proves to be bug-free
>>>>> - there is no major/big technical issue pending after the reviews
>>>>
>>>> In the last weeks Peter, Florian, Gustau and I have been working in
>>>> stabilizing fuse support. In the specific, Peter has worked hard on
>>>> producing several utilities to nit stress-test fuse and in particular
>>>> ntfs, Florian has improved fuse related ports (as explained later) and
>>>> Gustau has done sparse testing. I feel moderately satisfied by the
>>>> level of stability of fuse now to propose to wider usage, in
>>>> particular given the huge amount of complaints I'm hearing around
>>>> about occasional fuse users.
>>>>
>>>> The final target of the project is to completely import into base the
>>>> content of fusefs-kmod starting from earlier posted patches by George.
>>>> So far, we took care only of importing in the fuse branch the kernel
>>>> part, so that fusefs-kmod userland part is still needed to be
>>>> installed from ports, but I was studying the mount_fusefs licensing
>>>> before to process with the import for the userland bits of it.
>>>>
>>>> The fixing has been happening here:
>>>> svn://svn.freebsd.org/base/projects/fuse/
>>>>
>>>> which is essentially an HEAD branch + fuse kernel components. In order
>>>> to get fuse, please compile a kernel from this branch with FUSE option
>>>> or simply build and load fuse module.
>>>> Alternatively, a kernel patch that should work with HEAD@240684 is
>>>> here:
>>>> http://www.freebsd.org/~attilio/fuse_import/fuse_240684.patch
>>>>
>>>> I guess the patch can easilly apply to all FreeBSD branches, really,
>>>> but it is not tested to anything else different then -CURRENT.
>>>>
>>>> As said you still need currently to build fusefs-kmod port. However
>>>> you need these further patches, to be put in the fusefs-kmod/files/
>>>> directory::
>>>> http://www.freebsd.org/~attilio/fuse_import/patch-Makefile
>>>> http://www.freebsd.org/~attilio/fuse_import/patch-mount_fusefs__mount_fusefs2.c
>>>>
>>>> They both disable the old kernel building/linking and import new
>>>> functionality to let the new kernel support work well in presence of
>>>> many consumers.
>>>>
>>>> In addition to fusefs-kmod, Bryan and Florian have also updated
>>>> fusefs-lib and fusefs-ntfs ports. For instance, please refer to this
>>>> e-mail:
>>>> http://lists.freebsd.org/pipermail/freebsd-ports/2012-August/077950.html
>>>>
>>>> Even if this work is someway independent by the fusefs-kmod import, I
>>>> warmly suggest to all of you to use their patches (and this what we
>>>> have been testing so far too).
>>>>
>>>> At this point what I'm looking for are reviews and further testing.
>>>> I would like to spend some words on what you should expect from this
>>>> work:
>>>> *Fuse is far from being perfect*.
>>>> I cannot stress this enough. Peter stress-tests could break also Fuse
>>>> on Linux generally and by Fuse authors admissions the modules can
>>>> never guarantee to be completely starvation-free. However, they tend
>>>> to be designed in a way that sleeps can be at least interrupted
>>>> easily, making at least easy to recover from deadlocks. This is mostly
>>>> retained also in FreeBSD, for what I can tell. Also, sometimes fuse
>>>> seems to leave a small amount of hidden files, when it find references
>>>> on files it wants to delete. This happens also under Linux and it is
>>>> part of FUSE design, not much we can do.
>>>> However, if deadlocks can be someway tollerated, things you should
>>>> really pay attention are dumps of fuse modules (like ntfs-3g binary)
>>>> and kernel panics. They must not happen and if they do they need to be
>>>> fixed promptly.
>>>> However, the good new is that ntfs seems doing exceptionally good.
>>>> Florian could use ntfs as a backend for postgresql test. I think this
>>>> is by far a big improvement if compared to current in-kernel ntfs
>>>> which is completely torned.
>>>>
>>>> So far we have almost entirely tested only ntfs-3g. I know Gustau also
>>>> used other modules like sshfs and George used GlusterFS with his older
>>>> patches, but I encourage you to test as many modules as you want, as
>>>> they may expose different bugs. Of course, I don't plan to spend much
>>>> more time on FUSE, but I can occasionally look at bugs as they fall in
>>>> the filesystems category and I'm always interested in keeping a good
>>>> open eye on such issues.
>>>>
>>>> A few operational informations:
>>>> - In the next days I will import the userland bits of fusefs-kmod to
>>>> the fuse project branch making the port obsolete. When this happens I
>>>> will make this clear to the user of this thread.
>>>> - If no major bug is remained by the early October, I will commit this
>>>> to -CURRENT
>>>> - I expect Bryan and Florian to commit libfuse and ntfs updates soon.
>>>> They can do independently from the fusefs-kmod retiral, but I would
>>>> prefer their patches to go on first.
>>>> - After that I will handover fusefs maintainership to gnn as agreed in
>>>> precedence but I will be around helping with analysis and fixing,
>>>> depending on time availability
>>>>
>>>> In the end I have really 2 minor questions:
>>>> - One is about importing the mount_fusefs userland bits. I don't think
>>>> we need a vendor import at all because they were developed by a
>>>> FreeBSD GSoC student and kept in his git repo (or someone else's).
>>>> Anyway, i'd just commit as new files once I do a good sweep. I hope
>>>> nobody objects to that.
>>>> - Another one is: fusefs-kmod right now is only amd64/i386 specific. I
>>>> have no idea why as it has not any MD specific code. However I'm sure
>>>> it has not been tested on other arches so far. Anyway I left it usable
>>>> by all the arches. I think this is the correct choice. If someone
>>>> objects with valid argument I can bring it back to be usable only on
>>>> i386 and amd64.
>>>>
>>>> That's all, for any question please don't hesitate to contact me and
>>>> the other people involved in this work.
>>>
>>> Attilio (and the crew),
>>>
>>> Thanks for working on fusefs-ntfs. It's been increasingly worrying to
>>> me that we might lose it and I really depend on it. I really hope to
>>> be able to use rsync to update files without killing my system some
>>> day.
>>>
>>> I tried the new fusefs-libs and fusefs-ntfs ports from Florian and
>>> Bryan, but ran into trouble as I could no longer build the kmod after
>>> installing the updated fusefs-libs. It had an unresolved symbol:
>>> cc -O2 -pipe -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE
>>> -nostdinc  -I../include -I. -I@ -I@/contrib/altq -finline-limit=8000
>>> --param inline-unit-growth=100 --param large-function-growth=1000
>>> -fno-common  -fno-omit-frame-pointer  -mcmodel=kernel -mno-red-zone
>>> -mno-mmx -mno-sse -msoft-float  -fno-asynchronous-unwind-tables
>>> -ffreestanding -fstack-protector -std=iso9899:1999 -fstack-protector
>>> -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes
>>> -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual  -Wundef
>>> -Wno-pointer-sign -fformat-extensions  -Wmissing-include-dirs
>>> -fdiagnostics-show-option   -c fuse_vnops.c
>>> fuse_vnops.c: In function 'create_filehandle':
>>> fuse_vnops.c:1586: error: 'struct fuse_open_in' has no member named
>>> 'mode'
>>> *** [fuse_vnops.o] Error code 1
>>>
>>> This was on amd64 9-Stable r239879 until/unless this issue is
>>> resolved, please keep the existing port available and/or mark the new
>>> one to not install on pre-10 systems.
>>
>> If you follow the rule I described in this e-mail, the fusefs-kmod
>> kernel part won't be build anymore, so you won't run into this.
>> If it is build yet, please let me know because there is a bug in the 2
>> patches I posted for fusefs-kmod port.
>
> Attilio,
>
> I assumed that your new kernel module was only tested/working with
> current, so I did not try to use it. I was only referring to the use
> of the updated fusefs-libs and fusefs-ntfs ports that Florian and Bryan
> provided. I had tested these on 9-stable and found that after
> installing the updated fusefs-libs, the old fusefs-kmod port would no
> longer compile.
>
> Today Florian sent me a one-line patch to fuse_module/fuse_vnops.c in
> the current fusefs-kmod port which appears to have fixed the problem.
> It compiled fine and it is currently running on the system on which I
> am typing this. I have done a bit of light testing and it works to
> this point. I'll do some heavier testing later today. So it looks like
> there is probably no issue with Florian committing the new
> fusefs-libs and fusefs-ntfs ports for those of us not running current.

Thanks for letting us know. I think that Bryan and Florian should really
update the ports as soon as possible.

Also, I hope that someone will sync the fusefs-kmod port (in
particular the kernel part) with the kernel code that our branch
brings along. I think Florian volunteered for this, so there should
not be a problem with that.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein

From owner-freebsd-fs@FreeBSD.ORG  Wed Sep 19 22:26:16 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id BD778106579D
	for <freebsd-fs@freebsd.org>; Wed, 19 Sep 2012 22:26:16 +0000 (UTC)
	(envelope-from bdrewery@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 941AB8FC19
	for <freebsd-fs@freebsd.org>; Wed, 19 Sep 2012 22:26:16 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q8JMQGEc091054
	for <freebsd-fs@freebsd.org>; Wed, 19 Sep 2012 22:26:16 GMT
	(envelope-from bdrewery@freefall.freebsd.org)
Received: (from bdrewery@localhost)
	by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q8JMQGff091049
	for freebsd-fs@freebsd.org; Wed, 19 Sep 2012 22:26:16 GMT
	(envelope-from bdrewery)
Received: (qmail 68708 invoked from network); 19 Sep 2012 17:26:10 -0500
Received: from unknown (HELO ?192.168.0.74?) (freebsd@shatow.net@74.94.87.209)
	by sweb.xzibition.com with ESMTPA; 19 Sep 2012 17:26:10 -0500
Message-ID: <505A468E.2080902@FreeBSD.org>
Date: Wed, 19 Sep 2012 17:26:22 -0500
From: Bryan Drewery <bdrewery@freebsd.org>
Organization: FreeBSD
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
	rv:15.0) Gecko/20120824 Thunderbird/15.0
MIME-Version: 1.0
To: attilio@freebsd.org
References: <CAJ-FndCQ0YEo9_6x3g-12XEs8QmtyecwkLBX9z_sptnOUNTHrw@mail.gmail.com>
	<20120829060158.GA38721@x2.osted.lan>
	<CAJ-FndAaFv2o05MZZceT8Qr4mhPxuzrnmOZ30c3gy8=pnjjZvw@mail.gmail.com>
	<20120831052003.GA91340@x2.osted.lan>
	<CAJ-FndAaxQA8NYCFSN629XXi9zMVNyu2TuHjZLvmn3jhzRJb4w@mail.gmail.com>
	<CAJ-FndDdDVuwc=NgDeG7XiWW59-+Ls5wc2GBqbjLOLDUdUb9SA@mail.gmail.com>
	<20120905201531.GA54452@x2.osted.lan>
	<CAJ-FndCHSroZFfVgHAW8SUVZhDSaX9qix=aZnHVC_BN_fW6sgg@mail.gmail.com>
	<CAJ-FndDr5WmeKXCwSCucQ4w3hPHRBuu36YH1xiW_wKXOkKEdZg@mail.gmail.com>
	<CAJ-FndCvc+phY_g4CeGfzsj017roxs_C5adjuLuszpEPWO2+1g@mail.gmail.com>
	<20120917140055.GA9037@x2.osted.lan>
	<CAJ-FndAP9Ua6tRcbrfYY1+56O-YbJvmyaUco9K42-0hmchKD6g@mail.gmail.com>
In-Reply-To: <CAJ-FndAP9Ua6tRcbrfYY1+56O-YbJvmyaUco9K42-0hmchKD6g@mail.gmail.com>
X-Enigmail-Version: 1.4.4
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Cc: Peter Holm <pho@freebsd.org>, FreeBSD FS <freebsd-fs@freebsd.org>,
	freebsd-current@freebsd.org
Subject: Re: MPSAFE VFS -- List of upcoming actions
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 19 Sep 2012 22:26:16 -0000

On 9/18/2012 9:48 PM, Attilio Rao wrote:
> In addition to fusefs-kmod, Bryan and Florian have also updated
> fusefs-lib and fusefs-ntfs ports. For instance, please refer to this
> e-mail:
> http://lists.freebsd.org/pipermail/freebsd-ports/2012-August/077950.html
> 
> Even if this work is someway independent by the fusefs-kmod import, I
> warmly suggest to all of you to use their patches (and this what we
> have been testing so far too).

I have committed my updates to sysutils/fusefs-ntfs now.

-- 
Regards,
Bryan Drewery
bdrewery@freenode/EFNet

From owner-freebsd-fs@FreeBSD.ORG  Thu Sep 20 05:11:41 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 9DA07106564A;
	Thu, 20 Sep 2012 05:11:41 +0000 (UTC)
	(envelope-from kamikaze@bsdforen.de)
Received: from mail.server1.bsdforen.de (bsdforen.de [82.193.243.81])
	by mx1.freebsd.org (Postfix) with ESMTP id 3AF288FC08;
	Thu, 20 Sep 2012 05:11:40 +0000 (UTC)
Received: from mobileKamikaze.norad
	(HSI-KBW-134-3-231-194.hsi14.kabel-badenwuerttemberg.de
	[134.3.231.194])
	(using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
	(No client certificate requested)
	by mail.server1.bsdforen.de (Postfix) with ESMTPSA id 32711861A0;
	Thu, 20 Sep 2012 07:11:32 +0200 (CEST)
Message-ID: <505AA583.7090401@bsdforen.de>
Date: Thu, 20 Sep 2012 07:11:31 +0200
From: Dominic Fandrey <kamikaze@bsdforen.de>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:15.0) Gecko/20120912 Thunderbird/15.0.1
MIME-Version: 1.0
To: attilio@FreeBSD.org
References: <CAJ-FndCQ0YEo9_6x3g-12XEs8QmtyecwkLBX9z_sptnOUNTHrw@mail.gmail.com>
	<20120829060158.GA38721@x2.osted.lan>
	<CAJ-FndAaFv2o05MZZceT8Qr4mhPxuzrnmOZ30c3gy8=pnjjZvw@mail.gmail.com>
	<20120831052003.GA91340@x2.osted.lan>
	<CAJ-FndAaxQA8NYCFSN629XXi9zMVNyu2TuHjZLvmn3jhzRJb4w@mail.gmail.com>
	<CAJ-FndDdDVuwc=NgDeG7XiWW59-+Ls5wc2GBqbjLOLDUdUb9SA@mail.gmail.com>
	<20120905201531.GA54452@x2.osted.lan>
	<CAJ-FndCHSroZFfVgHAW8SUVZhDSaX9qix=aZnHVC_BN_fW6sgg@mail.gmail.com>
	<CAJ-FndDr5WmeKXCwSCucQ4w3hPHRBuu36YH1xiW_wKXOkKEdZg@mail.gmail.com>
	<CAJ-FndCvc+phY_g4CeGfzsj017roxs_C5adjuLuszpEPWO2+1g@mail.gmail.com>
	<20120917140055.GA9037@x2.osted.lan>
	<CAJ-FndAP9Ua6tRcbrfYY1+56O-YbJvmyaUco9K42-0hmchKD6g@mail.gmail.com>
In-Reply-To: <CAJ-FndAP9Ua6tRcbrfYY1+56O-YbJvmyaUco9K42-0hmchKD6g@mail.gmail.com>
Content-Type: text/plain; charset=ascii; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Peter Holm <pho@freebsd.org>, bdrewery@freebsd.org,
	FreeBSD FS <freebsd-fs@freebsd.org>, freebsd-current@freebsd.org
Subject: Re: MPSAFE VFS -- List of upcoming actions
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 20 Sep 2012 05:11:41 -0000

On 19/09/2012 04:48, Attilio Rao wrote:
> On Fri, Jul 13, 2012 at 12:18 AM, Attilio Rao <attilio@freebsd.org> wrote:
> ...
> Alternatively, a kernel patch that should work with HEAD@240684 is here:
> http://www.freebsd.org/~attilio/fuse_import/fuse_240684.patch
>
> I guess the patch can easilly apply to all FreeBSD branches, really,
> but it is not tested to anything else different then -CURRENT.

RELENG_9, fetched yesterday:
===> fuse (all)
env CCACHE_PREFIX=/usr/local/bin/distcc /usr/local/bin/ccache cc -O2 -pipe -march=core2 -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE -nostdinc   -DHAVE_KERNEL_OPTION_HEADERS -include /usr/obj/HP6510b-9/amd64/usr/src/sys/HP6510b-9/opt_global.h -I. -I@ -I@/contrib/altq -finline-limit=8000 --param inline-unit-growth=100 --param large-function-growth=1000 -fno-common -g -fno-omit-frame-pointer -I/usr/obj/HP6510b-9/amd64/usr/src/sys/HP6510b-9  -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float  -fno-asynchronous-unwind-tables -ffreestanding -fstack-protector -std=iso9899:1999 -fstack-protector -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual  -Wundef -Wno-pointer-sign -fformat-extensions  -Wmissing-include-dirs -fdiagnostics-show-option   -c /usr/src/sys/modules/fuse/../../fs/fuse/fuse_device.c
env CCACHE_PREFIX=/usr/local/bin/distcc /usr/local/bin/ccache cc -O2 -pipe -march=core2 -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE -nostdinc   -DHAVE_KERNEL_OPTION_HEADERS -include /usr/obj/HP6510b-9/amd64/usr/src/sys/HP6510b-9/opt_global.h -I. -I@ -I@/contrib/altq -finline-limit=8000 --param inline-unit-growth=100 --param large-function-growth=1000 -fno-common -g -fno-omit-frame-pointer -I/usr/obj/HP6510b-9/amd64/usr/src/sys/HP6510b-9  -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float  -fno-asynchronous-unwind-tables -ffreestanding -fstack-protector -std=iso9899:1999 -fstack-protector -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual  -Wundef -Wno-pointer-sign -fformat-extensions  -Wmissing-include-dirs -fdiagnostics-show-option   -c /usr/src/sys/modules/fuse/../../fs/fuse/fuse_node.c
distcc[20814] ERROR: compile /root/.ccache/tmp/fuse_node.tmp.mobileKamikaze.norad.20806.i on localhost failed
cc1: warnings being treated as errors
/usr/src/sys/modules/fuse/../../fs/fuse/fuse_node.c: In function 'fuse_vnode_setsize':
/usr/src/sys/modules/fuse/../../fs/fuse/fuse_node.c:378: warning: passing argument 3 of 'vtruncbuf' makes pointer from integer without a cast
/usr/src/sys/modules/fuse/../../fs/fuse/fuse_node.c:378: error: too few arguments to function 'vtruncbuf'
*** [fuse_node.o] Error code 1
1 error
*** [all] Error code 2
1 error
*** [modules-all] Error code 2
1 error
*** [buildkernel] Error code 2
1 error
*** [buildkernel] Error code 2

Stop in /usr/src.


-- 
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

From owner-freebsd-fs@FreeBSD.ORG  Thu Sep 20 12:59:13 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 28AC7106566B;
	Thu, 20 Sep 2012 12:59:13 +0000 (UTC)
	(envelope-from simon@comsys.ntu-kpi.kiev.ua)
Received: from comsys.kpi.ua (comsys.kpi.ua [77.47.192.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 737B88FC12;
	Thu, 20 Sep 2012 12:59:12 +0000 (UTC)
Received: from pm513-1.comsys.kpi.ua ([10.18.52.101]
	helo=pm513-1.comsys.ntu-kpi.kiev.ua)
	by comsys.kpi.ua with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.63)
	(envelope-from <simon@comsys.ntu-kpi.kiev.ua>)
	id 1TEgLW-0005iT-6v; Thu, 20 Sep 2012 15:59:10 +0300
Received: by pm513-1.comsys.ntu-kpi.kiev.ua (Postfix, from userid 1001)
	id 8EC2B1CC23; Thu, 20 Sep 2012 15:59:09 +0300 (EEST)
Date: Thu, 20 Sep 2012 15:59:09 +0300
From: Andrey Simonenko <simon@comsys.ntu-kpi.kiev.ua>
To: Rick Macklem <rmacklem@uoguelph.ca>
Message-ID: <20120920125909.GA9013@pm513-1.comsys.ntu-kpi.kiev.ua>
References: <C94CA78C-CDA9-4472-8BB7-CFD46FA0B3D9@FreeBSD.org>
	<2050472507.821722.1348009974939.JavaMail.root@erie.cs.uoguelph.ca>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <2050472507.821722.1348009974939.JavaMail.root@erie.cs.uoguelph.ca>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Authenticated-User: simon@comsys.ntu-kpi.kiev.ua
X-Authenticator: plain
X-Sender-Verify: SUCCEEDED (sender exists & accepts mail)
X-Exim-Version: 4.63 (build at 28-Apr-2011 07:11:12)
X-Date: 2012-09-20 15:59:10
X-Connected-IP: 10.18.52.101:19601
X-Message-Linecount: 192
X-Body-Linecount: 175
X-Message-Size: 6779
X-Body-Size: 5939
Cc: FS List <freebsd-fs@FreeBSD.org>, Will Andrews <willa@spectralogic.com>,
	"Justin T. Gibbs" <gibbs@FreeBSD.org>
Subject: Re: testing/review of atomic export update patch
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 20 Sep 2012 12:59:13 -0000

On Tue, Sep 18, 2012 at 07:12:54PM -0400, Rick Macklem wrote:
> Justin T. Gibbs wrote:
> > On Sep 16, 2012, at 3:41 PM, Rick Macklem <rmacklem@uoguelph.ca>
> > wrote:
> > 
> > > Hi,
> > >
> > > There is a simple patch at:
> > >  http://people.freebsd.org/~rmacklem/atomic-export.patch
> > > that can be applied to a kernel + mountd, so that the new
> > > nfsd can be suspended by mountd while the exports are being
> > > reloaded. It adds a new "-S" flag to mountd to enable this.
> > > (This avoids the long standing bug where clients receive ESTALE
> > > replies to RPCs while mountd is reloading exports.)
> > 
> > At Spectra, we are successfully using the NFSE patch set from
> > nfse.sourceforge.net (FreeBSD PR 136865). It addresses
> > the ESTALE problem in addition to cleaning up several aspects
> > of exports processing.
> > 
> > Have you reviewed the NFSE work? Do you have any issues
> > or concerns with it? What is the right path for getting NFSE
> > integrated into FreeBSD?
> > 
> I, personally, have not found the time to review it. As such,
> I can't state specifics, however there have been concerns w.r.t.
> a switch from mountd->nfse resulting in different behaviour when
> used with the same /etc/exports file used for mountd.
> 
> Some questions that need to be answered w.r.t. nfse, which I
> haven't had the time to do:
> - Are the differences listed here significant enough for a
>   change to be considered a POLA violation?
>     http://nfse.sourceforge.net/COMPATIBILITY

To be clear about what counts as a "POLA violation", can somebody who
has an interest in this topic answer the following questions:

Was the following change to mountd considered a POLA violation?
(this change ignored the semantics of the -alldirs option)

   http://www.freebsd.org/cgi/cvsweb.cgi/src/usr.sbin/mountd/mountd.c.diff?r1=1.83;r2=1.84

Are the following changes (corrections) to mountd considered POLA violations?

   http://www.freebsd.org/cgi/query-pr.cgi?pr=bin/170295
   http://www.freebsd.org/cgi/query-pr.cgi?pr=bin/170413

> - If the server mount point is /sub1 and the only line
>   referring to this server volume in /etc/exports looks like:
> 
>   /sub1/sub2 client.net
> 
>   Does the following mount command work on client.net
>   # mount -t nfs -o nfsv3 server.net:/sub1 /mnt
>   when nfse is run with -C using the /etc/exports file?

This command will fail with "access denied" for NFSv2/3 clients, even
though the NFS server itself will export /sub1 to NFSv2/3 clients.

>   This is typically referred to as an "administrative control",
>   since it is only enforced by mountd for the Mount protocol,
>   but is considered an important feature by some (rwatson@
>   expressed a desire/need for it).

Exactly like this.

> - Does the nfse patch handle exporting of all file systems types
>   and, in particular, the `zfs share` case.

It does not depend on the file system type; a file system just has to
support NFS.  NFSE is integrated as a part of the NFS server code, so
ZFS snapshots will not be exported automatically.  Currently, if a ZFS
file system is exported, its snapshots are exported as well (not
optional, BTW) because VFS_CHECKEXP() is used.

"zfs share/unshare" modifies /etc/zfs/exports and, depending on the presence
of the /etc/nfs.exports file, sends SIGHUP to mountd or calls "nfse -c ..."
to update settings dynamically.  It is assumed that if a ZFS file system is
exported via "zfs share", then it is not exported in another file (details
are in fsshare.c:nfse_update() in src/cddl).

1. Example of how mountd after revision 1.84 of mountd.c understands -alldirs:

# cat /etc/exports
/cdrom -alldirs
# mountd
# showmount -e
Exports list on localhost:
/cdrom                             Everyone
# mount | grep /cdrom
# mount -t nfs -o nfsv3 127.0.0.1:/ /mnt
# ls /mnt
...
#

2. Example of how nfse understands /etc/exports with only one line
   whose pathname is not a mount point:

# cat /etc/exports
/sub1/sub2 127.0.0.1
# mount | grep /sub1
/dev/md0 on /sub1 (ufs, local)
# nfse -C
# mount -t nfs -o nfsv3 127.0.0.1:/sub1 /mnt
[tcp] 127.0.0.1:/sub1: Permission denied
# mount -t nfs -o nfsv3 127.0.0.1:/sub1/sub2 /mnt
# ls /mnt
file.txt
# ls /sub1/sub2
file.txt
# showmount -e
Exports list on localhost:
/sub1                              127.0.0.1 
# nfse -c show
...
Pathname /sub1 (exported)
    Export specifications:
        -rw -sec sys -maproot=-2:-2 -host 127.0.0.1
    Subdirectories for NFSv2/3:
        /sub1/sub2
            -host 127.0.0.1
...
#

3. Example of how nfse understands /etc/exports with one line with -alldirs:

# cat /etc/exports
/sub1/sub2 -alldirs 127.0.0.1
# mount | grep /sub1
/dev/md0 on /sub1 (ufs, local)
# nfse -C
# mount -t nfs -o nfsv3 127.0.0.1:/sub1 /mnt
[tcp] 127.0.0.1:/sub1: Permission denied
# mount -t nfs -o nfsv3 127.0.0.1:/sub1/sub2 /mnt
[tcp] 127.0.0.1:/sub1/sub2: Permission denied
# showmount -e
Exports list on localhost:
# nfse -c show
...
Pathname /sub1/sub2
    File system options:
        -alldirs
    Export specifications:
        -rw -sec sys -maproot=-2:-2 -host 127.0.0.1
    Subdirectories for NFSv2/3:
        /sub1/sub2
            -alldirs -host 127.0.0.1
...
# mount /dev/md1 /sub1/sub2
# mount | grep /sub1
/dev/md0 on /sub1 (ufs, local)
/dev/md1 on /sub1/sub2 (ufs, local)
# mount -t nfs -o nfsv3 127.0.0.1:/sub1 /mnt
[tcp] 127.0.0.1:/sub1: Permission denied
# mount -t nfs -o nfsv3 127.0.0.1:/sub1/sub2 /mnt
# showmount -e
Exports list on localhost:
/sub1/sub2                         127.0.0.1 
# nfse -c show
...
Pathname /sub1/sub2 (exported)
    File system options:
        -alldirs
    Export specifications:
        -rw -sec sys -maproot=-2:-2 -host 127.0.0.1
    Subdirectories for NFSv2/3:
        /sub1/sub2
            -alldirs -host 127.0.0.1
...
#

All examples were checked on 10-CURRENT with recent NFSE.  Some lines from
the "nfse -c show" command were removed.  There is no /var/run/mountd.pid ->
/var/run/nfse.pid symlink, so mount does not send SIGHUP to nfse.
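
The Mount-protocol behaviour seen in examples 2 and 3 can be summarised
as a small userland model.  This is only a sketch inferred from the
transcripts above, not nfse's actual code: the names owning_mount_point()
and mount_allowed() are hypothetical, host matching is left out, and the
"-alldirs allows any directory inside the file system" rule is an
assumption that the transcripts do not exercise.

```python
def owning_mount_point(path, mount_points):
    """Return the longest mount point that is `path` or an ancestor of it."""
    best = None
    for mp in mount_points:
        prefix = mp.rstrip("/") + "/"
        if path == mp or path.startswith(prefix):
            if best is None or len(mp) > len(best):
                best = mp
    return best


def mount_allowed(request, export_path, alldirs, mount_points):
    """Model of the NFSv2/3 Mount-protocol check for one /etc/exports line.

    request      -- pathname the client asks to mount
    export_path  -- pathname written in /etc/exports
    alldirs      -- True if the line carries -alldirs
    mount_points -- mount points of the local file systems
    """
    fs_root = owning_mount_point(export_path, mount_points)
    if fs_root is None:
        return False
    if alldirs:
        # Example 3: -alldirs is honoured only when the pathname is a
        # mount point; otherwise nothing is exported.
        if export_path != fs_root:
            return False
        # Assumption: any directory inside that file system may be mounted.
        return owning_mount_point(request, mount_points) == fs_root
    # Example 2: without -alldirs only the listed pathname is mountable,
    # even though the file system is exported from its mount point.
    return request == export_path


if __name__ == "__main__":
    before = {"/sub1"}                 # only /dev/md0 mounted
    after = {"/sub1", "/sub1/sub2"}    # /dev/md1 mounted as well
    for mounts in (before, after):
        for req in ("/sub1", "/sub1/sub2"):
            print(sorted(mounts), req,
                  mount_allowed(req, "/sub1/sub2", True, mounts))
```

The point of the model is that the pathname in /etc/exports acts as an
administrative control on top of the per-file-system export, which is
why "showmount -e" can list /sub1 while a mount of /sub1 is refused.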

From owner-freebsd-fs@FreeBSD.ORG  Thu Sep 20 15:46:32 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 0E355106566B
	for <freebsd-fs@freebsd.org>; Thu, 20 Sep 2012 15:46:32 +0000 (UTC)
	(envelope-from jusher71@yahoo.com)
Received: from nm4-vm4.bullet.mail.ne1.yahoo.com
	(nm4-vm4.bullet.mail.ne1.yahoo.com [98.138.91.164])
	by mx1.freebsd.org (Postfix) with SMTP id 950FA8FC08
	for <freebsd-fs@freebsd.org>; Thu, 20 Sep 2012 15:46:31 +0000 (UTC)
Received: from [98.138.90.48] by nm4.bullet.mail.ne1.yahoo.com with NNFMP;
	20 Sep 2012 15:46:25 -0000
Received: from [98.138.89.195] by tm1.bullet.mail.ne1.yahoo.com with NNFMP;
	20 Sep 2012 15:46:25 -0000
Received: from [127.0.0.1] by omp1053.mail.ne1.yahoo.com with NNFMP;
	20 Sep 2012 15:46:25 -0000
X-Yahoo-Newman-Property: ymail-3
X-Yahoo-Newman-Id: 119621.67281.bm@omp1053.mail.ne1.yahoo.com
Received: (qmail 57976 invoked by uid 60001); 20 Sep 2012 15:46:25 -0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024;
	t=1348155984; bh=5kX3eAU4bvIPixEHIyg4smH6GjdnHIl184cuG1m+qa0=;
	h=X-YMail-OSG:Received:X-Mailer:Message-ID:Date:From:Subject:To:MIME-Version:Content-Type;
	b=uYEA5j5aGQAaU5s9ep8YUGXmzolowM6g4n9OAfU2Mxy8GxYJ4xeDGytIhkiKIwh7ihafxB2hD+WAeruwVJ/N3vE8liS6jfWT/BGFeHodFMmwu5ZPDPS2o81o07/br2wpoex/GyKgauSr49acbph5hgUpVjGWZftPzqAINTwGVDs=
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com;
	h=X-YMail-OSG:Received:X-Mailer:Message-ID:Date:From:Subject:To:MIME-Version:Content-Type;
	b=2SisoEr0efAwa97x0ds0zzp8vtv4NXja2sLoS7ZYP5W14ReuzhFKSZBsva7HGVBm3tBR/WXj++UJPkXH8q6tCTfI/fiCfaRXTTED2Da36cD+OasENpKkj9xqySb0XLdC8tUZuoTksfKQ3X816ey3wA6njIazZ8+Wxx01+Tm74Qk=;
X-YMail-OSG: 6APzz3UVM1mLWy2PESfRgMMeb7L9DhL7aDSUtjnc9_drP.7
	F9gYQWIqQi3Qoqw.P678FRIdjUxg0.ckdOPGnwOvt_HJuPfCaYJ9.73Q_S40
	taEjQVAcTSC.rQUdvipUvZJQoIlD5BRiogBcCgXAD7LeHp0rIYJSpEvCGa_E
	R.KhrM6HAe2c0Y2qsQKlcF13LBTrwK4AIGAWqYhm2tCao6pRC_GS4mHMo3h4
	TyjFiaIemp6ib_VMvEVfvyFX1SICSGR1J0lEhMuJc0EW.bBzh219EMbgE2SL
	imeNLEwjBSX3Q2jjvWBUcoNk0BzFX1ujxL9F5uPjwqU9oM_UTWF_pf_AT8mx
	w0oOoWy29N8EMQvW.7.BMkgYdf8YZ6PPEkYv8oF_tM7l.b7TkWQc32haoGD7
	xNwKFhY4XCEelMw--
Received: from [12.202.173.2] by web121204.mail.ne1.yahoo.com via HTTP;
	Thu, 20 Sep 2012 08:46:24 PDT
X-Mailer: YahooMailClassic/15.0.8 YahooMailWebService/0.8.121.416
Message-ID: <1348155984.52722.YahooMailClassic@web121204.mail.ne1.yahoo.com>
Date: Thu, 20 Sep 2012 08:46:24 -0700 (PDT)
From: Jason Usher <jusher71@yahoo.com>
To: freebsd-fs@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Subject: ZFS stats output - used, compressed, deduped, etc.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 20 Sep 2012 15:46:32 -0000

Hi,

I have a ZFS filesystem with compression turned on.  Does the "used" property show me the actual data size, or the compressed data size?  If it shows me the compressed size, where can I see the actual data size?

I also wonder about checking status of dedupe - I created my pool without dedupe, and continue to NOT enable dedupe - from zpool history, we see:

zpool create -f -O atime=off -O setuid=off -O exec=off -m /mnt/pool pool raidz3 da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 da10 da11

Later, I enabled dedup for just a single filesystem on this pool:

zfs set dedup=on pool/dataset

and now, I see in 'zpool list' a value for dedupratio:

pool  dedupratio     1.65x       -


Why do I see a value here?  Isn't dedupe still OFF for the pool as a whole?  I do NOT want to enable dedupe for the entire pool.

Also, why do I not see any dedupe stats for the individual filesystem?  I see compressratio, and I see dedup=on, but I don't see any dedupratio for the filesystem itself...

Did turning on dedupe for a single filesystem turn it on for the entire pool?

From owner-freebsd-fs@FreeBSD.ORG  Thu Sep 20 21:55:56 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 58C5F106564A
	for <freebsd-fs@freebsd.org>; Thu, 20 Sep 2012 21:55:56 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca
	[131.104.91.44])
	by mx1.freebsd.org (Postfix) with ESMTP id 016A88FC16
	for <freebsd-fs@freebsd.org>; Thu, 20 Sep 2012 21:55:55 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: Ap8EAHqQW1CDaFvO/2dsb2JhbAA+BxaFdbg/giABAQUjBFIbDgoCAg0ZAlkGiBYLpyaTBoEhiXshhQ+BEgOVZIEUjw2DA4E+Ihs
X-IronPort-AV: E=Sophos;i="4.80,456,1344225600"; d="scan'208";a="179950066"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 20 Sep 2012 17:55:26 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 86877B4014;
	Thu, 20 Sep 2012 17:55:26 -0400 (EDT)
Date: Thu, 20 Sep 2012 17:55:26 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Konstantin Belousov <kostikbel@gmail.com>
Message-ID: <1237981048.964353.1348178126537.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <20120919061659.GS37286@deviant.kiev.zoral.com.ua>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.202]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692)
Cc: FS List <freebsd-fs@freebsd.org>
Subject: Re: testing/review of atomic export update patch
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 20 Sep 2012 21:55:56 -0000

Konstantin Belousov wrote:
> On Tue, Sep 18, 2012 at 09:34:54AM -0400, Rick Macklem wrote:
> > Konstantin Belousov wrote:
> > > On Mon, Sep 17, 2012 at 05:32:44PM -0400, Rick Macklem wrote:
> > > > Konstantin Belousov wrote:
> > > > > On Sun, Sep 16, 2012 at 05:41:25PM -0400, Rick Macklem wrote:
> > > > > > Hi,
> > > > > >
> > > > > > There is a simple patch at:
> > > > > >   http://people.freebsd.org/~rmacklem/atomic-export.patch
> > > > > > that can be applied to a kernel + mountd, so that the new
> > > > > > nfsd can be suspended by mountd while the exports are being
> > > > > > reloaded. It adds a new "-S" flag to mountd to enable this.
> > > > > > (This avoids the long standing bug where clients receive
> > > > > > ESTALE
> > > > > >  replies to RPCs while mountd is reloading exports.)
> > > > >
> > > > > This looks simple, but also somewhat worrisome. What would
> > > > > happen
> > > > > if the mountd crashes after nfsd suspension is requested, but
> > > > > before
> > > > > resume was performed ?
> > > > >
> > > > > Might be, mountd should check for suspended nfsd on start and
> > > > > unsuspend
> > > > > it, if some flag is specified ?
> > > > Well, I think that happens with the patch as it stands.
> > > >
> > > > suspend is done if the "-S" option is specified, but that is a
> > > > no op
> > > > if it is already suspended. The resume is done no matter what
> > > > flags
> > > > are provided, so mountd will always try and do a "resume".
> > > > --> get_exportlist() is always called when mountd is started up
> > > > and
> > > >     it does the resume unconditionally when it completes.
> > > >     If mountd repeatedly crashes before completing
> > > >     get_exportlist()
> > > >     when it is started up, the exports will be all messed up, so
> > > >     having the nfsd threads suspended doesn't seem so bad for
> > > >     this
> > > >     case (which hopefully never happens;-).
> > > >
> > > > Both suspend and resume are just no ops for unpatched kernels.
> > > >
> > > > Maybe the comment in front of "resume" should explicitly explain
> > > > this, instead of saying resume is harmless to do under all
> > > > conditions?
> > > >
> > > > Thanks for looking at it, rick
> > > I see.
> > >
> > > My another note is that there is no any protection against
> > > parallel
> > > instances of suspend/resume happen. For instance, one thread could
> > > set
> > > suspend_nfsd = 1 and be descheduled, while another executes resume
> > > code sequence meantime. Then it would see suspend_nfsd != 0, while
> > > nfsv4rootfs_lock not held, and tries to unlock it. It seems that
> > > nfsv4_unlock would silently exit. The suspending thread resumes,
> > > and obtains the lock. You end up with suspend_nfsd == 0 but lock
> > > held.
> > Yes. I had assumed that mountd would be the only thing using these
> > syscalls
> > and it is single threaded. (The syscalls can only be done by root
> > for the
> > obvious reasons.;-)
> >
> > Maybe the following untested version of the syscalls would be
> > better, since
> > they would allow multiple concurrent calls to either suspend or
> > resume.
> > (There would still be an indeterminate case if one thread called
> > resume
> >  concurrently with another few calling suspend, but that is
> >  unavoidable,
> >  I think?)
> >
> > Again, thanks for the comments, rick
> > --- untested version of syscalls ---
> > 	} else if ((uap->flag & NFSSVC_SUSPENDNFSD) != 0) {
> > 		NFSLOCKV4ROOTMUTEX();
> > 		if (suspend_nfsd == 0) {
> > 			/* Lock out all nfsd threads */
> > 			igotlock = 0;
> > 			while (igotlock == 0 && suspend_nfsd == 0) {
> > 				igotlock = nfsv4_lock(&nfsv4rootfs_lock, 1,
> > 				    NULL, NFSV4ROOTLOCKMUTEXPTR, NULL);
> > 			}
> > 			suspend_nfsd = 1;
> > 		}
> > 		NFSUNLOCKV4ROOTMUTEX();
> > 		error = 0;
> > 	} else if ((uap->flag & NFSSVC_RESUMENFSD) != 0) {
> > 		NFSLOCKV4ROOTMUTEX();
> > 		if (suspend_nfsd != 0) {
> > 			nfsv4_unlock(&nfsv4rootfs_lock, 0);
> > 			suspend_nfsd = 0;
> > 		}
> > 		NFSUNLOCKV4ROOTMUTEX();
> > 		error = 0;
> > 	}
> 
> From the cursory look, this variant is an improvement, mostly by
> taking
> the interlock before testing suspend_nfsd, and using the while loop.
> 
> Is it possible to also make the sleep for the lock interruptible ?
> So that blocked mountd could be killed by a signal ?
Well, it would require some coding: an extra argument to nfsv4_lock()
to request an interruptible sleep, and then either the caller would have
to check for a pending termination signal when it returns 0 (which
indicates it didn't get the lock) or there would be a new return value to
indicate EINTR. The latter would require all the calls to it to be changed
to recognize the new 3rd return case. Because there are a lot of these
calls, I'd tend towards just having the caller check for a pending signal.
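
The "caller checks for a pending signal" option could look roughly like
the following userland model.  It is a sketch only, not kernel code: the
class NfsdGate, the busy_nfsd_threads counter and the signal_pending
callback are hypothetical stand-ins, the interlock mutex is omitted
because the model is single threaded, and a real implementation would
sleep in nfsv4_lock() instead of spinning.

```python
import errno


class NfsdGate:
    """Single-threaded model of the suspend/resume state."""

    def __init__(self):
        self.suspend_nfsd = 0       # mirrors the flag in the quoted code
        self.lock_held = False      # stands in for nfsv4rootfs_lock
        self.busy_nfsd_threads = 0  # while > 0 the lock cannot be taken

    def _try_lock(self):
        """One lock attempt: 1 if the lock was obtained, 0 otherwise."""
        if self.lock_held or self.busy_nfsd_threads > 0:
            return 0
        self.lock_held = True
        return 1

    def suspend(self, signal_pending):
        """Suspend nfsd; give up with EINTR if a signal is pending."""
        if self.suspend_nfsd != 0:
            return 0                # already suspended: no-op
        while self._try_lock() == 0:
            # Did not get the lock: the caller, not the lock routine,
            # decides whether to bail out.
            if signal_pending():
                return errno.EINTR
        self.suspend_nfsd = 1
        return 0

    def resume(self):
        """Resume nfsd; harmless if it is not suspended."""
        if self.suspend_nfsd != 0:
            self.lock_held = False
            self.suspend_nfsd = 0
        return 0


if __name__ == "__main__":
    gate = NfsdGate()
    print(gate.suspend(lambda: False), gate.suspend_nfsd)
    gate.resume()
    gate.busy_nfsd_threads = 3      # pretend the nfsd threads are wedged
    print(gate.suspend(lambda: True) == errno.EINTR)
```

In this model the existing callers of the lock routine would not need to
change, since the 0/1 return convention stays as it is.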

Not sure if it would make much difference though. The only time it
would get stuck in nfsv4_lock() is if the nfsd threads are all wedged
and in that case having mountd wedged too probably doesn't make much
difference, since the NFS service is toast in that case anyhow.

If you think it is worth doing, I can add that. I basically see this
as a "stop-gap" fix until such time as something like nfse is done,
but since I haven't the time to look at nfse right now, I have no
idea when/if that might happen.

rick


From owner-freebsd-fs@FreeBSD.ORG  Fri Sep 21 00:03:16 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 2C083106566C;
	Fri, 21 Sep 2012 00:03:16 +0000 (UTC) (envelope-from flo@smeets.im)
Received: from mail.solomo.de (mail.solomo.de [IPv6:2a01:238:42c7:9a00::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 8C2F68FC1E;
	Fri, 21 Sep 2012 00:03:15 +0000 (UTC)
Received: from mail.solomo.de (localhost [127.0.0.1])
	by mail.solomo.de (Postfix) with ESMTP id 790DDC3833;
	Fri, 21 Sep 2012 02:03:14 +0200 (CEST)
X-Virus-Scanned: amavisd-new at solomo.de
Received: from mail.solomo.de ([127.0.0.1])
	by mail.solomo.de (mail.solomo.de [127.0.0.1]) (amavisd-new, port 10024)
	with LMTP id eJL7agiouy0y; Fri, 21 Sep 2012 02:03:13 +0200 (CEST)
Received: from nibbler-osx.local (unknown
	[IPv6:2001:4dd0:ff00:8bb6:d806:7e81:457:3997])
	(using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
	(No client certificate requested)
	by mail.solomo.de (Postfix) with ESMTPSA id 5A5F1C381A;
	Fri, 21 Sep 2012 02:03:13 +0200 (CEST)
Message-ID: <505BAEBF.7070403@smeets.im>
Date: Fri, 21 Sep 2012 02:03:11 +0200
From: Florian Smeets <flo@smeets.im>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8;
	rv:16.0) Gecko/20120905 Thunderbird/16.0
MIME-Version: 1.0
To: Bryan Drewery <bdrewery@freebsd.org>
References: <CAJ-FndCQ0YEo9_6x3g-12XEs8QmtyecwkLBX9z_sptnOUNTHrw@mail.gmail.com>
	<20120829060158.GA38721@x2.osted.lan>
	<CAJ-FndAaFv2o05MZZceT8Qr4mhPxuzrnmOZ30c3gy8=pnjjZvw@mail.gmail.com>
	<20120831052003.GA91340@x2.osted.lan>
	<CAJ-FndAaxQA8NYCFSN629XXi9zMVNyu2TuHjZLvmn3jhzRJb4w@mail.gmail.com>
	<CAJ-FndDdDVuwc=NgDeG7XiWW59-+Ls5wc2GBqbjLOLDUdUb9SA@mail.gmail.com>
	<20120905201531.GA54452@x2.osted.lan>
	<CAJ-FndCHSroZFfVgHAW8SUVZhDSaX9qix=aZnHVC_BN_fW6sgg@mail.gmail.com>
	<CAJ-FndDr5WmeKXCwSCucQ4w3hPHRBuu36YH1xiW_wKXOkKEdZg@mail.gmail.com>
	<CAJ-FndCvc+phY_g4CeGfzsj017roxs_C5adjuLuszpEPWO2+1g@mail.gmail.com>
	<20120917140055.GA9037@x2.osted.lan>
	<CAJ-FndAP9Ua6tRcbrfYY1+56O-YbJvmyaUco9K42-0hmchKD6g@mail.gmail.com>
	<505A468E.2080902@FreeBSD.org>
In-Reply-To: <505A468E.2080902@FreeBSD.org>
X-Enigmail-Version: 1.5a1pre
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature";
	boundary="------------enig856A503AD0801149A196BDD4"
Cc: FreeBSD FS <freebsd-fs@freebsd.org>, Peter Holm <pho@freebsd.org>,
	attilio@freebsd.org, freebsd-current@freebsd.org
Subject: Re: MPSAFE VFS -- List of upcoming actions
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 Sep 2012 00:03:16 -0000

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enig856A503AD0801149A196BDD4
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On 20.09.12 00:26, Bryan Drewery wrote:
> On 9/18/2012 9:48 PM, Attilio Rao wrote:
>> In addition to fusefs-kmod, Bryan and Florian have also updated
>> fusefs-lib and fusefs-ntfs ports. For instance, please refer to this
>> e-mail:
>> http://lists.freebsd.org/pipermail/freebsd-ports/2012-August/077950.html
>>
>> Even if this work is someway independent by the fusefs-kmod import, I
>> warmly suggest to all of you to use their patches (and this what we
>> have been testing so far too).
>
> I have committed my updates to sysutils/fusefs-ntfs now.
>

The sysutils/fusefs-libs port was updated a few minutes ago.

Florian


--------------enig856A503AD0801149A196BDD4
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----

iEYEARECAAYFAlBbrsAACgkQapo8P8lCvwmtHQCfc6yPoAmqqlh5DSm/XfJ9PnmY
TAcAn1LDy4OziPr+8ydUvSLvHXKTGkrw
=ME21
-----END PGP SIGNATURE-----

--------------enig856A503AD0801149A196BDD4--

From owner-freebsd-fs@FreeBSD.ORG  Fri Sep 21 00:24:48 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id F3044106566B;
	Fri, 21 Sep 2012 00:24:47 +0000 (UTC)
	(envelope-from asmrookie@gmail.com)
Received: from mail-lb0-f182.google.com (mail-lb0-f182.google.com
	[209.85.217.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 546508FC0A;
	Fri, 21 Sep 2012 00:24:45 +0000 (UTC)
Received: by lbbgg13 with SMTP id gg13so3927786lbb.13
	for <multiple recipients>; Thu, 20 Sep 2012 17:24:44 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:reply-to:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:content-type;
	bh=6W00lKwbtN1omyhyOsTSSYofFPMlDSJIchDghmi1WT8=;
	b=mhtCa1FR/auVB5rcKLF6ozVNBHUfLguw4EbInJXRGlWUBu+yittMGF0Pc9O1mODCxM
	Ub/XeIkehjK7mYIj3VNGZqEWj9a9TjjhB7YPFRuisSnuUxUz+SmNHx0Lip8RB9+z1WCl
	r2VqKwaj86qkGwsY4Tex9aIIW1qzpwQPVNc2m0bMIfQIWJh3JYAT06JZWPDtOXsiwtsm
	ARk6QpWnKNvB/3uRl1KNOwuHDbaqtw1n+m6nQFlzxKJ6bH4hmo6gMC34KtASytfGZ1IO
	hMj69LmLdBQbLf6PLyKbcjigMsaE5FnC2uDl9rtuh7m/P3Jf+bhbYkDjA+jL6s8vjsYt
	G8bg==
MIME-Version: 1.0
Received: by 10.112.82.66 with SMTP id g2mr1189036lby.15.1348187084804; Thu,
	20 Sep 2012 17:24:44 -0700 (PDT)
Sender: asmrookie@gmail.com
Received: by 10.112.102.39 with HTTP; Thu, 20 Sep 2012 17:24:44 -0700 (PDT)
In-Reply-To: <CAJ-FndAisKoCwLkvXpmW=XhXDRH8me8fMjwrfBuWVqfoA95rmQ@mail.gmail.com>
References: <CAJ-FndCQ0YEo9_6x3g-12XEs8QmtyecwkLBX9z_sptnOUNTHrw@mail.gmail.com>
	<20120829060158.GA38721@x2.osted.lan>
	<CAJ-FndAaFv2o05MZZceT8Qr4mhPxuzrnmOZ30c3gy8=pnjjZvw@mail.gmail.com>
	<20120831052003.GA91340@x2.osted.lan>
	<CAJ-FndAaxQA8NYCFSN629XXi9zMVNyu2TuHjZLvmn3jhzRJb4w@mail.gmail.com>
	<CAJ-FndDdDVuwc=NgDeG7XiWW59-+Ls5wc2GBqbjLOLDUdUb9SA@mail.gmail.com>
	<20120905201531.GA54452@x2.osted.lan>
	<CAJ-FndCHSroZFfVgHAW8SUVZhDSaX9qix=aZnHVC_BN_fW6sgg@mail.gmail.com>
	<CAJ-FndDr5WmeKXCwSCucQ4w3hPHRBuu36YH1xiW_wKXOkKEdZg@mail.gmail.com>
	<CAJ-FndCvc+phY_g4CeGfzsj017roxs_C5adjuLuszpEPWO2+1g@mail.gmail.com>
	<20120917140055.GA9037@x2.osted.lan>
	<CAJ-FndAP9Ua6tRcbrfYY1+56O-YbJvmyaUco9K42-0hmchKD6g@mail.gmail.com>
	<CAJ-FndAisKoCwLkvXpmW=XhXDRH8me8fMjwrfBuWVqfoA95rmQ@mail.gmail.com>
Date: Fri, 21 Sep 2012 01:24:44 +0100
X-Google-Sender-Auth: e9-qkoeyV4fWH0A9ttW69Y7epXU
Message-ID: <CAJ-FndDMcKsg-54fXQSQCYC5SO1fP7YLPGy0jr80sA5T26_CCA@mail.gmail.com>
From: Attilio Rao <attilio@freebsd.org>
To: FreeBSD FS <freebsd-fs@freebsd.org>, freebsd-current@freebsd.org, 
	Peter Holm <pho@freebsd.org>,
	=?UTF-8?Q?Gustau_P=C3=A9rez?= <gperez@entel.upc.edu>, 
	George Neville-Neil <gnn@freebsd.org>, Florian Smeets <flo@freebsd.org>,
	bdrewery@freebsd.org
Content-Type: text/plain; charset=UTF-8
Cc: 
Subject: Re: MPSAFE VFS -- List of upcoming actions
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: attilio@FreeBSD.org
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 Sep 2012 00:24:48 -0000

On Fri, Sep 21, 2012 at 1:22 AM, Attilio Rao <attilio@freebsd.org> wrote:

[ trim ]

>
> You can use the branch directly or this patch against -CURRENT at 240752:
> http://www.freebsd.org/~attilio/fuse_import/fuse_240752.patch
>
> In order to test this work, then, you just need to patch (or use
> directly the branch) your sources with this patch and install ports
> normally as they work.

Forgot to mention: with the new branch you *must not* install the
fusefs-kmod port. Please test from a pristine installation, or
double-check that your fusefs-kmod port is completely gone (if it was
already installed) before reporting bugs, as its functionality could
taint that of the branch.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein

From owner-freebsd-fs@FreeBSD.ORG  Fri Sep 21 00:28:26 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 44A03106564A;
	Fri, 21 Sep 2012 00:28:26 +0000 (UTC)
	(envelope-from asmrookie@gmail.com)
Received: from mail-lb0-f182.google.com (mail-lb0-f182.google.com
	[209.85.217.182])
	by mx1.freebsd.org (Postfix) with ESMTP id A119D8FC08;
	Fri, 21 Sep 2012 00:28:24 +0000 (UTC)
Received: by lbbgg13 with SMTP id gg13so3930523lbb.13
	for <multiple recipients>; Thu, 20 Sep 2012 17:28:23 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:reply-to:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:content-type;
	bh=Phz+wLnOM6ybKNOMNmFaJ5/EXOoTOgo/FllHDRXoHYA=;
	b=hU0D0yC2H1Zmdx4/fP7IOxG7TjNGhLkXE7qRwuuS3n0lmU7VW/mxkFuY+INnDttACk
	E41/0EmKeZnLNuPJOoivDd/RjxpZMVb+rTcdHx4czQWZdjG0KofliSYy+TT1sU/6RJ2y
	YXWYTzDyJgM4jcHdKkZXmry6JMU4fs0+avfvWRehcyvvgeZzPRn8boibhibx99F+mJVT
	HSwn+IWlqTW7/eUNb/gbdTPwgHS/R7WqAZBTRksy9svQNlpFnRxWss9E7KoZJd/GYlOa
	yQ5OAHM20BwUNyuk9Qsl0a73Z19ObRJjmXCg4mVvPrP/Bbo7LMpCfrIgOYlSXH8OH00L
	DSTg==
MIME-Version: 1.0
Received: by 10.152.112.233 with SMTP id it9mr2813805lab.40.1348186974838;
	Thu, 20 Sep 2012 17:22:54 -0700 (PDT)
Sender: asmrookie@gmail.com
Received: by 10.112.102.39 with HTTP; Thu, 20 Sep 2012 17:22:54 -0700 (PDT)
In-Reply-To: <CAJ-FndAP9Ua6tRcbrfYY1+56O-YbJvmyaUco9K42-0hmchKD6g@mail.gmail.com>
References: <CAJ-FndCQ0YEo9_6x3g-12XEs8QmtyecwkLBX9z_sptnOUNTHrw@mail.gmail.com>
	<20120829060158.GA38721@x2.osted.lan>
	<CAJ-FndAaFv2o05MZZceT8Qr4mhPxuzrnmOZ30c3gy8=pnjjZvw@mail.gmail.com>
	<20120831052003.GA91340@x2.osted.lan>
	<CAJ-FndAaxQA8NYCFSN629XXi9zMVNyu2TuHjZLvmn3jhzRJb4w@mail.gmail.com>
	<CAJ-FndDdDVuwc=NgDeG7XiWW59-+Ls5wc2GBqbjLOLDUdUb9SA@mail.gmail.com>
	<20120905201531.GA54452@x2.osted.lan>
	<CAJ-FndCHSroZFfVgHAW8SUVZhDSaX9qix=aZnHVC_BN_fW6sgg@mail.gmail.com>
	<CAJ-FndDr5WmeKXCwSCucQ4w3hPHRBuu36YH1xiW_wKXOkKEdZg@mail.gmail.com>
	<CAJ-FndCvc+phY_g4CeGfzsj017roxs_C5adjuLuszpEPWO2+1g@mail.gmail.com>
	<20120917140055.GA9037@x2.osted.lan>
	<CAJ-FndAP9Ua6tRcbrfYY1+56O-YbJvmyaUco9K42-0hmchKD6g@mail.gmail.com>
Date: Fri, 21 Sep 2012 01:22:54 +0100
X-Google-Sender-Auth: wZLKdGCIiTExkzPYkAJj7fAHXKw
Message-ID: <CAJ-FndAisKoCwLkvXpmW=XhXDRH8me8fMjwrfBuWVqfoA95rmQ@mail.gmail.com>
From: Attilio Rao <attilio@freebsd.org>
To: FreeBSD FS <freebsd-fs@freebsd.org>, freebsd-current@freebsd.org, 
	Peter Holm <pho@freebsd.org>,
	=?UTF-8?Q?Gustau_P=C3=A9rez?= <gperez@entel.upc.edu>, 
	George Neville-Neil <gnn@freebsd.org>, Florian Smeets <flo@freebsd.org>,
	bdrewery@freebsd.org
Content-Type: text/plain; charset=UTF-8
Cc: 
Subject: Re: MPSAFE VFS -- List of upcoming actions
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: attilio@FreeBSD.org
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 Sep 2012 00:28:26 -0000

On Wed, Sep 19, 2012 at 3:48 AM, Attilio Rao <attilio@freebsd.org> wrote:
> On Fri, Jul 13, 2012 at 12:18 AM, Attilio Rao <attilio@freebsd.org> wrote:
>> 2012/7/4 Attilio Rao <attilio@freebsd.org>:
>>> 2012/6/29 Attilio Rao <attilio@freebsd.org>:
>>>> As already published several times, according to the following plan:
>>>> http://wiki.freebsd.org/NONMPSAFE_DEORBIT_VFS
>>>>
>>>
>>> I still haven't heard from Vivien or Edward; anyway, as NTFS is
>>> basically only used RO these days (also the mount_ntfs code just
>>> permits RO mounting) I stripped all the incomplete/bogus write support
>>> with the following patch:
>>> http://www.freebsd.org/~attilio/ntfs_remove_write.patch
>>>
>>> This is an attempt to make the code smaller and possibly just focus on
>>> the locking that really matters (as a read-only filesystem).
>>> On some points of the patch I'm a bit less sure, as we could easily
>>> also take write into account for things like vaccess() arguments, and
>>> make it easier to re-add correct write support at some point in the
>>> future, but still force RO, even if the approach used in the patch is
>>> more correct IMHO.
>>> As an added bonus this patch cleans some dirty code in the mount
>>> operation and fixes a bug as vfs_mountedfrom() is called before real
>>> mounting is completed and can still fail.
>>
>> A quick update on this.
>> It looks like NTFS won't be completed for this GSoC, thus I seriously
>> need to find an alternative to not lose the NTFS support entirely.
>>
>> I tried to look into the NTFS implementation right now and it is
>> really poor support. As Peter has also verified, it can deadlock in
>> no time, it completely violates VFS rules, etc. IMHO it deserves a
>> complete rewrite if we would still support in-kernel NTFS. I also
>> tried to look at the NetBSD implementation. Their code is somewhat
>> similar to ours, but they used very complicated (and very dirty) code
>> to do the locking. Even if I don't know the NetBSD VFS well enough, I
>> have the impression not all the races are correctly handled. Definitely
>> not something I would like to port.
>>
>> Considering all that, the only viable option would be a
>> userland filesystem implementation. My preferred choice would be to
>> import PUFFS and librefuse on top of it, but honestly it requires a lot
>> of time to be completed, time which I don't currently have, as in 2
>> months Giant must be gone from the VFS.
>>
>> I then decided to switch to gnn's revamp of FUSE patches. You can find
>> his initial e-mail here:
>> http://lists.freebsd.org/pipermail/freebsd-fs/2012-March/013876.html
>>
>> I've precisely got the second version of George's patch and created
>> this dolphin branch:
>> svn://svn.freebsd.org/base/projects/fuse
>>
>> I'm fixing low-hanging fruit for the moment (see r238411 for example)
>> and I still have to do a thorough review.
>> However my idea is to commit the support once:
>> - ntfs-3g is well stress-tested and proves to be bug-free
>> - there is no major/big technical issue pending after the reviews
>
> In the last weeks Peter, Florian, Gustau and I have been working on
> stabilizing fuse support. Specifically, Peter has worked hard on
> producing several utilities to stress-test fuse and in particular
> ntfs, Florian has improved fuse-related ports (as explained later) and
> Gustau has done sparse testing. I feel moderately satisfied with the
> level of stability of fuse now, enough to propose it for wider usage,
> in particular given the huge amount of complaints I'm hearing
> from occasional fuse users.
>
> The final target of the project is to completely import into base the
> content of fusefs-kmod, starting from the patches posted earlier by
> George. So far, we took care only of importing the kernel part into the
> fuse branch, so the fusefs-kmod userland part still needs to be
> installed from ports, but I was studying the mount_fusefs licensing
> before proceeding with the import of its userland bits.
>
> The fixing has been happening here:
> svn://svn.freebsd.org/base/projects/fuse/
>
> which is essentially a HEAD branch + fuse kernel components. In order
> to get fuse, please compile a kernel from this branch with the FUSE
> option or simply build and load the fuse module.
> Alternatively, a kernel patch that should work with HEAD@240684 is here:
> http://www.freebsd.org/~attilio/fuse_import/fuse_240684.patch
>
> I guess the patch can easily apply to all FreeBSD branches, really,
> but it has not been tested on anything other than -CURRENT.
>
> As said, you currently still need to build the fusefs-kmod port. However,
> you need these further patches, to be put in the fusefs-kmod/files/
> directory:
> http://www.freebsd.org/~attilio/fuse_import/patch-Makefile
> http://www.freebsd.org/~attilio/fuse_import/patch-mount_fusefs__mount_fusefs2.c
>
> They both disable the old kernel building/linking and import new
> functionality to let the new kernel support work well in presence of
> many consumers.
>
> In addition to fusefs-kmod, Bryan and Florian have also updated the
> fusefs-lib and fusefs-ntfs ports. For details, please refer to this
> e-mail:
> http://lists.freebsd.org/pipermail/freebsd-ports/2012-August/077950.html
>
> Even if this work is somewhat independent of the fusefs-kmod import, I
> warmly suggest that all of you use their patches (this is what we
> have been testing so far, too).

So, after Bryan's and Florian's ports update, I've also committed the
userland part of fusefs-kmod, and now the project branch fully mirrors
the functionality of fusefs-kmod. The code in projects/fuse, in fact,
will also install mount_fusefs as part of the fuse support.

You can use the branch directly or this patch against -CURRENT at 240752:
http://www.freebsd.org/~attilio/fuse_import/fuse_240752.patch

In order to test this work, then, you just need to patch your sources
with this patch (or use the branch directly) and install the ports
normally, since they work as they are.

If no major bugs are found before October 4th, this is the code that
is going to be committed to HEAD.

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein

From owner-freebsd-fs@FreeBSD.ORG  Fri Sep 21 08:05:29 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 80A50106564A
	for <freebsd-fs@freebsd.org>; Fri, 21 Sep 2012 08:05:29 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
	by mx1.freebsd.org (Postfix) with ESMTP id 83B718FC12
	for <freebsd-fs@freebsd.org>; Fri, 21 Sep 2012 08:05:27 +0000 (UTC)
Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q8L85Thp099520;
	Fri, 21 Sep 2012 11:05:29 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id
	q8L85Gae022597; Fri, 21 Sep 2012 11:05:16 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q8L85GtC022596; 
	Fri, 21 Sep 2012 11:05:16 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Fri, 21 Sep 2012 11:05:16 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Message-ID: <20120921080516.GC37286@deviant.kiev.zoral.com.ua>
References: <20120919061659.GS37286@deviant.kiev.zoral.com.ua>
	<1237981048.964353.1348178126537.JavaMail.root@erie.cs.uoguelph.ca>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="7iqzzmjEMnnYslDD"
Content-Disposition: inline
In-Reply-To: <1237981048.964353.1348178126537.JavaMail.root@erie.cs.uoguelph.ca>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
	autolearn=ham version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	skuns.kiev.zoral.com.ua
Cc: FS List <freebsd-fs@freebsd.org>
Subject: Re: testing/review of atomic export update patch
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 Sep 2012 08:05:29 -0000


--7iqzzmjEMnnYslDD
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, Sep 20, 2012 at 05:55:26PM -0400, Rick Macklem wrote:
> Konstantin Belousov wrote:
> > On Tue, Sep 18, 2012 at 09:34:54AM -0400, Rick Macklem wrote:
> > > Konstantin Belousov wrote:
> > > > On Mon, Sep 17, 2012 at 05:32:44PM -0400, Rick Macklem wrote:
> > > > > Konstantin Belousov wrote:
> > > > > > On Sun, Sep 16, 2012 at 05:41:25PM -0400, Rick Macklem wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > There is a simple patch at:
> > > > > > >   http://people.freebsd.org/~rmacklem/atomic-export.patch
> > > > > > > that can be applied to a kernel + mountd, so that the new
> > > > > > > nfsd can be suspended by mountd while the exports are being
> > > > > > > reloaded. It adds a new "-S" flag to mountd to enable this.
> > > > > > > (This avoids the long standing bug where clients receive
> > > > > > > ESTALE
> > > > > > >  replies to RPCs while mountd is reloading exports.)
> > > > > >
> > > > > > This looks simple, but also somewhat worrisome. What would
> > > > > > happen
> > > > > > if the mountd crashes after nfsd suspension is requested, but
> > > > > > before
> > > > > > resume was performed ?
> > > > > >
> > > > > > Might be, mountd should check for suspended nfsd on start and
> > > > > > unsuspend
> > > > > > it, if some flag is specified ?
> > > > > Well, I think that happens with the patch as it stands.
> > > > >
> > > > > suspend is done if the "-S" option is specified, but that is a
> > > > > no op
> > > > > if it is already suspended. The resume is done no matter what
> > > > > flags
> > > > > are provided, so mountd will always try and do a "resume".
> > > > > --> get_exportlist() is always called when mountd is started up
> > > > > and
> > > > >     it does the resume unconditionally when it completes.
> > > > >     If mountd repeatedly crashes before completing
> > > > >     get_exportlist()
> > > > >     when it is started up, the exports will be all messed up, so
> > > > >     having the nfsd threads suspended doesn't seem so bad for
> > > > >     this
> > > > >     case (which hopefully never happens;-).
> > > > >
> > > > > Both suspend and resume are just no ops for unpatched kernels.
> > > > >
> > > > > Maybe the comment in front of "resume" should explicitly explain
> > > > > this, instead of saying resume is harmless to do under all
> > > > > conditions?
> > > > >
> > > > > Thanks for looking at it, rick
> > > > I see.
> > > >
> > > > My other note is that there is no protection against parallel
> > > > instances of suspend/resume happening. For instance, one thread could
> > > > set suspend_nfsd = 1 and be descheduled, while another executes the
> > > > resume code sequence in the meantime. Then it would see
> > > > suspend_nfsd != 0 while nfsv4rootfs_lock is not held, and try to
> > > > unlock it. It seems that nfsv4_unlock would silently exit. The
> > > > suspending thread resumes and obtains the lock. You end up with
> > > > suspend_nfsd == 0 but the lock held.
> > > Yes. I had assumed that mountd would be the only thing using these
> > > syscalls
> > > and it is single threaded. (The syscalls can only be done by root
> > > for the
> > > obvious reasons.;-)
> > >
> > > Maybe the following untested version of the syscalls would be
> > > better, since
> > > they would allow multiple concurrent calls to either suspend or
> > > resume.
> > > (There would still be an indeterminate case if one thread called
> > > resume
> > >  concurrently with another few calling suspend, but that is
> > >  unavoidable,
> > >  I think?)
> > >
> > > Again, thanks for the comments, rick
> > > --- untested version of syscalls ---
> > > 	} else if ((uap->flag & NFSSVC_SUSPENDNFSD) != 0) {
> > > 		NFSLOCKV4ROOTMUTEX();
> > > 		if (suspend_nfsd == 0) {
> > > 			/* Lock out all nfsd threads */
> > > 			igotlock = 0;
> > > 			while (igotlock == 0 && suspend_nfsd == 0) {
> > > 				igotlock = nfsv4_lock(&nfsv4rootfs_lock, 1,
> > > 				    NULL, NFSV4ROOTLOCKMUTEXPTR, NULL);
> > > 			}
> > > 			suspend_nfsd = 1;
> > > 		}
> > > 		NFSUNLOCKV4ROOTMUTEX();
> > > 		error = 0;
> > > 	} else if ((uap->flag & NFSSVC_RESUMENFSD) != 0) {
> > > 		NFSLOCKV4ROOTMUTEX();
> > > 		if (suspend_nfsd != 0) {
> > > 			nfsv4_unlock(&nfsv4rootfs_lock, 0);
> > > 			suspend_nfsd = 0;
> > > 		}
> > > 		NFSUNLOCKV4ROOTMUTEX();
> > > 		error = 0;
> > > 	}
> >
> > From the cursory look, this variant is an improvement, mostly by
> > taking
> > the interlock before testing suspend_nfsd, and using the while loop.
> >
> > Is it possible to also make the sleep for the lock interruptible ?
> > So that blocked mountd could be killed by a signal ?
> Well, it would require some coding. An extra argument to nfsv4_lock()
> to indicate to do so and then either the caller would have to check
> for a pending termination signal when it returns 0 (indicates didn't get
> lock) or a new return value to indicate EINTR. The latter would require
> all the calls to it to be changed to recognize the new 3rd return case.
> Because there are a lot of these calls, I'd tend towards just having the
> caller check for a pending signal.
>
> Not sure if it would make much difference though. The only time it
> would get stuck in nfsv4_lock() is if the nfsd threads are all wedged
> and in that case having mountd wedged too probably doesn't make much
> difference, since the NFS service is toast in that case anyhow.
>
> If you think it is worth doing, I can add that. I basically see this
> as a "stop-gap" fix until such time as something like nfse is done,
> but since I haven't the time to look at nfse right now, I have no
> idea when/if that might happen.

Ok, please go ahead with the patch. Having the patch even in its current
form is obviously better than not having it. If the wedged mountd
turns out to be annoying enough for me, I will make the change.

Thanks.
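
For readers following the thread, here is a minimal userland sketch of
the point discussed above: the suspend flag and the lock state only
change while the interlock is held, so concurrent suspend and resume
calls cannot leave the flag cleared while the lock is still held. This
is not the kernel code. All names (interlock, root_lock_held,
suspended, model_*) are hypothetical stand-ins, a pthread mutex plays
the role of NFSLOCKV4ROOTMUTEX, and the sleep that waits for nfsd
threads to drain is left out.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userland stand-ins for the kernel state in the patch. */
static pthread_mutex_t interlock = PTHREAD_MUTEX_INITIALIZER;
static int root_lock_held;	/* models nfsv4rootfs_lock held exclusively */
static int suspended;		/* models suspend_nfsd */

/* Suspend: test and set the flag only while holding the interlock. */
static void
model_suspend(void)
{
	pthread_mutex_lock(&interlock);
	if (suspended == 0) {
		root_lock_held = 1;
		suspended = 1;
	}
	pthread_mutex_unlock(&interlock);
}

/* Resume: release the lock and clear the flag as one step. */
static void
model_resume(void)
{
	pthread_mutex_lock(&interlock);
	if (suspended != 0) {
		root_lock_held = 0;
		suspended = 0;
	}
	pthread_mutex_unlock(&interlock);
}

/* Returns 1 when the flag and the lock state agree. */
static int
model_consistent(void)
{
	int ok;

	pthread_mutex_lock(&interlock);
	ok = (suspended == root_lock_held);
	pthread_mutex_unlock(&interlock);
	return (ok);
}

/* Thread body: call suspend or resume repeatedly. */
static void *
model_worker(void *arg)
{
	int do_suspend = *(int *)arg;

	for (int i = 0; i < 10000; i++) {
		if (do_suspend)
			model_suspend();
		else
			model_resume();
	}
	return (NULL);
}

/* Race one suspender against one resumer; returns 1 if still consistent. */
static int
model_race(void)
{
	pthread_t a, b;
	int yes = 1, no = 0;
	int rc;

	rc = pthread_create(&a, NULL, model_worker, &yes);
	assert(rc == 0);
	rc = pthread_create(&b, NULL, model_worker, &no);
	assert(rc == 0);
	(void)rc;
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return (model_consistent());
}
```

Calling model_race() from a small main() should always return 1.
Whether the final state is suspended or resumed depends on scheduling,
which matches the indeterminate case Rick mentions.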

--7iqzzmjEMnnYslDD
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iEYEARECAAYFAlBcH7wACgkQC3+MBN1Mb4hrPwCdH6HrPJL/FeYl2hofEkPEB299
ISQAn2OuFrZuC0lpmL/lFF1xen2APSs1
=UIog
-----END PGP SIGNATURE-----

--7iqzzmjEMnnYslDD--

From owner-freebsd-fs@FreeBSD.ORG  Fri Sep 21 21:11:46 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D5488106566B
	for <freebsd-fs@freebsd.org>; Fri, 21 Sep 2012 21:11:46 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca
	[131.104.91.44])
	by mx1.freebsd.org (Postfix) with ESMTP id 716C88FC14
	for <freebsd-fs@freebsd.org>; Fri, 21 Sep 2012 21:11:45 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: Ap8EAJrXXFCDaFvO/2dsb2JhbAA+BxaFdbkdgiABAQQBIwRSBRYOCgICDRkCWQaIEgYLpjCSeoEhiXshhHOBEgOVZIEVjw2DA4E+Ihs
X-IronPort-AV: E=Sophos;i="4.80,465,1344225600"; d="scan'208";a="180103908"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 21 Sep 2012 17:11:38 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id BECC1B4026;
	Fri, 21 Sep 2012 17:11:38 -0400 (EDT)
Date: Fri, 21 Sep 2012 17:11:38 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Konstantin Belousov <kostikbel@gmail.com>
Message-ID: <683271364.1028517.1348261898771.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <20120921080516.GC37286@deviant.kiev.zoral.com.ua>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.203]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692)
Cc: FS List <freebsd-fs@freebsd.org>
Subject: Re: testing/review of atomic export update patch
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 Sep 2012 21:11:46 -0000

Konstantin Belousov wrote:
> On Thu, Sep 20, 2012 at 05:55:26PM -0400, Rick Macklem wrote:
> > Konstantin Belousov wrote:
> > > On Tue, Sep 18, 2012 at 09:34:54AM -0400, Rick Macklem wrote:
> > > > Konstantin Belousov wrote:
> > > > > On Mon, Sep 17, 2012 at 05:32:44PM -0400, Rick Macklem wrote:
> > > > > > Konstantin Belousov wrote:
> > > > > > > On Sun, Sep 16, 2012 at 05:41:25PM -0400, Rick Macklem
> > > > > > > wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > There is a simple patch at:
> > > > > > > >   http://people.freebsd.org/~rmacklem/atomic-export.patch
> > > > > > > > that can be applied to a kernel + mountd, so that the
> > > > > > > > new
> > > > > > > > nfsd can be suspended by mountd while the exports are
> > > > > > > > being
> > > > > > > > reloaded. It adds a new "-S" flag to mountd to enable
> > > > > > > > this.
> > > > > > > > (This avoids the long standing bug where clients receive
> > > > > > > > ESTALE
> > > > > > > >  replies to RPCs while mountd is reloading exports.)
> > > > > > >
> > > > > > > This looks simple, but also somewhat worrisome. What would
> > > > > > > happen
> > > > > > > if the mountd crashes after nfsd suspension is requested,
> > > > > > > but
> > > > > > > before
> > > > > > > resume was performed ?
> > > > > > >
> > > > > > > Might be, mountd should check for suspended nfsd on start
> > > > > > > and
> > > > > > > unsuspend
> > > > > > > it, if some flag is specified ?
> > > > > > Well, I think that happens with the patch as it stands.
> > > > > >
> > > > > > suspend is done if the "-S" option is specified, but that is
> > > > > > a
> > > > > > no op
> > > > > > if it is already suspended. The resume is done no matter
> > > > > > what
> > > > > > flags
> > > > > > are provided, so mountd will always try and do a "resume".
> > > > > > --> get_exportlist() is always called when mountd is started
> > > > > > up
> > > > > > and
> > > > > >     it does the resume unconditionally when it completes.
> > > > > >     If mountd repeatedly crashes before completing
> > > > > >     get_exportlist()
> > > > > >     when it is started up, the exports will be all messed
> > > > > >     up, so
> > > > > >     having the nfsd threads suspended doesn't seem so bad
> > > > > >     for
> > > > > >     this
> > > > > >     case (which hopefully never happens;-).
> > > > > >
> > > > > > Both suspend and resume are just no ops for unpatched
> > > > > > kernels.
> > > > > >
> > > > > > Maybe the comment in front of "resume" should explicitly
> > > > > > explain
> > > > > > this, instead of saying resume is harmless to do under all
> > > > > > conditions?
> > > > > >
> > > > > > Thanks for looking at it, rick
> > > > > I see.
> > > > >
> > > > > > My other note is that there is no protection against
> > > > > > parallel instances of suspend/resume happening. For instance,
> > > > > > one thread could set suspend_nfsd = 1 and be descheduled,
> > > > > > while another executes the resume code sequence in the
> > > > > > meantime. Then it would see suspend_nfsd != 0 while
> > > > > > nfsv4rootfs_lock is not held, and try to unlock it. It seems
> > > > > > that nfsv4_unlock would silently exit. The suspending thread
> > > > > > resumes and obtains the lock. You end up with
> > > > > > suspend_nfsd == 0 but the lock held.
> > > > Yes. I had assumed that mountd would be the only thing using
> > > > these
> > > > syscalls
> > > > and it is single threaded. (The syscalls can only be done by
> > > > root
> > > > for the
> > > > obvious reasons.;-)
> > > >
> > > > Maybe the following untested version of the syscalls would be
> > > > better, since
> > > > they would allow multiple concurrent calls to either suspend or
> > > > resume.
> > > > (There would still be an indeterminate case if one thread called
> > > > resume
> > > >  concurrently with another few calling suspend, but that is
> > > >  unavoidable,
> > > >  I think?)
> > > >
> > > > Again, thanks for the comments, rick
> > > > --- untested version of syscalls ---
> > > > 	} else if ((uap->flag & NFSSVC_SUSPENDNFSD) != 0) {
> > > > 		NFSLOCKV4ROOTMUTEX();
> > > > 		if (suspend_nfsd == 0) {
> > > > 			/* Lock out all nfsd threads */
> > > > 			igotlock = 0;
> > > > 			while (igotlock == 0 && suspend_nfsd == 0) {
> > > > 				igotlock = nfsv4_lock(&nfsv4rootfs_lock, 1,
> > > > 				    NULL, NFSV4ROOTLOCKMUTEXPTR, NULL);
> > > > 			}
> > > > 			suspend_nfsd = 1;
> > > > 		}
> > > > 		NFSUNLOCKV4ROOTMUTEX();
> > > > 		error = 0;
> > > > 	} else if ((uap->flag & NFSSVC_RESUMENFSD) != 0) {
> > > > 		NFSLOCKV4ROOTMUTEX();
> > > > 		if (suspend_nfsd != 0) {
> > > > 			nfsv4_unlock(&nfsv4rootfs_lock, 0);
> > > > 			suspend_nfsd = 0;
> > > > 		}
> > > > 		NFSUNLOCKV4ROOTMUTEX();
> > > > 		error = 0;
> > > > 	}
> > >
> > > From the cursory look, this variant is an improvement, mostly by
> > > taking
> > > the interlock before testing suspend_nfsd, and using the while
> > > loop.
> > >
> > > Is it possible to also make the sleep for the lock interruptible ?
> > > So that blocked mountd could be killed by a signal ?
> > Well, it would require some coding. An extra argument to
> > nfsv4_lock()
> > to indicate to do so and then either the caller would have to check
> > for a pending termination signal when it returns 0 (indicates didn't
> > get
> > lock) or a new return value to indicate EINTR. The latter would
> > require
> > all the calls to it to be changed to recognize the new 3rd return
> > case.
> > Because there are a lot of these calls, I'd tend towards just having
> > the
> > caller check for a pending signal.
> >
> > Not sure if it would make much difference though. The only time it
> > would get stuck in nfsv4_lock() is if the nfsd threads are all
> > wedged
> > and in that case having mountd wedged too probably doesn't make much
> > difference, since the NFS service is toast in that case anyhow.
> >
> > If you think it is worth doing, I can add that. I basically see this
> > as a "stop-gap" fix until such time as something like nfse is done,
> > but since I haven't the time to look at nfse right now, I have no
> > idea when/if that might happen.
> 
> Ok, please go ahead with the patch. Having the patch even in its
> current
> form is obviously better then not to have it. If the wedged mountd
> appears to be annoying enough for me, I would do the change.
> 
Ah, that's ok, I'll do it. I'll do it as a separate commit first,
since I can't see it being controversial. I'll cobble together a
version of the atomic-export patch using it after that.

Have a good weekend, rick

> Thanks.
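
A rough userland sketch of the "caller checks for a pending signal"
option discussed above, for illustration only. Nothing here is the
actual change: model_trylock() is a hypothetical stand-in for
nfsv4_lock() returning 0 when the lock was not obtained, the
lock_busy variable is invented for the example, and the loop spins
where the kernel code would sleep.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t term_pending;	/* set by the handler */
static int lock_busy = 1;	/* hypothetical: nonzero while the lock is taken */

/* Signal handler: only record that a termination signal arrived. */
static void
on_term(int signo)
{
	(void)signo;
	term_pending = 1;
}

/* Install on_term() for SIGTERM; returns 0 on success. */
static int
model_install_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_term;
	sigemptyset(&sa.sa_mask);
	return (sigaction(SIGTERM, &sa, NULL));
}

/* Hypothetical stand-in for nfsv4_lock(): 1 = got the lock, 0 = did not. */
static int
model_trylock(void)
{
	return (lock_busy ? 0 : 1);
}

/*
 * Try to take the lock. When an attempt fails and a signal is pending,
 * give up with EINTR instead of retrying forever. Returns 0 once the
 * lock is obtained. (The real code would sleep between attempts.)
 */
static int
model_suspend_intr(void)
{
	int igotlock = 0;

	while (igotlock == 0) {
		igotlock = model_trylock();
		if (igotlock == 0 && term_pending != 0)
			return (EINTR);
	}
	return (0);
}
```

With the handler installed and the lock busy, a SIGTERM makes
model_suspend_intr() return EINTR, so only this one caller needs to
know about the new case and the other nfsv4_lock() callers stay as
they are.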

From owner-freebsd-fs@FreeBSD.ORG  Fri Sep 21 21:18:21 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E5819106564A
	for <freebsd-fs@freebsd.org>; Fri, 21 Sep 2012 21:18:21 +0000 (UTC)
	(envelope-from tjg@soe.ucsc.edu)
Received: from mail-pb0-f54.google.com (mail-pb0-f54.google.com
	[209.85.160.54])
	by mx1.freebsd.org (Postfix) with ESMTP id B6C168FC08
	for <freebsd-fs@freebsd.org>; Fri, 21 Sep 2012 21:18:21 +0000 (UTC)
Received: by pbbrp2 with SMTP id rp2so9255341pbb.13
	for <freebsd-fs@freebsd.org>; Fri, 21 Sep 2012 14:18:21 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=ucsc.edu; s=ucsc-google;
	h=mime-version:date:message-id:subject:from:to:content-type;
	bh=mWjJELHmgZ+JzW5187eVF+ddUokUSpsU6bSkldKOVfY=;
	b=jk4zxD+TFAzFq9jCPa6FV5y5aiGz6lBDzbXqhhwcWcj7lpsKePmB8fpBGs5DTg6B6L
	l85x+Kg91MFFQnZHafZ02ae0aFLEU7ldzgigI2cymVjFmD9uSFyE00J11MbjYLnpex8D
	v6JpdeJmJPCtq66+9+R9UNVPKORymsf9E7srI=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=google.com; s=20120113;
	h=mime-version:date:message-id:subject:from:to:content-type
	:x-gm-message-state;
	bh=mWjJELHmgZ+JzW5187eVF+ddUokUSpsU6bSkldKOVfY=;
	b=YgB7o2miftTx1VDTdIXH7QetRGVWfGtbnSW6aUBxM38s0Kv7/pvSLHGPWt1rSGiET0
	8wM1zF8zhOh2BY0J1QA+Uz/pR+GV8n2Sq8RxrEWxkkxv4NK35vL15eSPlvaW0Uw1NWvl
	22gBWH+zQ2LgCRoHme7hFox5w/fwCGxWMov/KTHpB7pho+7V/5UCyLNEW3kM24Z3jP5E
	eOPlkrFjJrJ5dwEfjLZ++WYyCPlkP8ajfFRBOVlc8uqTWvD77dYVLVO6luVUk8OOlq2a
	gLrPjoDalv592EMnvv2Si+Lsff6OgvWkFQmN6MvUYKXhwqcH9zNjOj1f746drtf9Moax
	lKKw==
MIME-Version: 1.0
Received: by 10.68.218.196 with SMTP id pi4mr18418366pbc.128.1348262301144;
	Fri, 21 Sep 2012 14:18:21 -0700 (PDT)
Received: by 10.68.25.69 with HTTP; Fri, 21 Sep 2012 14:18:21 -0700 (PDT)
Date: Fri, 21 Sep 2012 14:18:21 -0700
Message-ID: <CAG27QgQbwot1+fu=2PmQYpaqKqb8Ob8sieb28XuXJDaWbZnKqQ@mail.gmail.com>
From: Tim Gustafson <tjg@soe.ucsc.edu>
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset=UTF-8
X-Gm-Message-State: ALoCoQk2CUhno3qMOHE+AJg+ih/TPSqDoctWg+PStjxbVJSIYdTSCuoXXnIXVEnup/Ry6UMRHkI1
Subject: Exporting ZFS File System to Multiple Subnets
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 Sep 2012 21:18:22 -0000

Hi,

I Googled around about exporting ZFS file systems to multiple subnets,
but most of what I found was years old, so I thought I'd ask to see
what the current state of things is.

We have about 2,000 file systems that we need to export to a handful
of subnets.  Most file systems are exported to the same set of
subnets.  If I were setting up /etc/exports, I would do something
like:

/export/home -alldirs -network=1.2.3.0/22
/export/home -alldirs -network=1.2.3.0/23
/export/projects -alldirs -network=4.5.6.0/22
/export/projects -alldirs -network=4.5.6.0/23

Perhaps followed by some additional lines to specify additional export
networks for specific filesystems:

/export/projects/foo -network 7.8.9.0/24
/export/projects/bar -network 9.8.7.0/24

But, FreeBSD's implementation of the "zfs sharenfs" property does not
allow multiple subnets to be specified.  And it seems that ZFS somehow
gets in the way of /etc/exports for ZFS file systems, so that if I
turn off sharenfs ("zfs inherit sharenfs tank/export; zfs inherit
sharenfs tank/projects") it actually blocks the export lines in
/etc/exports from being mounted.

So, how can I use FreeBSD and ZFS to export file systems in this way?
It seems like sharenfs is a dead end, and it also seems like
/etc/exports doesn't work because ZFS gets in the way.  This seems
like a really significant impediment to using FreeBSD as a ZFS file
server for anything other than the most basic of configurations.

Is there some magic that I can use to at least work around this
limitation for now?

Ideally, we'd just move to NFSv4, but a significant portion of our
clients are not NFSv4 ready, so that's not an option.

-- 

Tim Gustafson
tjg@soe.ucsc.edu
831-459-5354
Baskin Engineering, Room 313A

From owner-freebsd-fs@FreeBSD.ORG  Fri Sep 21 21:40:46 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 26F3D106566B
	for <freebsd-fs@freebsd.org>; Fri, 21 Sep 2012 21:40:46 +0000 (UTC)
	(envelope-from zeus@ibs.dn.ua)
Received: from relay.ibs.dn.ua (relay.ibs.dn.ua [91.216.196.25])
	by mx1.freebsd.org (Postfix) with ESMTP id 9614E8FC12
	for <freebsd-fs@freebsd.org>; Fri, 21 Sep 2012 21:40:45 +0000 (UTC)
Received: from ibs.dn.ua (relay.ibs.dn.ua [91.216.196.25]) 
	by relay.ibs.dn.ua with ESMTP id q8LLeaKi088464;
	Sat, 22 Sep 2012 00:40:36 +0300 (EEST)
Message-ID: <20120922004036.88462@relay.ibs.dn.ua>
Date: Sat, 22 Sep 2012 00:40:36 +0300
From: Zeus Panchenko <zeus@ibs.dn.ua>
To: "Tim Gustafson" <tjg@soe.ucsc.edu>
In-reply-to: Your message of Fri, 21 Sep 2012 14:18:21 -0700
	<CAG27QgQbwot1+fu=2PmQYpaqKqb8Ob8sieb28XuXJDaWbZnKqQ@mail.gmail.com>
References: <CAG27QgQbwot1+fu=2PmQYpaqKqb8Ob8sieb28XuXJDaWbZnKqQ@mail.gmail.com>
Organization: I.B.S. LLC
X-Mailer: MH-E 8.2; GNU Mailutils 2.99.97; GNU Emacs 23.4.1
X-Face: &sReWXo3Iwtqql1[My(t1Gkx;
	y?KF@KF`4X+'9Cs@PtK^y%}^.>Mtbpyz6U=,Op:KPOT.uG
	)Nvx`=er!l?WASh7KeaGhga"1[&yz$_7ir'cVp7o%CGbJ/V)j/=]vzvvcqcZkf;
	JDurQG6wTg+?/xA go`}1.Ze//K;
	Fk&/&OoHd'[b7iGt2UO>o(YskCT[_D)kh4!yY'<&:yt+zM=A`@`~9U+P[qS:f; #9z~
	Or/Bo#N-'S'!'[3Wog'ADkyMqmGDvga?WW)qd=?)`Y&k=o}>!ST\
Cc: freebsd-fs@freebsd.org
Subject: Re: Exporting ZFS File System to Multiple Subnets
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: Zeus Panchenko <zeus@ibs.dn.ua>
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 Sep 2012 21:40:46 -0000

Tim Gustafson <tjg@soe.ucsc.edu> wrote:
> 
> But, FreeBSD's implementation of the "zfs sharenfs" property does not
> allow multiple subnets to be specified.

the only way to do that I know is described here:
http://freebsd.1045724.n5.nabble.com/zfs-sharenfs-to-multiple-subnets-found-a-dirty-looking-hack-td4030378.html

looks weird but works ...

-- 
Zeus V. Panchenko				jid:zeus@im.ibs.dn.ua
IT Dpt., I.B.S. LLC					  GMT+2 (EET)

From owner-freebsd-fs@FreeBSD.ORG  Fri Sep 21 21:42:31 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8089E1065670
	for <freebsd-fs@freebsd.org>; Fri, 21 Sep 2012 21:42:31 +0000 (UTC)
	(envelope-from tjg@soe.ucsc.edu)
Received: from mail-pb0-f54.google.com (mail-pb0-f54.google.com
	[209.85.160.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 4D45A8FC08
	for <freebsd-fs@freebsd.org>; Fri, 21 Sep 2012 21:42:31 +0000 (UTC)
Received: by pbbrp2 with SMTP id rp2so9289230pbb.13
	for <freebsd-fs@freebsd.org>; Fri, 21 Sep 2012 14:42:31 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=ucsc.edu; s=ucsc-google;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	bh=mElFXrdCMhsC53iL6AzgD2znxJA11SAJ4//Swf7QPv4=;
	b=ZVzuUaQZG6ipkdjuTd+TWHRIpyLK0NZ0b1HwVjM39+rWcg3wBy4h22aONcIqhejL8c
	jqsPUp13r175s1SRu1oPKta5Tg7RXtEvIObBgEq15X8squcboJt7IdgP6lrDwKnxiO90
	00bB09h2b2k6piXOsyFLd8Rtkladj6tP/HzN8=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=google.com; s=20120113;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type:x-gm-message-state;
	bh=mElFXrdCMhsC53iL6AzgD2znxJA11SAJ4//Swf7QPv4=;
	b=lu2/LxWPUhhUCXK1TErgkKVXjJb2On/P/wcEatIbQPb0exSc1EMuWRpleDmZXNeco7
	IrytNwcJvfB4zcNBsNOUewI5XvcgKz3WJh3f3S0ZBjjV0QlynRXJ4Hz0BYAwfwJNCsOa
	wvaJRHcVofB3QCtt9tLAW/5L4IGIVeEngDjfJUZUgQVurgDsvcNzKb8M6GD5CH21Jh9n
	qq43/RJz/pkNUMCQtLmlral6qyZC+qQtNbHuhSHi+RjjsUaLzRWBA9S7IfLixClqGKVR
	JkPEZE4mmixBoQ6oOlKUQo1SWqcvf/vWAxc7Y9WRhtfZcjde4txridD7sxAxWJNs5kI0
	U6Xw==
MIME-Version: 1.0
Received: by 10.68.222.226 with SMTP id qp2mr18608947pbc.57.1348263751009;
	Fri, 21 Sep 2012 14:42:31 -0700 (PDT)
Received: by 10.68.25.69 with HTTP; Fri, 21 Sep 2012 14:42:30 -0700 (PDT)
In-Reply-To: <20120922004036.88462@relay.ibs.dn.ua>
References: <CAG27QgQbwot1+fu=2PmQYpaqKqb8Ob8sieb28XuXJDaWbZnKqQ@mail.gmail.com>
	<20120922004036.88462@relay.ibs.dn.ua>
Date: Fri, 21 Sep 2012 14:42:30 -0700
Message-ID: <CAG27QgTYg-K553bnPffVC9FhCthCDqGWLVayBPv86NJeCAoE6Q@mail.gmail.com>
From: Tim Gustafson <tjg@soe.ucsc.edu>
To: Zeus Panchenko <zeus@ibs.dn.ua>
Content-Type: text/plain; charset=UTF-8
X-Gm-Message-State: ALoCoQn3IgV12QQF97sAtLBP3Hp1ElK6VkhZjNhor6+ZFABiXK0b+omuetLfB71ICr24S4BpX0+f
Cc: freebsd-fs@freebsd.org
Subject: Re: Exporting ZFS File System to Multiple Subnets
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 Sep 2012 21:42:31 -0000

> the only way to do that I know is described here:
> http://freebsd.1045724.n5.nabble.com/zfs-sharenfs-to-multiple-subnets-found-a-dirty-looking-hack-td4030378.html

I've seen that, but it certainly feels "dirty".

-- 

Tim Gustafson
tjg@soe.ucsc.edu
831-459-5354
Baskin Engineering, Room 313A

From owner-freebsd-fs@FreeBSD.ORG  Fri Sep 21 22:38:43 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 433FF106564A
	for <freebsd-fs@freebsd.org>; Fri, 21 Sep 2012 22:38:43 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca
	[131.104.91.44])
	by mx1.freebsd.org (Postfix) with ESMTP id D22368FC08
	for <freebsd-fs@freebsd.org>; Fri, 21 Sep 2012 22:38:42 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: Ap8EAJXrXFCDaFvO/2dsb2JhbAA+BxaFdbkegiABAQUjBFIbDgoCAg0ZAlkGhiSBdAumMJJ4gSGJeyGEc4ESA5VkgRWPDYMDgT4JGRs
X-IronPort-AV: E=Sophos;i="4.80,465,1344225600"; d="scan'208";a="180112553"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 21 Sep 2012 18:38:41 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 91D7679463;
	Fri, 21 Sep 2012 18:38:41 -0400 (EDT)
Date: Fri, 21 Sep 2012 18:38:41 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Konstantin Belousov <kostikbel@gmail.com>
Message-ID: <1697573610.1030942.1348267121541.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <20120921080516.GC37286@deviant.kiev.zoral.com.ua>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.203]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692)
Cc: FS List <freebsd-fs@freebsd.org>
Subject: Re: testing/review of atomic export update patch
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 Sep 2012 22:38:43 -0000

Konstantin Belousov wrote:
> On Thu, Sep 20, 2012 at 05:55:26PM -0400, Rick Macklem wrote:
> > Konstantin Belousov wrote:
> > > On Tue, Sep 18, 2012 at 09:34:54AM -0400, Rick Macklem wrote:
> > > > Konstantin Belousov wrote:
> > > > > On Mon, Sep 17, 2012 at 05:32:44PM -0400, Rick Macklem wrote:
> > > > > > Konstantin Belousov wrote:
> > > > > > > On Sun, Sep 16, 2012 at 05:41:25PM -0400, Rick Macklem wrote:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > There is a simple patch at:
> > > > > > > >   http://people.freebsd.org/~rmacklem/atomic-export.patch
> > > > > > > > that can be applied to a kernel + mountd, so that the new
> > > > > > > > nfsd can be suspended by mountd while the exports are being
> > > > > > > > reloaded. It adds a new "-S" flag to mountd to enable this.
> > > > > > > > (This avoids the long standing bug where clients receive
> > > > > > > >  ESTALE replies to RPCs while mountd is reloading exports.)
> > > > > > >
> > > > > > > This looks simple, but also somewhat worrisome. What would
> > > > > > > happen if the mountd crashes after nfsd suspension is
> > > > > > > requested, but before resume was performed ?
> > > > > > >
> > > > > > > Might be, mountd should check for suspended nfsd on start and
> > > > > > > unsuspend it, if some flag is specified ?
> > > > > > Well, I think that happens with the patch as it stands.
> > > > > >
> > > > > > suspend is done if the "-S" option is specified, but that is a
> > > > > > no op if it is already suspended. The resume is done no matter
> > > > > > what flags are provided, so mountd will always try and do a
> > > > > > "resume".
> > > > > > --> get_exportlist() is always called when mountd is started up
> > > > > >     and it does the resume unconditionally when it completes.
> > > > > >     If mountd repeatedly crashes before completing
> > > > > >     get_exportlist() when it is started up, the exports will be
> > > > > >     all messed up, so having the nfsd threads suspended doesn't
> > > > > >     seem so bad for this case (which hopefully never happens;-).
> > > > > >
> > > > > > Both suspend and resume are just no ops for unpatched kernels.
> > > > > >
> > > > > > Maybe the comment in front of "resume" should explicitly explain
> > > > > > this, instead of saying resume is harmless to do under all
> > > > > > conditions?
> > > > > >
> > > > > > Thanks for looking at it, rick
> > > > > I see.
> > > > >
> > > > > My another note is that there is no any protection against
> > > > > parallel instances of suspend/resume happen. For instance, one
> > > > > thread could set suspend_nfsd = 1 and be descheduled, while
> > > > > another executes resume code sequence meantime. Then it would see
> > > > > suspend_nfsd != 0, while nfsv4rootfs_lock not held, and tries to
> > > > > unlock it. It seems that nfsv4_unlock would silently exit. The
> > > > > suspending thread resumes, and obtains the lock. You end up with
> > > > > suspend_nfsd == 0 but lock held.
> > > > Yes. I had assumed that mountd would be the only thing using these
> > > > syscalls and it is single threaded. (The syscalls can only be done
> > > > by root for the obvious reasons.;-)
> > > >
> > > > Maybe the following untested version of the syscalls would be
> > > > better, since they would allow multiple concurrent calls to either
> > > > suspend or resume.
> > > > (There would still be an indeterminate case if one thread called
> > > >  resume concurrently with another few calling suspend, but that is
> > > >  unavoidable, I think?)
> > > >
> > > > Again, thanks for the comments, rick
> > > > --- untested version of syscalls ---
> > > > 	} else if ((uap->flag & NFSSVC_SUSPENDNFSD) != 0) {
> > > > 		NFSLOCKV4ROOTMUTEX();
> > > > 		if (suspend_nfsd == 0) {
> > > > 			/* Lock out all nfsd threads */
> > > > 			igotlock = 0;
> > > > 			while (igotlock == 0 && suspend_nfsd == 0) {
> > > > 				igotlock = nfsv4_lock(&nfsv4rootfs_lock, 1,
> > > > 				    NULL, NFSV4ROOTLOCKMUTEXPTR, NULL);
> > > > 			}
> > > > 			suspend_nfsd = 1;
> > > > 		}
> > > > 		NFSUNLOCKV4ROOTMUTEX();
> > > > 		error = 0;
> > > > 	} else if ((uap->flag & NFSSVC_RESUMENFSD) != 0) {
> > > > 		NFSLOCKV4ROOTMUTEX();
> > > > 		if (suspend_nfsd != 0) {
> > > > 			nfsv4_unlock(&nfsv4rootfs_lock, 0);
> > > > 			suspend_nfsd = 0;
> > > > 		}
> > > > 		NFSUNLOCKV4ROOTMUTEX();
> > > > 		error = 0;
> > > > 	}
> > >
> > > From the cursory look, this variant is an improvement, mostly by
> > > taking the interlock before testing suspend_nfsd, and using the while
> > > loop.
> > >
> > > Is it possible to also make the sleep for the lock interruptible ?
> > > So that blocked mountd could be killed by a signal ?
> > Well, it would require some coding. An extra argument to nfsv4_lock()
> > to indicate to do so and then either the caller would have to check
> > for a pending termination signal when it returns 0 (indicates didn't
> > get lock) or a new return value to indicate EINTR. The latter would
> > require all the calls to it to be changed to recognize the new 3rd
> > return case. Because there are a lot of these calls, I'd tend towards
> > just having the caller check for a pending signal.
> >
> > Not sure if it would make much difference though. The only time it
> > would get stuck in nfsv4_lock() is if the nfsd threads are all wedged
> > and in that case having mountd wedged too probably doesn't make much
> > difference, since the NFS service is toast in that case anyhow.
> >
> > If you think it is worth doing, I can add that. I basically see this
> > as a "stop-gap" fix until such time as something like nfse is done,
> > but since I haven't the time to look at nfse right now, I have no
> > idea when/if that might happen.
> 
> Ok, please go ahead with the patch. Having the patch even in its current
> form is obviously better then not to have it. If the wedged mountd
> appears to be annoying enough for me, I would do the change.
> 
> Thanks.
Oops, I spoke too soon. When I took a look at the code, I realized that
having nfsv4_lock() return when a pending signal interrupts the msleep()
isn't easy. As such, I think I'll leave it out of the patch for now.

For those who find these things interesting, the reason the above is
hard is the funny intentional semantics that nfsv4_lock() implements.
Most of the time, the nfsd threads can concurrently handle the NFSv4
state structures, using a mutex to serialize access to the lists and
never sleeping while doing so. However, there are a few cases (mainly
delegation recall) where sleeping and knowing that no other thread
is modifying the lists is necessary.

As such, nfsv4_lock() will be called by potentially many nfsd threads
(up to 200+) wanting this exclusive sleep lock. However, it is coded
so that only the first one that wakes up after the shared locks (I call
it a ref count in the code) have been released, gets the exclusive
lock. The rest of the threads wake up after the first thread releases
the exclusive lock, but simply return without getting the lock.
This avoids up to 200+ threads each getting the exclusive lock in turn,
going "oh, I don't need it since that other thread already got the work
done", then releasing the exclusive lock and getting a shared one.
(It also implements the exclusive lock as having priority over the shared
 lock request, so that nfsv4_lock() won't wait indefinitely for the
 exclusive lock.)

The above is done by having the first thread that wakes up once the
shared locks are released clear the "want an exclusive lock" flag
as it acquires it.
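
As a rough illustration of the wake-up rule described above, the decision
a waiter makes each time it wakes could be modelled as below. This is
only a sketch, not the kernel's nfsv4_lock(): the struct, the enum and
the model_* function names are invented for this example, and the
msleep()/wakeup machinery is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the lock state; not the kernel's struct nfsv4lock. */
struct model_lock {
	int  shared_refs;	/* shared holders (the "ref count") */
	bool want_excl;		/* the "want an exclusive lock" flag */
	bool held_excl;		/* exclusive lock currently held */
};

enum wake_result { KEEP_SLEEPING, GOT_LOCK, RETURN_WITHOUT_LOCK };

/* What a thread that asked for the exclusive lock does when it wakes up. */
static enum wake_result
model_wake_step(struct model_lock *lp)
{
	if (lp->want_excl && !lp->held_excl && lp->shared_refs == 0) {
		/*
		 * First waker after the shared refs drained: take the lock
		 * and clear the flag on behalf of every other waiter.
		 */
		lp->want_excl = false;
		lp->held_excl = true;
		return (GOT_LOCK);
	}
	if (lp->want_excl || lp->held_excl)
		return (KEEP_SLEEPING);	/* someone else is still at work */
	/* Flag already cleared and lock released: the work is done. */
	return (RETURN_WITHOUT_LOCK);
}

/* Release the exclusive lock; the flag was already cleared at acquire time. */
static void
model_unlock(struct model_lock *lp)
{
	assert(lp->held_excl);
	lp->held_excl = false;
}
```

Because the one flag is shared by all waiters, nothing in this state
records how many threads are still depending on it.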

If a call were to return due to a signal, it wouldn't know if it
should clear the "want an exclusive lock" flag or not, since it
wouldn't know if other threads currently want it or not.

This could probably be fixed by adding a count of how many threads
are currently sleeping, waiting on the "want an exclusive lock" flag,
but that's too scary for me to do, unless it really is needed.
(As you might have guessed, it's pretty easy to break this in
 subtle ways and I'm a chicken;-)

rick


From owner-freebsd-fs@FreeBSD.ORG  Fri Sep 21 22:38:51 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 89E7B10656EA
	for <freebsd-fs@freebsd.org>; Fri, 21 Sep 2012 22:38:51 +0000 (UTC)
	(envelope-from tjg@soe.ucsc.edu)
Received: from mail-pb0-f54.google.com (mail-pb0-f54.google.com
	[209.85.160.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 55BD18FC0C
	for <freebsd-fs@freebsd.org>; Fri, 21 Sep 2012 22:38:50 +0000 (UTC)
Received: by pbbrp2 with SMTP id rp2so9362830pbb.13
	for <freebsd-fs@freebsd.org>; Fri, 21 Sep 2012 15:38:50 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=ucsc.edu; s=ucsc-google;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:content-type; bh=c1EtDBKD0Udx6sF0Wq+8Z+23Wv+DedvHZOtenLL/Gq0=;
	b=W3jlHWK/R4Vs/SBtjwRIC9Wo9booiD5bIITudW+r1T6/v0ng8KZYQKaf1Q25n/sK/W
	H/hzRyjMp1MtnX7gTTA5nZf0yHN1eJj+OlG8ZIiWvfTdfw3KgqI47Hs9gIvwowWjavXn
	rKaaaEdQQ6LlBMqSrfieVFOuWeub9HE+YVGwQ=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=google.com; s=20120113;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:content-type:x-gm-message-state;
	bh=c1EtDBKD0Udx6sF0Wq+8Z+23Wv+DedvHZOtenLL/Gq0=;
	b=WmBPpXqG6RlxARqwnWK0zaeV1dY8jphRCsafxcdsejWH3xE+SVQlyIN/9UOC9bkIwS
	+J3/0MCrwGZipY430SUp9Od4rNBWdTRSEFtLo94iPR6ig4MBHYmQQh3mp9gFIIjXsMiJ
	b2zPzaOoME+8Xyt0EqOmD/F+X+ItvrIrl/9XsZ+Dad94A8D7ezgOJAF5e824z3WzJKzB
	7HgwIPIg6pqr9gUG3wN0P0XsJLApoTQVIl+dxf9eJ56s3OHUBS6ley+U1UnDy2YlH18U
	+pdOgxmF74mXWoAAWbojPjERUYLxIHpqznB11un0hGeJzWpccV0AjGb3K1sSlPov2zs0
	m4Ng==
MIME-Version: 1.0
Received: by 10.66.85.4 with SMTP id d4mr16460771paz.11.1348267130543; Fri, 21
	Sep 2012 15:38:50 -0700 (PDT)
Received: by 10.68.25.69 with HTTP; Fri, 21 Sep 2012 15:38:50 -0700 (PDT)
In-Reply-To: <CAG27QgTYg-K553bnPffVC9FhCthCDqGWLVayBPv86NJeCAoE6Q@mail.gmail.com>
References: <CAG27QgQbwot1+fu=2PmQYpaqKqb8Ob8sieb28XuXJDaWbZnKqQ@mail.gmail.com>
	<20120922004036.88462@relay.ibs.dn.ua>
	<CAG27QgTYg-K553bnPffVC9FhCthCDqGWLVayBPv86NJeCAoE6Q@mail.gmail.com>
Date: Fri, 21 Sep 2012 15:38:50 -0700
Message-ID: <CAG27QgT-Rr6B+BZ5LUs6++g_BEMoArKBxJW+dOYiJ7yCrenFFw@mail.gmail.com>
From: Tim Gustafson <tjg@soe.ucsc.edu>
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset=UTF-8
X-Gm-Message-State: ALoCoQkpIKFaOmi9tMDWGf/GdCZqE32XYTIPwxcBflcrfrA1LQIqtsTnFEKYf5V7wk/YA3WLQV+/
Subject: Re: Exporting ZFS File System to Multiple Subnets
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 Sep 2012 22:38:51 -0000

> the only way to do that I know is described here:
> http://freebsd.1045724.n5.nabble.com/zfs-sharenfs-to-multiple-subnets-found-a-dirty-looking-hack-td4030378.html
>
> I've seen that, but it certainly feels "dirty".

Wouldn't it just be better to disable the "sharenfs" property
altogether, or make it a non-operational property, and allow regular
/etc/exports rules to work?

Or perhaps have a pseudo-value for the "sharenfs" property that would
enable normal /etc/exports processing?  Something like:

zfs set sharenfs=exports tank

I'd rather edit /etc/exports by hand anyhow.

-- 

Tim Gustafson
tjg@soe.ucsc.edu
831-459-5354
Baskin Engineering, Room 313A

From owner-freebsd-fs@FreeBSD.ORG  Sat Sep 22 12:54:09 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 026161065672;
	Sat, 22 Sep 2012 12:54:09 +0000 (UTC) (envelope-from flo@smeets.im)
Received: from mail.solomo.de (mail.solomo.de [85.214.62.193])
	by mx1.freebsd.org (Postfix) with ESMTP id 827CC8FC1B;
	Sat, 22 Sep 2012 12:54:08 +0000 (UTC)
Received: from mail.solomo.de (localhost [127.0.0.1])
	by mail.solomo.de (Postfix) with ESMTP id 3B148C382A;
	Sat, 22 Sep 2012 14:54:01 +0200 (CEST)
X-Virus-Scanned: amavisd-new at solomo.de
Received: from mail.solomo.de ([127.0.0.1])
	by mail.solomo.de (mail.solomo.de [127.0.0.1]) (amavisd-new, port 10024)
	with LMTP id 8fa5t4JIKIkT; Sat, 22 Sep 2012 14:54:00 +0200 (CEST)
Received: from nibbler-osx-wlan.fritz.box (unknown
	[IPv6:2001:4dd0:ff00:8bb6:54b8:e77b:246:f66e])
	(using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
	(No client certificate requested)
	by mail.solomo.de (Postfix) with ESMTPSA id 695C8C3833;
	Sat, 22 Sep 2012 14:54:00 +0200 (CEST)
Message-ID: <505DB4E6.8030407@smeets.im>
Date: Sat, 22 Sep 2012 14:53:58 +0200
From: Florian Smeets <flo@smeets.im>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8;
	rv:16.0) Gecko/20120905 Thunderbird/16.0
MIME-Version: 1.0
To: FreeBSD FS <freebsd-fs@freebsd.org>
X-Enigmail-Version: 1.5a1pre
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature";
	boundary="------------enigDB4E30752B92537383E0FAEF"
Subject: panic: _sx_xlock_hard: recursed on non-recursive sx
 zfsvfs->z_hold_mtx[i]
 @ ...cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1407
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 22 Sep 2012 12:54:09 -0000

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enigDB4E30752B92537383E0FAEF
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Hi,

I hit the above mentioned panic quite frequently on recent versions of
head (r240806). This happens when building packages in the ports
tinderbox which uses nullfs and zfs extensively. Kib had a look at it
and suspects that his recent nullfs changes expose a bug in zfs.

The backtrace is as follows:

#0  doadump (textdump=1) at
/usr/home/flo/dev/checkouts/svn-src/sys/kern/kern_shutdown.c:266
#1  0xffffffff804c6a64 in kern_reboot (howto=260) at
/usr/home/flo/dev/checkouts/svn-src/sys/kern/kern_shutdown.c:449
#2  0xffffffff804c648a in panic (fmt=0x0) at
/usr/home/flo/dev/checkouts/svn-src/sys/kern/kern_shutdown.c:637
#3  0xffffffff804ce6e5 in _sx_xlock_hard (sx=Variable "sx" is not available.
) at /usr/home/flo/dev/checkouts/svn-src/sys/kern/kern_sx.c:523
#4  0xffffffff804ce77e in _sx_xlock (sx=Variable "sx" is not available.
) at sx.h:152
#5  0xffffffff80e17533 in zfs_zinactive (zp=0xfffffe011951ec80) at
/usr/home/flo/dev/checkouts/svn-src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1407
#6  0xffffffff80e45366 in zfs_inactive (vp=0xfffffe019bdfad90,
cr=Variable "cr" is not available.
) at
/usr/home/flo/dev/checkouts/svn-src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4590
#7  0xffffffff80e4552a in zfs_freebsd_inactive (ap=Variable "ap" is not
available.
) at
/usr/home/flo/dev/checkouts/svn-src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:6102
#8  0xffffffff8070aae7 in VOP_INACTIVE_APV (vop=0xffffffff80eb5fe0,
a=0xffffff89092d3d20) at vnode_if.c:1863
#9  0xffffffff8055e3b7 in vinactive (vp=0xfffffe019bdfad90,
td=0xfffffe0017bad900) at vnode_if.h:807
#10 0xffffffff80562526 in vputx (vp=0xfffffe019bdfad90, func=2) at
/usr/home/flo/dev/checkouts/svn-src/sys/kern/vfs_subr.c:2290
#11 0xffffffff80d8a5f0 in null_reclaim (ap=Variable "ap" is not available.
) at
/usr/home/flo/dev/checkouts/svn-src/sys/modules/nullfs/../../fs/nullfs/null_vnops.c:706
#12 0xffffffff8070a9d7 in VOP_RECLAIM_APV (vop=0xffffffff80d8b180,
a=0xffffff89092d3e60) at vnode_if.c:1926
#13 0xffffffff8055f64d in vgonel (vp=0xfffffe019bdb73e0) at vnode_if.h:830
#14 0xffffffff80561815 in vnlru_free (count=1) at
/usr/home/flo/dev/checkouts/svn-src/sys/kern/vfs_subr.c:931
#15 0xffffffff80561b1f in getnewvnode (tag=0xffffffff80eae0f3 "zfs",
mp=0xfffffe0010dc3cc0, vops=0xffffffff80eb5fe0, vpp=0xffffff89092d3f88)
    at /usr/home/flo/dev/checkouts/svn-src/sys/kern/vfs_subr.c:953
#16 0xffffffff80e168b5 in zfs_znode_cache_constructor
(buf=0xfffffe019b437af0, arg=Variable "arg" is not available.
)
    at
/usr/home/flo/dev/checkouts/svn-src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:135
#17 0xffffffff80e189cc in zfs_znode_alloc (zfsvfs=0xfffffe0010de4000,
db=0xfffffe048c138000, blksz=0, obj_type=DMU_OT_SA, hdl=0xfffffe019b441cd0)
    at
/usr/home/flo/dev/checkouts/svn-src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:663
#18 0xffffffff80e19b65 in zfs_mknode (dzp=0xfffffe00b84dd7d0,
vap=0xffffff89092d4740, tx=0xfffffe0303916600, cr=0xfffffe000c668e00,
flag=0, zpp=0xffffff89092d46a0, acl_ids=0xffffff89092d4670)
    at
/usr/home/flo/dev/checkouts/svn-src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1012
#19 0xffffffff80e46d6f in zfs_freebsd_create (ap=Variable "ap" is not
available.
) at
/usr/home/flo/dev/checkouts/svn-src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1657
#20 0xffffffff8070cef1 in VOP_CREATE_APV (vop=0xffffffff80eb5fe0,
a=0xffffff89092d47f0) at vnode_if.c:250
#21 0xffffffff8056f569 in vn_open_cred (ndp=0xffffff89092d4880,
flagp=0xffffff89092d487c, cmode=Variable "cmode" is not available.
) at vnode_if.h:109
#22 0xffffffff80569236 in kern_openat (td=0xfffffe0017bad900, fd=-100,
path=0x801c2b300 <Address 0x801c2b300 out of bounds>, pathseg=Variable
"pathseg" is not available.
)
    at /usr/home/flo/dev/checkouts/svn-src/sys/kern/vfs_syscalls.c:1134
#23 0xffffffff806b8329 in amd64_syscall (td=0xfffffe0017bad900,
traced=0) at subr_syscall.c:135
#24 0xffffffff806a2eb7 in Xfast_syscall () at
/usr/home/flo/dev/checkouts/svn-src/sys/amd64/amd64/exception.S:387
#25 0x00000008017702ec in ?? ()
Previous frame inner to this frame (corrupt stack?)

I have the vmcore and kernel symbols, so if someone wants to know more I
should be able to provide further data.

Florian


--------------enigDB4E30752B92537383E0FAEF
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----

iEYEARECAAYFAlBdtOcACgkQapo8P8lCvwmDrQCg4X40ttRVkbrjx/cbKmNv+oHY
sGQAoK8mpzOUgJYVlTaCZLLGneRlMfBe
=ZGUd
-----END PGP SIGNATURE-----

--------------enigDB4E30752B92537383E0FAEF--

From owner-freebsd-fs@FreeBSD.ORG  Sat Sep 22 13:33:59 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id EEE951065670;
	Sat, 22 Sep 2012 13:33:58 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id E1A668FC08;
	Sat, 22 Sep 2012 13:33:57 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id QAA06629;
	Sat, 22 Sep 2012 16:33:55 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
	by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1TFPqE-000NLt-W9; Sat, 22 Sep 2012 16:33:55 +0300
Message-ID: <505DBE41.20303@FreeBSD.org>
Date: Sat, 22 Sep 2012 16:33:53 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:15.0) Gecko/20120913 Thunderbird/15.0.1
MIME-Version: 1.0
To: FreeBSD FS <freebsd-fs@FreeBSD.org>
References: <505DB4E6.8030407@smeets.im>
In-Reply-To: <505DB4E6.8030407@smeets.im>
X-Enigmail-Version: 1.4.3
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: Florian Smeets <flo@smeets.im>, Pawel Jakub Dawidek <pjd@FreeBSD.org>
Subject: Re: panic: _sx_xlock_hard: recursed on non-recursive sx
 zfsvfs->z_hold_mtx[i]
 @ ...cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1407
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 22 Sep 2012 13:33:59 -0000

on 22/09/2012 15:53 Florian Smeets said the following:
> Hi,
> 
> I hit the above mentioned panic quite frequently on recent versions of head
> (r240806). This happens when building packages in the ports tinderbox which
> uses nullfs and zfs extensively. Kib had a look at it and suspects that his
> recent nullfs changes expose a bug in zfs.
> 
> The backtrace is as follows:

Since getnewvnode() can call vnlru_free(), the call flow can recurse back into
fs code.  So in general it is dangerous to hold any fs locks around a
getnewvnode() call, as kib advises.  In this case it was a nullfs vnode that
caused the recursion into zfs, but it could just as well have been a zfs vnode.
The only thing required for a panic is a hash collision of zfs object ids, so
that the same z_hold_mtx entry is used.

But I imagine that it would be quite tough to drop z_hold_mtx in
zfs_znode_cache_constructor.
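
To illustrate the collision condition, here is a minimal sketch.  It assumes
that z_hold_mtx[] is indexed by masking the object id with a 64-entry table
size; the macro names and the value 64 are my recollection of zfs_znode.h and
should be checked against the tree.  The helper zfs_obj_same_hold_mtx() is
hypothetical and exists only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the ZFS definitions; verify against zfs_znode.h. */
#define ZFS_OBJ_MTX_SZ		64
#define ZFS_OBJ_HASH(obj_num)	((obj_num) & (ZFS_OBJ_MTX_SZ - 1))

/*
 * Hypothetical helper: return 1 if two object ids map to the same
 * z_hold_mtx[] slot, 0 otherwise.
 */
int
zfs_obj_same_hold_mtx(uint64_t obj_a, uint64_t obj_b)
{
	/* The mask trick only works for a power-of-two table size. */
	assert((ZFS_OBJ_MTX_SZ & (ZFS_OBJ_MTX_SZ - 1)) == 0);

	return (ZFS_OBJ_HASH(obj_a) == ZFS_OBJ_HASH(obj_b));
}
```

Under that assumption, object ids 5 and 69 would share a slot, so a thread
that holds the lock for one and is led back into zfs_zinactive() for the other
would try to take the same non-recursive sx lock twice.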

> #0  doadump (textdump=1) at
> /usr/home/flo/dev/checkouts/svn-src/sys/kern/kern_shutdown.c:266
> #1  0xffffffff804c6a64 in kern_reboot (howto=260) at
> /usr/home/flo/dev/checkouts/svn-src/sys/kern/kern_shutdown.c:449
> #2  0xffffffff804c648a in panic (fmt=0x0) at
> /usr/home/flo/dev/checkouts/svn-src/sys/kern/kern_shutdown.c:637
> #3  0xffffffff804ce6e5 in _sx_xlock_hard (sx=Variable "sx" is not available. )
> at /usr/home/flo/dev/checkouts/svn-src/sys/kern/kern_sx.c:523
> #4  0xffffffff804ce77e in _sx_xlock (sx=Variable "sx" is not available. ) at
> sx.h:152
> #5  0xffffffff80e17533 in zfs_zinactive (zp=0xfffffe011951ec80) at
> /usr/home/flo/dev/checkouts/svn-src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1407
> #6  0xffffffff80e45366 in zfs_inactive (vp=0xfffffe019bdfad90,
> cr=Variable "cr" is not available. ) at
> /usr/home/flo/dev/checkouts/svn-src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4590
> #7  0xffffffff80e4552a in zfs_freebsd_inactive (ap=Variable "ap" is not
> available. ) at
> /usr/home/flo/dev/checkouts/svn-src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:6102
> #8  0xffffffff8070aae7 in VOP_INACTIVE_APV (vop=0xffffffff80eb5fe0,
> a=0xffffff89092d3d20) at vnode_if.c:1863
> #9  0xffffffff8055e3b7 in vinactive (vp=0xfffffe019bdfad90,
> td=0xfffffe0017bad900) at vnode_if.h:807
> #10 0xffffffff80562526 in vputx (vp=0xfffffe019bdfad90, func=2) at
> /usr/home/flo/dev/checkouts/svn-src/sys/kern/vfs_subr.c:2290
> #11 0xffffffff80d8a5f0 in null_reclaim (ap=Variable "ap" is not available. )
> at
> /usr/home/flo/dev/checkouts/svn-src/sys/modules/nullfs/../../fs/nullfs/null_vnops.c:706
> #12 0xffffffff8070a9d7 in VOP_RECLAIM_APV (vop=0xffffffff80d8b180,
> a=0xffffff89092d3e60) at vnode_if.c:1926
> #13 0xffffffff8055f64d in vgonel (vp=0xfffffe019bdb73e0) at vnode_if.h:830
> #14 0xffffffff80561815 in vnlru_free (count=1) at
> /usr/home/flo/dev/checkouts/svn-src/sys/kern/vfs_subr.c:931
> #15 0xffffffff80561b1f in getnewvnode (tag=0xffffffff80eae0f3 "zfs",
> mp=0xfffffe0010dc3cc0, vops=0xffffffff80eb5fe0, vpp=0xffffff89092d3f88) at
> /usr/home/flo/dev/checkouts/svn-src/sys/kern/vfs_subr.c:953
> #16 0xffffffff80e168b5 in zfs_znode_cache_constructor (buf=0xfffffe019b437af0,
> arg=Variable "arg" is not available. ) at
> /usr/home/flo/dev/checkouts/svn-src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:135
> #17 0xffffffff80e189cc in zfs_znode_alloc (zfsvfs=0xfffffe0010de4000,
> db=0xfffffe048c138000, blksz=0, obj_type=DMU_OT_SA,
> hdl=0xfffffe019b441cd0) at
> /usr/home/flo/dev/checkouts/svn-src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:663
> #18 0xffffffff80e19b65 in zfs_mknode (dzp=0xfffffe00b84dd7d0,
> vap=0xffffff89092d4740, tx=0xfffffe0303916600, cr=0xfffffe000c668e00,
> flag=0, zpp=0xffffff89092d46a0, acl_ids=0xffffff89092d4670) at
> /usr/home/flo/dev/checkouts/svn-src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1012
> #19 0xffffffff80e46d6f in zfs_freebsd_create (ap=Variable "ap" is not
> available. ) at
> /usr/home/flo/dev/checkouts/svn-src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1657
> #20 0xffffffff8070cef1 in VOP_CREATE_APV (vop=0xffffffff80eb5fe0,
> a=0xffffff89092d47f0) at vnode_if.c:250
> #21 0xffffffff8056f569 in vn_open_cred (ndp=0xffffff89092d4880,
> flagp=0xffffff89092d487c, cmode=Variable "cmode" is not available. ) at
> vnode_if.h:109
> #22 0xffffffff80569236 in kern_openat (td=0xfffffe0017bad900, fd=-100,
> path=0x801c2b300 <Address 0x801c2b300 out of bounds>, pathseg=Variable
> "pathseg" is not available. ) at
> /usr/home/flo/dev/checkouts/svn-src/sys/kern/vfs_syscalls.c:1134
> #23 0xffffffff806b8329 in amd64_syscall (td=0xfffffe0017bad900, traced=0) at
> subr_syscall.c:135
> #24 0xffffffff806a2eb7 in Xfast_syscall () at
> /usr/home/flo/dev/checkouts/svn-src/sys/amd64/amd64/exception.S:387
> #25 0x00000008017702ec in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> 
> I have the vmcore and kernel symbols, so if someone wants to know more I 
> should be able to provide further data.
> 
> Florian
> 


-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG  Sat Sep 22 16:20:58 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 9E25C1065678;
	Sat, 22 Sep 2012 16:20:58 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 35A438FC22;
	Sat, 22 Sep 2012 16:20:56 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id TAA07394;
	Sat, 22 Sep 2012 19:20:55 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
	by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1TFSRr-000NSw-H9; Sat, 22 Sep 2012 19:20:55 +0300
Message-ID: <505DE566.2080307@FreeBSD.org>
Date: Sat, 22 Sep 2012 19:20:54 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:15.0) Gecko/20120913 Thunderbird/15.0.1
MIME-Version: 1.0
To: freebsd-fs@FreeBSD.org
X-Enigmail-Version: 1.4.3
Content-Type: text/plain; charset=X-VIET-VPS
Content-Transfer-Encoding: 7bit
Cc: 
Subject: lszfs command for loader
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 22 Sep 2012 16:20:58 -0000


Please find below a patch that implements an lszfs loader command.
The command lists the child filesystems of a specified filesystem (including
the root dataset).
The command is really simplistic: the list goes directly to the console, so
there is no filtering of hidden filesystem names etc.

The command is intended to facilitate recovery on systems that use the "Boot
Environments" approach for the boot/root filesystem.

http://people.freebsd.org/~avg/lszfs.diff
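
As a standalone illustration of how the lszfs argument is interpreted, the
following sketch splits "pool[/dataset]" the same way zfs_list() in the patch
does.  The function name split_pool_dataset() and the explicit buffer-size
check are additions for this example only and are not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split "pool[/dataset]" into a pool name, copied
 * into poolname, and a pointer to the dataset part inside name (NULL if
 * only a pool name was given).  Returns 0 on success or ENAMETOOLONG if
 * the pool name does not fit into the supplied buffer.
 */
int
split_pool_dataset(const char *name, char *poolname, size_t poolname_size,
    const char **dsname)
{
	const char *slash;
	size_t len;

	assert(name != NULL && poolname != NULL && dsname != NULL);

	slash = strchr(name, '/');
	len = (slash != NULL) ? (size_t)(slash - name) : strlen(name);
	if (len >= poolname_size)
		return (ENAMETOOLONG);

	memcpy(poolname, name, len);
	poolname[len] = '\0';
	*dsname = (slash != NULL) ? slash + 1 : NULL;
	return (0);
}
```

With a NULL dataset part the caller would list the children of the pool's root
dataset; otherwise it would look the dataset up by name first.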

diff --git a/sys/boot/i386/loader/main.c b/sys/boot/i386/loader/main.c
index 80c8178..84ae713 100644
--- a/sys/boot/i386/loader/main.c
+++ b/sys/boot/i386/loader/main.c
@@ -330,6 +330,29 @@ command_heap(int argc, char *argv[])
     return(CMD_OK);
 }

+#ifdef LOADER_ZFS_SUPPORT
+COMMAND_SET(lszfs, "lszfs", "list child datasets of a zfs dataset",
+    command_lszfs);
+
+static int
+command_lszfs(int argc, char *argv[])
+{
+    int err;
+
+    if (argc != 2) {
+	command_errmsg = "wrong number of arguments";
+	return (CMD_ERROR);
+    }
+
+    err = zfs_list(argv[1]);
+    if (err != 0) {
+	command_errmsg = strerror(err);
+	return (CMD_ERROR);
+    }
+    return (CMD_OK);
+}
+#endif
+
 /* ISA bus access functions for PnP. */
 static int
 isa_inb(int port)
diff --git a/sys/boot/zfs/libzfs.h b/sys/boot/zfs/libzfs.h
index 7ad3a72..6834f8b 100644
--- a/sys/boot/zfs/libzfs.h
+++ b/sys/boot/zfs/libzfs.h
@@ -61,6 +61,7 @@ int	zfs_parsedev(struct zfs_devdesc *dev, const char *devspec,
 		     const char **path);
 char	*zfs_fmtdev(void *vdev);
 int	zfs_probe_dev(const char *devname, uint64_t *pool_guid);
+int	zfs_list(const char *name);

 extern struct devsw zfs_dev;
 extern struct fs_ops zfs_fsops;
diff --git a/sys/boot/zfs/zfs.c b/sys/boot/zfs/zfs.c
index eb8833f..3fc5f50 100644
--- a/sys/boot/zfs/zfs.c
+++ b/sys/boot/zfs/zfs.c
@@ -658,3 +658,38 @@ zfs_fmtdev(void *vdev)
 		    rootname);
 	return (buf);
 }
+
+int
+zfs_list(const char *name)
+{
+	static char	poolname[ZFS_MAXNAMELEN];
+	uint64_t	objid;
+	spa_t		*spa;
+	const char	*dsname;
+	int		len;
+	int		rv;
+
+	len = strlen(name);
+	dsname = strchr(name, '/');
+	if (dsname != NULL) {
+		len = dsname - name;
+		dsname++;
+	}
+	memcpy(poolname, name, len);
+	poolname[len] = '\0';
+
+	spa = spa_find_by_name(poolname);
+	if (!spa)
+		return (ENXIO);
+	rv = zfs_spa_init(spa);
+	if (rv != 0)
+		return (rv);
+	if (dsname != NULL)
+		rv = zfs_lookup_dataset(spa, dsname, &objid);
+	else
+		rv = zfs_get_root(spa, &objid);
+	if (rv != 0)
+		return (rv);
+	rv = zfs_list_dataset(spa, objid);
+	return (0);
+}
diff --git a/sys/boot/zfs/zfsimpl.c b/sys/boot/zfs/zfsimpl.c
index 219d7af..18f5d9a 100644
--- a/sys/boot/zfs/zfsimpl.c
+++ b/sys/boot/zfs/zfsimpl.c
@@ -1415,8 +1415,6 @@ zap_lookup(const spa_t *spa, const dnode_phys_t *dnode, const char *name, uint64
 	return (EIO);
 }

-#ifdef BOOT2
-
 /*
  * List a microzap directory. Assumes that the zap scratch buffer contains
  * the directory contents.
@@ -1541,8 +1539,6 @@ zap_list(const spa_t *spa, const dnode_phys_t *dnode)
 		return fzap_list(spa, dnode);
 }

-#endif
-
 static int
 objset_get_dnode(const spa_t *spa, const objset_phys_t *os, uint64_t objnum, dnode_phys_t *dnode)
 {
@@ -1779,6 +1775,38 @@ zfs_lookup_dataset(const spa_t *spa, const char *name, uint64_t *objnum)
 	return (0);
 }

+#ifndef BOOT2
+static int
+zfs_list_dataset(const spa_t *spa, uint64_t objnum/*, int pos, char *entry*/)
+{
+	uint64_t dir_obj, child_dir_zapobj;
+	dnode_phys_t child_dir_zap, dir, dataset;
+	dsl_dataset_phys_t *ds;
+	dsl_dir_phys_t *dd;
+
+	if (objset_get_dnode(spa, &spa->spa_mos, objnum, &dataset)) {
+		printf("ZFS: can't find dataset %ju\n", (uintmax_t)objnum);
+		return (EIO);
+	}
+	ds = (dsl_dataset_phys_t *) &dataset.dn_bonus;
+	dir_obj = ds->ds_dir_obj;
+
+	if (objset_get_dnode(spa, &spa->spa_mos, dir_obj, &dir)) {
+		printf("ZFS: can't find dirobj %ju\n", (uintmax_t)dir_obj);
+		return (EIO);
+	}
+	dd = (dsl_dir_phys_t *)&dir.dn_bonus;
+
+	child_dir_zapobj = dd->dd_child_dir_zapobj;
+	if (objset_get_dnode(spa, &spa->spa_mos, child_dir_zapobj, &child_dir_zap) != 0) {
+		printf("ZFS: can't find child zap %ju\n", (uintmax_t)child_dir_zapobj);
+		return (EIO);
+	}
+
+	return (zap_list(spa, &child_dir_zap) != 0);
+}
+#endif
+
 /*
  * Find the object set given the object number of its dataset object
  * and return its details in *objset

-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG  Sat Sep 22 16:28:09 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 5194A106564A;
	Sat, 22 Sep 2012 16:28:09 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 1B2998FC08;
	Sat, 22 Sep 2012 16:28:07 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id TAA07410;
	Sat, 22 Sep 2012 19:28:06 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
	by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1TFSYo-000NTB-7M; Sat, 22 Sep 2012 19:28:06 +0300
Message-ID: <505DE715.8020806@FreeBSD.org>
Date: Sat, 22 Sep 2012 19:28:05 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:15.0) Gecko/20120913 Thunderbird/15.0.1
MIME-Version: 1.0
To: freebsd-fs@FreeBSD.org
X-Enigmail-Version: 1.4.3
Content-Type: text/plain; charset=X-VIET-VPS
Content-Transfer-Encoding: 7bit
Cc: 
Subject: zfs: allow to mount root from a pool not in zpool.cache
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 22 Sep 2012 16:28:09 -0000


Currently the FreeBSD ZFS kernel code doesn't allow mounting the root
filesystem from a pool that is not listed in zpool.cache, because only pools
from the cache are known to ZFS at that time.

This patch is an attempt to improve the behavior:
http://people.freebsd.org/~avg/spa_import_rootpool.diff

This could be useful when importing pools that were exported from other systems.
There is a tunable, vfs.zfs.rootpool.prefer_cached_config, which is set to 1 by
default.  1 means to just use the cached pool config if one is found in the
cache; 0 means to always re-probe the disks and read the supposedly
latest/actual config.
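
The intended effect of the tunable can be summarized with a small sketch.
This is not code from the patch: the enum and the function
rootpool_choose_config() are hypothetical names that only model the decision
described above.

```c
#include <assert.h>

/* Hypothetical names, used only to model the decision. */
enum rootpool_config_source {
	ROOTPOOL_CONFIG_CACHED,	/* use the config found in zpool.cache */
	ROOTPOOL_CONFIG_PROBED	/* re-probe disks and read the config there */
};

/*
 * prefer_cached_config models vfs.zfs.rootpool.prefer_cached_config;
 * found_in_cache says whether the root pool is listed in zpool.cache.
 */
enum rootpool_config_source
rootpool_choose_config(int prefer_cached_config, int found_in_cache)
{
	assert(prefer_cached_config == 0 || prefer_cached_config == 1);

	if (prefer_cached_config && found_in_cache)
		return (ROOTPOOL_CONFIG_CACHED);
	/* Not in the cache, or the user asked for the on-disk config. */
	return (ROOTPOOL_CONFIG_PROBED);
}
```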

-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG  Sat Sep 22 16:49:38 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 3C056106564A;
	Sat, 22 Sep 2012 16:49:38 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 587368FC08;
	Sat, 22 Sep 2012 16:49:36 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id TAA07478;
	Sat, 22 Sep 2012 19:49:35 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
	by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1TFStb-000NTq-6o; Sat, 22 Sep 2012 19:49:35 +0300
Message-ID: <505DEC1C.4000305@FreeBSD.org>
Date: Sat, 22 Sep 2012 19:49:32 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:15.0) Gecko/20120913 Thunderbird/15.0.1
MIME-Version: 1.0
To: freebsd-fs@FreeBSD.org
X-Enigmail-Version: 1.4.3
Content-Type: text/plain; charset=X-VIET-VPS
Content-Transfer-Encoding: 7bit
Cc: 
Subject: znextboot: nextboot-like tool for zfs at zfsboot level
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 22 Sep 2012 16:49:38 -0000


Please find here a patchset that implements znextboot, a nextboot-like tool for
zfs at zfsboot level:
http://people.freebsd.org/~avg/znextboot.diff

Theory of operation.
zfsboot, through loader, exports to the kernel environment the GUIDs of the very
first pool it found ("primary pool") and of the very first leaf vdev of that
pool ("primary vdev").  Note that the primary pool is not necessarily a boot
pool or a root pool, since a user can switch between pools and filesystems at
various stages: zfsboot, zfsloader, rootfs specification.
znextboot is a new tool that simply passes zfsboot/boot2 options to kernel ZFS
via an ioctl.  Kernel ZFS writes the options as a NUL-terminated ASCII string to
the Pad2 area of the primary vdev of the primary pool.  The Pad2 area was
previously known as the "Boot Block Header".  Its use was never formalized.
Previously it contained a special header (with zero useful information); now
ZFS just zeroes it out.
So, upon reboot zfsboot reads the options from that area and zeroes the area.
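
The write-once, read-and-clear handling of the Pad2 area can be modelled in
memory as follows.  This is only a sketch, not code from the patchset: the
function names are hypothetical and the 8 KB size is an assumption about the
vdev label layout.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define PAD2_SIZE	(8 << 10)	/* assumed size of the Pad2 area */

/* Kernel side: store the options as a NUL-terminated string in Pad2. */
int
pad2_write_options(char *pad2, const char *opts)
{
	size_t len;

	assert(pad2 != NULL && opts != NULL);

	len = strlen(opts);
	if (len >= PAD2_SIZE)
		return (E2BIG);
	memset(pad2, 0, PAD2_SIZE);
	memcpy(pad2, opts, len);
	return (0);
}

/*
 * zfsboot side: copy the options out, then zero the area so that they
 * apply to one boot only.  Returns the length of the copied string.
 */
size_t
pad2_consume_options(char *pad2, char *out, size_t out_size)
{
	size_t len;

	assert(pad2 != NULL && out != NULL && out_size > 0);

	pad2[PAD2_SIZE - 1] = '\0';	/* do not trust on-disk termination */
	len = strlen(pad2);
	if (len >= out_size)
		len = out_size - 1;
	memcpy(out, pad2, len);
	out[len] = '\0';
	memset(pad2, 0, PAD2_SIZE);
	return (len);
}
```

A second call to pad2_consume_options() returns an empty string, which is the
one-shot behaviour that nextboot-style tools rely on.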

The tool is intended for remote management of systems that use approaches
similar to "Boot Environments".
It is implemented at zfsboot level as opposed to loader level, because that was
easier.  My skills weren't sufficient to integrate the ZFS logic with loader's
nextboot logic, which is implemented in Forth.

Some problematic areas in the current patchset:
- I simply used the next free number for the nextboot ioctl.  This will result
in a conflict when a new ioctl is added upstream.  We need to think about
reserving a range for OS-specific ioctls.
- The znextboot userland utility currently lacks any documentation.
- znextboot lacks any sanity checking / validation of the arguments that are
passed to it.
- probably more...

-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG  Sat Sep 22 17:03:30 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E3E4F1065673
	for <freebsd-fs@FreeBSD.org>; Sat, 22 Sep 2012 17:03:30 +0000 (UTC)
	(envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 205738FC08
	for <freebsd-fs@FreeBSD.org>; Sat, 22 Sep 2012 17:03:29 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id UAA07530
	for <freebsd-fs@FreeBSD.ORG>; Sat, 22 Sep 2012 20:03:28 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
	by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1TFT72-000NUP-7o
	for freebsd-fs@FreeBSD.ORG; Sat, 22 Sep 2012 20:03:28 +0300
Message-ID: <505DEF5F.8060401@FreeBSD.org>
Date: Sat, 22 Sep 2012 20:03:27 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:15.0) Gecko/20120913 Thunderbird/15.0.1
MIME-Version: 1.0
To: freebsd-fs@FreeBSD.org
X-Enigmail-Version: 1.4.3
Content-Type: text/plain; charset=X-VIET-VPS
Content-Transfer-Encoding: 7bit
Cc: 
Subject: zfsboot and zfsloader: normalization of filesystem names
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 22 Sep 2012 17:03:31 -0000


Currently zfsboot uses the following format to specify a ZFS filesystem name in
a full file path:
poolname:filesystem/name:/path/to/file
ZFS loader uses this format:
zfs:poolname/filesystemname:/path/to/file

The following patchset:
http://people.freebsd.org/~avg/zfs-boot-naming.diff
unifies the naming.
The zfsboot format will be: poolname/filesystemname:/path/to/file
Note that it still differs from zfsloader: the "zfs:" prefix is missing.  This
is because, unlike the loader, zfsboot supports only ZFS filesystems, so the
prefix is redundant.  But I can still add support for it if there is popular
demand.
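
To make the proposed zfsboot format concrete, here is a minimal parsing
sketch.  It is not taken from the patchset; zfs_split_bootspec() is a
hypothetical name, and the sketch handles only the unprefixed zfsboot form.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split "poolname/filesystemname:/path/to/file"
 * into the dataset name (copied into dataset) and a pointer to the path
 * part inside spec.  Returns 0 on success, EINVAL if there is no ':' or
 * the dataset part is empty, ENAMETOOLONG if the name does not fit.
 */
int
zfs_split_bootspec(const char *spec, char *dataset, size_t dataset_size,
    const char **path)
{
	const char *colon;
	size_t len;

	assert(spec != NULL && dataset != NULL && path != NULL);

	colon = strchr(spec, ':');
	if (colon == NULL || colon == spec)
		return (EINVAL);
	len = (size_t)(colon - spec);
	if (len >= dataset_size)
		return (ENAMETOOLONG);

	memcpy(dataset, spec, len);
	dataset[len] = '\0';
	*path = colon + 1;
	return (0);
}
```

Because neither pool nor filesystem names can contain ':', the first colon is
enough to separate the dataset name from the path.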

Also, the current code treats a lone pool name as the name of the pool's boot
dataset, that is, whatever is specified in the bootfs property.  If the
property is unset, then the root dataset is the boot dataset.
I want to change this so that a lone pool name always means the root dataset.
The boot dataset is selected by default anyway, and its name is expanded to the
actual name when it is printed.

Also, lsdev -v for a zfs pool will print the bootfs property.
The same goes for zfsboot's "status" command.

A final note.  All this stuff really needs to be documented.  Currently the
documentation on boot blocks seems to omit ZFS boot entirely.

-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG  Sat Sep 22 17:13:11 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id 27BD4106566B;
	Sat, 22 Sep 2012 17:13:11 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id E32BB8FC17;
	Sat, 22 Sep 2012 17:13:09 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id UAA07567;
	Sat, 22 Sep 2012 20:13:08 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
	by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1TFTGN-000NUn-W4; Sat, 22 Sep 2012 20:13:08 +0300
Message-ID: <505DF1A3.1020809@FreeBSD.org>
Date: Sat, 22 Sep 2012 20:13:07 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:15.0) Gecko/20120913 Thunderbird/15.0.1
MIME-Version: 1.0
To: freebsd-fs@FreeBSD.org
X-Enigmail-Version: 1.4.3
Content-Type: text/plain; charset=X-VIET-VPS
Content-Transfer-Encoding: 7bit
Cc: freebsd-geom@FreeBSD.org
Subject: zfs zvol: set geom mediasize right at creation time
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 22 Sep 2012 17:13:11 -0000


Please review the following patch.

In addition to what the description says, I almost by accident sneaked another
change into the patch: it sets stripesize to volblocksize.  I think that the
change makes sense, but it is really a separate change.


A side note: setting sectorsize to volblocksize seemed like overkill and it
would certainly mess up the existing zvols in use.  Maybe there should be
another property like reportedblocksize or something.

commit 1585e6cfb602c2a2647b9f802445bb174bc430a4
Author: Andriy Gapon <avg@icyb.net.ua>
Date:   Wed Sep 19 20:49:28 2012 +0300

    zvol: set mediasize in geom provider right upon its creation

    ... instead of deferring the action until first open.
    Unlike upstream, deferring it has no benefit on FreeBSD.
    We know that as soon as the provider is created it is going to be tasted
    and thus opened.  Initial mediasize of zero causes tasting failure
    and subsequent retasting because of the size change.

diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
index d47d270..6e9e7a3 100644
--- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
+++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
@@ -475,6 +475,7 @@ zvol_create_minor(const char *name)
 	zvol_state_t *zv;
 	objset_t *os;
 	dmu_object_info_t doi;
+	uint64_t volblocksize, volsize;
 	int error;

 	ZFS_LOG(1, "Creating ZVOL %s...", name);
@@ -535,9 +536,20 @@ zvol_create_minor(const char *name)
 	zv = zs->zss_data = kmem_zalloc(sizeof (zvol_state_t), KM_SLEEP);
 #else	/* !sun */

+	error = zap_lookup(os, ZVOL_ZAP_OBJ, "size", 8, 1, &volsize);
+	if (error) {
+		ASSERT(error == 0);
+		dmu_objset_disown(os, zvol_tag);
+		mutex_exit(&spa_namespace_lock);
+		return (error);
+	}
+
 	DROP_GIANT();
 	g_topology_lock();
 	zv = zvol_geom_create(name);
+	zv->zv_volsize = volsize;
+	zv->zv_provider->mediasize = zv->zv_volsize;
+
 #endif	/* !sun */

 	(void) strlcpy(zv->zv_name, name, MAXPATHLEN);
@@ -554,6 +566,7 @@ zvol_create_minor(const char *name)
 	error = dmu_object_info(os, ZVOL_OBJ, &doi);
 	ASSERT(error == 0);
 	zv->zv_volblocksize = doi.doi_data_block_size;
+	zv->zv_provider->stripesize = zv->zv_volblocksize;

 	if (spa_writeable(dmu_objset_spa(os))) {
 		if (zil_replay_disable)

-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG  Sat Sep 22 18:24:27 2012
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3801D106564A
	for <freebsd-fs@freebsd.org>; Sat, 22 Sep 2012 18:24:27 +0000 (UTC)
	(envelope-from freebsd-listen@fabiankeil.de)
Received: from smtprelay01.ispgateway.de (smtprelay01.ispgateway.de
	[80.67.31.39]) by mx1.freebsd.org (Postfix) with ESMTP id E18418FC08
	for <freebsd-fs@freebsd.org>; Sat, 22 Sep 2012 18:24:26 +0000 (UTC)
Received: from [87.79.193.113] (helo=fabiankeil.de)
	by smtprelay01.ispgateway.de with esmtpsa (SSLv3:AES128-SHA:128)
	(Exim 4.68) (envelope-from <freebsd-listen@fabiankeil.de>)
	id 1TFUNI-00007P-4V
	for freebsd-fs@freebsd.org; Sat, 22 Sep 2012 20:24:20 +0200
Date: Sat, 22 Sep 2012 20:24:14 +0200
From: Fabian Keil <freebsd-listen@fabiankeil.de>
To: freebsd-fs@freebsd.org
Message-ID: <20120922202414.7ed96a21@fabiankeil.de>
In-Reply-To: <20110625134031.3cbc5952@fabiankeil.de>
References: <20110227202957.GD1992@garage.freebsd.pl>
	<20110228192129.119cac0c@r500.local>
	<20110307200634.3c0f92df@r500.local>
	<20110307202531.2c90ff5a@r500.local>
	<20110625134031.3cbc5952@fabiankeil.de>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=PGP-SHA1;
	boundary="Sig_/cewI3g88P4=WQ__Y4mACAgb";
	protocol="application/pgp-signature"
X-Df-Sender: Nzc1MDY3
Subject: Re: g_wither_washer() called 470000 times per second
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 22 Sep 2012 18:24:27 -0000

--Sig_/cewI3g88P4=WQ__Y4mACAgb
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable

Fabian Keil <freebsd-listen@fabiankeil.de> wrote:

> Apparently what's eating the cpu is the kernel calling
> g_wither_washer() about 470000 times per second which
> seems a bit excessive:
> 
> r500# dtrace -n 'fbt:kernel:g_*:entry { @[probefunc, stack()] = count(); } tick-1sec { trunc(@, 15); printa(@); trunc(@)}'
> dtrace: description 'fbt:kernel:g_*:entry ' matched 232 probes
> CPU     ID                    FUNCTION:NAME
> [...]

>   g_wither_washer
>               kernel`g_run_events+0x358
>               kernel`fork_exit+0x11f
>               kernel`0xffffffff808debde
>            475959
> 

This is now kern/171865:
http://www.freebsd.org/cgi/query-pr.cgi?pr=171865

Fabian

--Sig_/cewI3g88P4=WQ__Y4mACAgb
Content-Type: application/pgp-signature; name=signature.asc
Content-Disposition: attachment; filename=signature.asc

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)

iEYEARECAAYFAlBeAlQACgkQBYqIVf93VJ3z0gCdFKfwM97OYGIOvd+RHr++LyyZ
6BwAn0FfFyF35ycj5jYwT2nsqlhqrEyC
=0kU6
-----END PGP SIGNATURE-----

--Sig_/cewI3g88P4=WQ__Y4mACAgb--