From owner-freebsd-fs@FreeBSD.ORG  Sun Oct 16 05:13:22 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8BB051065673
	for <freebsd-fs@freebsd.org>; Sun, 16 Oct 2011 05:13:22 +0000 (UTC)
	(envelope-from batrick@batbytes.com)
Received: from mail-iy0-f182.google.com (mail-iy0-f182.google.com
	[209.85.210.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 650808FC19
	for <freebsd-fs@freebsd.org>; Sun, 16 Oct 2011 05:13:22 +0000 (UTC)
Received: by iaky10 with SMTP id y10so5751322iak.13
	for <freebsd-fs@freebsd.org>; Sat, 15 Oct 2011 22:13:21 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.42.158.136 with SMTP id h8mr28297990icx.22.1318740351026; Sat,
	15 Oct 2011 21:45:51 -0700 (PDT)
Received: by 10.231.19.66 with HTTP; Sat, 15 Oct 2011 21:45:50 -0700 (PDT)
Date: Sun, 16 Oct 2011 00:45:50 -0400
Message-ID: <CACh33Fpz=uAp8h0Bjsi1Be=ob_94jXtN51mAHvGPkReY5MpTcg@mail.gmail.com>
From: Patrick Donnelly <batrick@batbytes.com>
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset=UTF-8
Subject: [ZFS] Using SSD with partitions
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 16 Oct 2011 05:13:22 -0000

Hi list,

I've got an array for home use where my boot drive (UFS) finally died.
I've decided to upgrade to a SSD for a replacement but am looking to
maybe simultaneously improving performance of my ZFS array. Naturally
a FreeBSD install doesn't use much space so partitioning the drive to
get maximum usage seems wise. I was thinking for a hypothetical 40GB
drive:

20GB -- FreeBSD / partition
2GB  -- ZFS ZIL
18GB -- ZFS Cache

What I'm wondering is if this will be a bad idea. I know that SSDs are
not designed to be written to *a lot*, which a ZIL will experience. Is
this a bad idea? I'm hoping for experiences from people in similar
scenarios. As I'm not an enterprise IT person who can't simply choose
to just throw more mon-- I mean SSDs -- at the problem, I need to be
efficient. :) [I'm thinking the cache drive partition might be
pointless as I don't think I'd benefit that much from it.]

Disclaimer: I've looked at a lot of guides, including the standard
best practices guide, and none of it seemed helpful for my particular
problem, especially given that I'm using FreeBSD.

Thanks for any advice,

-- 
- Patrick Donnelly

From owner-freebsd-fs@FreeBSD.ORG  Sun Oct 16 05:25:27 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 46832106566B
	for <freebsd-fs@freebsd.org>; Sun, 16 Oct 2011 05:25:27 +0000 (UTC)
	(envelope-from fjwcash@gmail.com)
Received: from mail-iy0-f182.google.com (mail-iy0-f182.google.com
	[209.85.210.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 12BC58FC12
	for <freebsd-fs@freebsd.org>; Sun, 16 Oct 2011 05:25:26 +0000 (UTC)
Received: by iaky10 with SMTP id y10so5762104iak.13
	for <freebsd-fs@freebsd.org>; Sat, 15 Oct 2011 22:25:26 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	bh=dWSCMADqJks7EtUtMxQ+YvbaQBAM1skD9bjAc0Bf698=;
	b=G0JVCQvLRg49dVxdZi3vl4o9+YKt3xkZ103flx9FOggpK/XTQLusFdDFwkSwT5C/s6
	QWXi5DOe/h1Pp4pNQLg3Rz4+1wZ/Yuar5Yb7Iw5zew3LNwn2i6CNpShokqvyucQcFCpS
	diNRaaKESNRz+V5gA75Ytxwq0/P+UzrEgz2Pw=
MIME-Version: 1.0
Received: by 10.68.6.234 with SMTP id e10mr10617632pba.86.1318742726032; Sat,
	15 Oct 2011 22:25:26 -0700 (PDT)
Received: by 10.142.239.12 with HTTP; Sat, 15 Oct 2011 22:25:25 -0700 (PDT)
In-Reply-To: <CACh33Fpz=uAp8h0Bjsi1Be=ob_94jXtN51mAHvGPkReY5MpTcg@mail.gmail.com>
References: <CACh33Fpz=uAp8h0Bjsi1Be=ob_94jXtN51mAHvGPkReY5MpTcg@mail.gmail.com>
Date: Sat, 15 Oct 2011 22:25:25 -0700
Message-ID: <CAOjFWZ4P9f-As4EtHQ5_bC7QWgBRGXBZD4Oq=Co684F8qOqhcQ@mail.gmail.com>
From: Freddie Cash <fjwcash@gmail.com>
To: Patrick Donnelly <batrick@batbytes.com>
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-fs@freebsd.org
Subject: Re: [ZFS] Using SSD with partitions
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 16 Oct 2011 05:25:27 -0000

On Sat, Oct 15, 2011 at 9:45 PM, Patrick Donnelly <batrick@batbytes.com>wrote:

> I've got an array for home use where my boot drive (UFS) finally died.
> I've decided to upgrade to a SSD for a replacement but am looking to
> maybe simultaneously improving performance of my ZFS array. Naturally
> a FreeBSD install doesn't use much space so partitioning the drive to
> get maximum usage seems wise. I was thinking for a hypothetical 40GB
> drive:
>
> 20GB -- FreeBSD / partition
> 2GB  -- ZFS ZIL
> 18GB -- ZFS Cache
>
> What I'm wondering is if this will be a bad idea. I know that SSDs are
> not designed to be written to *a lot*, which a ZIL will experience. Is
> this a bad idea? I'm hoping for experiences from people in similar
> scenarios. As I'm not an enterprise IT person who can't simply choose
> to just throw more mon-- I mean SSDs -- at the problem, I need to be
> efficient. :) [I'm thinking the cache drive partition might be
> pointless as I don't think I'd benefit that much from it.]
>
> Disclaimer: I've looked at a lot of guides, including the standard
> best practices guide, and none of it seemed helpful for my particular
> problem, especially given that I'm using FreeBSD.
>

For home use, there's nothing wrong with doing this.

Unless it's an NFS server used by multiple clients, you won't be pounding
the ZIL; and you may not even need to have a separate log device.  Create
the pool, create a test filesystem, do some benchmarks to get a baseline
(preferably with the "normal" workload you'd be doing).  Then destroy/create
the filesystem again, "zfs set sync=off" on the filesystem, and benchmark
the filesystem again.  If you get a huge performance gain, then turn sync on
again, create the separate log and test again.

Using the SSD for the OS and the cache will be fine.  L2ARC is throttled to
7 MBps of writes, and is then a read-heavy partition, so is very easy on the
drive.  Whether or not you benefit from the L2ARC depends on whether you
will be using dedupe and whether or not your files are accessed multiple
times within short periods of times.

If you are really worried about the longevity of the SSD, then
under-provision it.  Only partition/format 36 GB of it, leaving the extra 4
GB to be used internally for extra wear-leveling.

-- 
Freddie Cash
fjwcash@gmail.com

From owner-freebsd-fs@FreeBSD.ORG  Sun Oct 16 12:14:30 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5D27B106566B
	for <freebsd-fs@FreeBSD.org>; Sun, 16 Oct 2011 12:14:30 +0000 (UTC)
	(envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id A60928FC08
	for <freebsd-fs@FreeBSD.org>; Sun, 16 Oct 2011 12:14:29 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id PAA20998;
	Sun, 16 Oct 2011 15:14:26 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
	by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1RFPbl-0006kS-Qa; Sun, 16 Oct 2011 15:14:25 +0300
Message-ID: <4E9ACA9F.5090308@FreeBSD.org>
Date: Sun, 16 Oct 2011 15:14:23 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:7.0.1) Gecko/20111002 Thunderbird/7.0.1
MIME-Version: 1.0
To: Florian Wagner <florian@wagner-flo.net>
References: <20111015214347.09f68e4e@naclador.mos32.de>
In-Reply-To: <20111015214347.09f68e4e@naclador.mos32.de>
X-Enigmail-Version: undefined
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@FreeBSD.org
Subject: Re: Extending zfsboot.c to allow selecting filesystem from
	boot.config
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 16 Oct 2011 12:14:30 -0000

on 15/10/2011 22:43 Florian Wagner said the following:
> Hi,
> 
> from looking at the code in sys/boot/i386/zfsboot/zfsboot.c the ZFS aware
> boot block already allows to select pool to load the kernel from by adding
> <POOL>:<FILE TO BOOT> to the boot.config. As this code calls the
> zfs_mount_pool function it will look for the bootfs property on the new
> pool or use its root dataset to get the file from there.
> 
> How much work would it be to extend the loader to also allow selecting a
> ZFS filesystem?
> 
> What I'd like to do is place a boot.config on the (otherwise empty) root of
> my system pool and then tell it to get the loader from another filesystem
> by putting "rpool/root/stable-8-r226381:/boot/zfsloader" in there.

Please check out the following changes:
https://gitorious.org/~avg/freebsd/avgbsd/commit/8c3808c4bb2a2cd746db3e9c46871c9bdf943ef6
https://gitorious.org/~avg/freebsd/avgbsd/commit/0b4279c0d366d9f2b5bb9d4c0dd3229d8936d92b
https://gitorious.org/~avg/freebsd/avgbsd/commit/b29ab78b079f27918de1683e88bcb1817a0e5969
https://gitorious.org/~avg/freebsd/avgbsd/commit/f49add15516dfd582258b6820b8f0254cf9419a3
https://gitorious.org/~avg/freebsd/avgbsd/commit/e072b443b0f59fe1ff54a70d2437d63698bbf597
https://gitorious.org/~avg/freebsd/avgbsd/commit/f701760c10812c5b6925352fb003408c19170063

-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG  Sun Oct 16 14:44:33 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 499DD106564A
	for <freebsd-fs@freebsd.org>; Sun, 16 Oct 2011 14:44:33 +0000 (UTC)
	(envelope-from luchesar.iliev@gmail.com)
Received: from mail-ey0-f182.google.com (mail-ey0-f182.google.com
	[209.85.215.182])
	by mx1.freebsd.org (Postfix) with ESMTP id D17058FC14
	for <freebsd-fs@freebsd.org>; Sun, 16 Oct 2011 14:44:32 +0000 (UTC)
Received: by eyd10 with SMTP id 10so2886521eyd.13
	for <freebsd-fs@freebsd.org>; Sun, 16 Oct 2011 07:44:31 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=message-id:disposition-notification-to:date:from:user-agent
	:mime-version:to:cc:subject:references:in-reply-to
	:x-enigmail-version:openpgp:content-type:content-transfer-encoding;
	bh=gfGlrJxD+iHNimqd0P7Mcnfxitgs/gMna1ov/EOWu88=;
	b=AyiCFhqxP2QCaY5losVsHj3204VCAqWKJMy2jHx2jS5DRDtNVm/lYrYHnN8bccaykP
	ly3ODRK6Qc7J17HxW31p+F0AqPOyEpOTYRYQ1SGILsYKMAmCGmQcov+ZU5Mi4w+YbBSk
	3yzFVxjo1LVi/fA7PJA6QUFt2vQyyPEFQ87ZM=
Received: by 10.223.81.205 with SMTP id y13mr17771083fak.34.1318774567893;
	Sun, 16 Oct 2011 07:16:07 -0700 (PDT)
Received: from [79.124.93.41] ([79.124.93.41])
	by mx.google.com with ESMTPS id y8sm17403394faj.10.2011.10.16.07.16.06
	(version=SSLv3 cipher=OTHER); Sun, 16 Oct 2011 07:16:06 -0700 (PDT)
Message-ID: <4E9AE725.4040001@gmail.com>
Date: Sun, 16 Oct 2011 17:16:05 +0300
From: "Luchesar V. ILIEV" <luchesar.iliev@gmail.com>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:7.0.1) Gecko/20111002 Thunderbird/7.0.1
MIME-Version: 1.0
To: Patrick Donnelly <batrick@batbytes.com>
References: <CACh33Fpz=uAp8h0Bjsi1Be=ob_94jXtN51mAHvGPkReY5MpTcg@mail.gmail.com>
In-Reply-To: <CACh33Fpz=uAp8h0Bjsi1Be=ob_94jXtN51mAHvGPkReY5MpTcg@mail.gmail.com>
X-Enigmail-Version: undefined
OpenPGP: id=9A1FEEFF;
	url=https://cert.acad.bg/pgp-keys/
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org
Subject: Re: [ZFS] Using SSD with partitions
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 16 Oct 2011 14:44:33 -0000

On 16/10/2011 07:45, Patrick Donnelly wrote:
> Hi list,
> 
> I've got an array for home use where my boot drive (UFS) finally died.
> I've decided to upgrade to a SSD for a replacement but am looking to
> maybe simultaneously improving performance of my ZFS array. Naturally
> a FreeBSD install doesn't use much space so partitioning the drive to
> get maximum usage seems wise. I was thinking for a hypothetical 40GB
> drive:
> 
> 20GB -- FreeBSD / partition
> 2GB  -- ZFS ZIL
> 18GB -- ZFS Cache
> 
> What I'm wondering is if this will be a bad idea. I know that SSDs are
> not designed to be written to *a lot*, which a ZIL will experience. Is
> this a bad idea? I'm hoping for experiences from people in similar
> scenarios. As I'm not an enterprise IT person who can't simply choose
> to just throw more mon-- I mean SSDs -- at the problem, I need to be
> efficient. :) [I'm thinking the cache drive partition might be
> pointless as I don't think I'd benefit that much from it.]
> 
> Disclaimer: I've looked at a lot of guides, including the standard
> best practices guide, and none of it seemed helpful for my particular
> problem, especially given that I'm using FreeBSD.
> 
> Thanks for any advice,
> 

There are other, much more knowledgeable people around, who might give
you better advice, but let me just make a few points:

1. If you can afford more RAM, it's (much) better for ZFS than L2ARC.

2. It's not just the ZIL devices that get heavily written. L2ARC ones
also get their hefty share of writes. And even when the cache becomes
"hot" enough, keep in mind that...

3. You lose all L2ARC contents once the system gets rebooted. It's kind
of counter-intuitive, but that's how it is (and for a reason).

4. L2ARC and ZIL have almost the opposite performance requirements, so
putting them on the same device is likely never going to be optimal
(unless you spend a fortune on that SSD).

5. Check the output of "zpool upgrade". If your zpool version is
anything below 19 (likely 14 or 15), I'd strongly recommend that you
avoid setting up a separate ZIL. Pools before v19 fail critically when
the ZIL is removed or is corrupted, which means you lose them for good.
You might mitigate the risk with a mirrored ZIL, but it's still likely
not worth it in your case.

6. If, OTOH, you're running a reasonably recent -STABLE (8 or 9), then
your zpool version is likely 28 (thanks, pjd@), which means ZIL is not
that scary, but you might still lose some data. Even an unexpected power
failure might cause trouble, unless the SSD is designed to handle it
gracefully (this typically involves some sort of capacitor).

The topic is quite popular, and I'd suggest you do some searching and
reading around ("ZFS SSD" on Google brings a lot of interesting and
helpful things, especially on the OpenSolaris and FreeBSD's forums). If
you don't feel that geeky, a good starting point might be this one:

http://constantin.glez.de/blog/2011/02/frequently-asked-questions-about-flash-memory-ssds-and-zfs

It really depends on your needs, your current (and potential future)
system configuration, and the time and effort you're ready to spend.

Again, I'm no expert in those things, so take all my comments with a
grain of salt. Good luck!

Cheers,
Luchesar

-- 
i.dea.is/luchesar

From owner-freebsd-fs@FreeBSD.ORG  Sun Oct 16 16:17:51 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E42491065672
	for <freebsd-fs@freebsd.org>; Sun, 16 Oct 2011 16:17:51 +0000 (UTC)
	(envelope-from daniel@digsys.bg)
Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230])
	by mx1.freebsd.org (Postfix) with ESMTP id 6DB5A8FC16
	for <freebsd-fs@freebsd.org>; Sun, 16 Oct 2011 16:17:50 +0000 (UTC)
Received: from [192.92.129.101] ([192.92.129.101]) (authenticated bits=0)
	by smtp-sofia.digsys.bg (8.14.4/8.14.4) with ESMTP id p9GGHcTa026595
	(version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO);
	Sun, 16 Oct 2011 19:17:44 +0300 (EEST)
	(envelope-from daniel@digsys.bg)
Mime-Version: 1.0 (Apple Message framework v1251.1)
Content-Type: text/plain; charset=us-ascii
From: Daniel Kalchev <daniel@digsys.bg>
In-Reply-To: <4E9AE725.4040001@gmail.com>
Date: Sun, 16 Oct 2011 19:17:39 +0300
Content-Transfer-Encoding: quoted-printable
Message-Id: <169E82FD-3B61-4CAB-B067-D380D69CDED5@digsys.bg>
References: <CACh33Fpz=uAp8h0Bjsi1Be=ob_94jXtN51mAHvGPkReY5MpTcg@mail.gmail.com>
	<4E9AE725.4040001@gmail.com>
To: "Luchesar V. ILIEV" <luchesar.iliev@gmail.com>
X-Mailer: Apple Mail (2.1251.1)
Cc: freebsd-fs@freebsd.org
Subject: Re: [ZFS] Using SSD with partitions
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 16 Oct 2011 16:17:52 -0000


On Oct 16, 2011, at 17:16 , Luchesar V. ILIEV wrote:

> 6. If, OTOH, you're running a reasonably recent -STABLE (8 or 9), then
> your zpool version is likely 28 (thanks, pjd@), which means ZIL is not
> that scary, but you might still lose some data. Even an unexpected =
power
> failure might cause trouble, unless the SSD is designed to handle it
> gracefully (this typically involves some sort of capacitor).

Just for the record: even without ZIL, you will most definitely lose =
data at power outage. In most cases, this will not damage the ZFS =
filesystem, but data will be lost. There is nothing that can prevent =
this.

Therefore, with ZFS v28, adding ZIL does not introduce any more risk to =
your data.

One thing to have in mind is ZIL will help only under certain workloads, =
sequential write is not one of these. It helps most with database-type =
loads and sync writes like an NFS server that is written heavily. =
Freddie have good advice on determining if it will help.

L2ARC on the other hand may help enormously, especially if the spool is =
big. Workstation-class motherboards until recently were topped at 8GB =
RAM and ZFS is happy with as much RAM as you can offer. Adding L2ARC may =
provide more headroom. Benefits of course depend on the workload. =
Neither L2ARC or ZIL provide magical benefits.

Daniel=

From owner-freebsd-fs@FreeBSD.ORG  Sun Oct 16 18:02:11 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id AA7BB106564A
	for <freebsd-fs@freebsd.org>; Sun, 16 Oct 2011 18:02:11 +0000 (UTC)
	(envelope-from luchesar.iliev@gmail.com)
Received: from mail-ey0-f182.google.com (mail-ey0-f182.google.com
	[209.85.215.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 3463C8FC08
	for <freebsd-fs@freebsd.org>; Sun, 16 Oct 2011 18:02:10 +0000 (UTC)
Received: by eyd10 with SMTP id 10so3039725eyd.13
	for <freebsd-fs@freebsd.org>; Sun, 16 Oct 2011 11:02:09 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=message-id:disposition-notification-to:date:from:user-agent
	:mime-version:to:cc:subject:references:in-reply-to
	:x-enigmail-version:openpgp:content-type:content-transfer-encoding;
	bh=QOOUgb8RcyyrpwaS1VzsZRQpmbZEqK6Tc2R46U2qG9c=;
	b=V4gDDt1u4QTh/5qXyxdiZ88fEYjD08zQRHuxXsa9tualP7SY4luRT6Uv9CBR7FJcHN
	+AiuiDWB6DLbzPxNPoZ4wItjfnVJ5o12ymyscQC6m8u0s0mj9WJb8yeiuxiPXkONIeWD
	aX96I5tU6PubuXBTPh7gQvi3eg4Vkfntxy5f0=
Received: by 10.223.17.11 with SMTP id q11mr19125531faa.13.1318788129324;
	Sun, 16 Oct 2011 11:02:09 -0700 (PDT)
Received: from [79.124.93.41] ([79.124.93.41])
	by mx.google.com with ESMTPS id m26sm18516987fac.6.2011.10.16.11.02.07
	(version=SSLv3 cipher=OTHER); Sun, 16 Oct 2011 11:02:08 -0700 (PDT)
Message-ID: <4E9B1C1E.7090804@gmail.com>
Date: Sun, 16 Oct 2011 21:02:06 +0300
From: "Luchesar V. ILIEV" <luchesar.iliev@gmail.com>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:7.0.1) Gecko/20111002 Thunderbird/7.0.1
MIME-Version: 1.0
To: Daniel Kalchev <daniel@digsys.bg>, Patrick Donnelly <batrick@batbytes.com>
References: <CACh33Fpz=uAp8h0Bjsi1Be=ob_94jXtN51mAHvGPkReY5MpTcg@mail.gmail.com>
	<4E9AE725.4040001@gmail.com>
	<169E82FD-3B61-4CAB-B067-D380D69CDED5@digsys.bg>
In-Reply-To: <169E82FD-3B61-4CAB-B067-D380D69CDED5@digsys.bg>
X-Enigmail-Version: undefined
OpenPGP: id=9A1FEEFF;
	url=https://cert.acad.bg/pgp-keys/
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org
Subject: Re: [ZFS] Using SSD with partitions
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 16 Oct 2011 18:02:11 -0000

On 16/10/2011 19:17, Daniel Kalchev wrote:
> 
> On Oct 16, 2011, at 17:16 , Luchesar V. ILIEV wrote:
> 
>> 6. If, OTOH, you're running a reasonably recent -STABLE (8 or 9),
>> then your zpool version is likely 28 (thanks, pjd@), which means
>> ZIL is not that scary, but you might still lose some data. Even an
>> unexpected power failure might cause trouble, unless the SSD is
>> designed to handle it gracefully (this typically involves some sort
>> of capacitor).
> 
> Just for the record: even without ZIL, you will most definitely lose
> data at power outage. In most cases, this will not damage the ZFS
> filesystem, but data will be lost. There is nothing that can prevent
> this.
> 
> Therefore, with ZFS v28, adding ZIL does not introduce any more risk
> to your data.

I might be wrong in my interpretation, but from what I remember, when
the power goes down, an unprotected SSD is likely to lose _more_ data
than simply its write buffers -- that's quite unlike a hard-drive. So
much, in fact, that the whole ZIL might become corrupted (and that's
potentially way more data than any device cache).

_If_ that's true, then isn't an array of only "conventional" HDDs, where
the ZIL is interleaved with the zpool itself, at least a bit safer from
power failures? Again, if we are taking the cheaper SSDs into account.

> One thing to have in mind is ZIL will help only under certain
> workloads, sequential write is not one of these. It helps most with
> database-type loads and sync writes like an NFS server that is
> written heavily. Freddie have good advice on determining if it will
> help.
> 
> L2ARC on the other hand may help enormously, especially if the spool
> is big. Workstation-class motherboards until recently were topped at
> 8GB RAM and ZFS is happy with as much RAM as you can offer. Adding
> L2ARC may provide more headroom. Benefits of course depend on the
> workload. Neither L2ARC or ZIL provide magical benefits.

Which is yet another reason to go for more RAM, as it tends to be quite
magic-yielding. Just kidding here, but, seriously, if Patrick has room
for some RAM upgrade, I think he should consider this, at least for
performance (a boot and OS drive, obviously, are another matter).

Cheers,
Luchesar

From owner-freebsd-fs@FreeBSD.ORG  Sun Oct 16 18:10:05 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 54A581065673
	for <freebsd-fs@freebsd.org>; Sun, 16 Oct 2011 18:10:05 +0000 (UTC)
	(envelope-from daniel@digsys.bg)
Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230])
	by mx1.freebsd.org (Postfix) with ESMTP id CECC18FC12
	for <freebsd-fs@freebsd.org>; Sun, 16 Oct 2011 18:10:04 +0000 (UTC)
Received: from [192.92.129.101] ([192.92.129.101]) (authenticated bits=0)
	by smtp-sofia.digsys.bg (8.14.4/8.14.4) with ESMTP id p9GI9rLe026908
	(version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO);
	Sun, 16 Oct 2011 21:10:00 +0300 (EEST)
	(envelope-from daniel@digsys.bg)
Mime-Version: 1.0 (Apple Message framework v1251.1)
Content-Type: text/plain; charset=us-ascii
From: Daniel Kalchev <daniel@digsys.bg>
In-Reply-To: <4E9B1C1E.7090804@gmail.com>
Date: Sun, 16 Oct 2011 21:09:53 +0300
Content-Transfer-Encoding: quoted-printable
Message-Id: <14628DFB-AA3E-4D2D-9D4F-723B6327B6C0@digsys.bg>
References: <CACh33Fpz=uAp8h0Bjsi1Be=ob_94jXtN51mAHvGPkReY5MpTcg@mail.gmail.com>
	<4E9AE725.4040001@gmail.com>
	<169E82FD-3B61-4CAB-B067-D380D69CDED5@digsys.bg>
	<4E9B1C1E.7090804@gmail.com>
To: "Luchesar V. ILIEV" <luchesar.iliev@gmail.com>
X-Mailer: Apple Mail (2.1251.1)
Cc: freebsd-fs@freebsd.org
Subject: Re: [ZFS] Using SSD with partitions
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 16 Oct 2011 18:10:05 -0000


On Oct 16, 2011, at 21:02 , Luchesar V. ILIEV wrote:

> On 16/10/2011 19:17, Daniel Kalchev wrote:
>> Therefore, with ZFS v28, adding ZIL does not introduce any more risk
>> to your data.
>=20
> I might be wrong in my interpretation, but from what I remember, when
> the power goes down, an unprotected SSD is likely to lose _more_ data
> than simply its write buffers -- that's quite unlike a hard-drive. So
> much, in fact, that the whole ZIL might become corrupted (and that's
> potentially way more data than any device cache).

The real risk with low-grade "unprotected" SSDs is that the SSD may well =
become damaged, sometimes beyond repair.

It is the same risk with SSDs or with magnetic drives. If the drive lies =
to the OS that it has safely written data -- then data will be lost. =
Thing is, we know what a cheap HDD is. Most SSDs however lie, because =
otherwise they will offer very poor write performance.

ZIL is not about RAM. ZIL is for low latency synchronous writing. It =
does not matter how much RAM do you have -- it will not help if you have =
heavy synchronous writing (of small records). =20

Anyway, as it was mentioned -- with moderate activity on the pool, it is =
not problem to use the same SSD for boot/ZIL/L2ARC.


Daniel=

From owner-freebsd-fs@FreeBSD.ORG  Sun Oct 16 18:30:13 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 49634106564A
	for <freebsd-fs@freebsd.org>; Sun, 16 Oct 2011 18:30:13 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta04.westchester.pa.mail.comcast.net
	(qmta04.westchester.pa.mail.comcast.net [76.96.62.40])
	by mx1.freebsd.org (Postfix) with ESMTP id E8D4F8FC13
	for <freebsd-fs@freebsd.org>; Sun, 16 Oct 2011 18:30:12 +0000 (UTC)
Received: from omta16.westchester.pa.mail.comcast.net ([76.96.62.88])
	by qmta04.westchester.pa.mail.comcast.net with comcast
	id lWSz1h0061uE5Es54WW6aU; Sun, 16 Oct 2011 18:30:06 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta16.westchester.pa.mail.comcast.net with comcast
	id lWW41h01h1t3BNj3cWW5uK; Sun, 16 Oct 2011 18:30:06 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 572CE102C1C; Sun, 16 Oct 2011 11:30:03 -0700 (PDT)
Date: Sun, 16 Oct 2011 11:30:03 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: "Luchesar V. ILIEV" <luchesar.iliev@gmail.com>
Message-ID: <20111016183003.GA29466@icarus.home.lan>
References: <CACh33Fpz=uAp8h0Bjsi1Be=ob_94jXtN51mAHvGPkReY5MpTcg@mail.gmail.com>
	<4E9AE725.4040001@gmail.com>
	<169E82FD-3B61-4CAB-B067-D380D69CDED5@digsys.bg>
	<4E9B1C1E.7090804@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4E9B1C1E.7090804@gmail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org
Subject: Re: [ZFS] Using SSD with partitions
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 16 Oct 2011 18:30:13 -0000

On Sun, Oct 16, 2011 at 09:02:06PM +0300, Luchesar V. ILIEV wrote:
> On 16/10/2011 19:17, Daniel Kalchev wrote:
> > 
> > On Oct 16, 2011, at 17:16 , Luchesar V. ILIEV wrote:
> > 
> >> 6. If, OTOH, you're running a reasonably recent -STABLE (8 or 9),
> >> then your zpool version is likely 28 (thanks, pjd@), which means
> >> ZIL is not that scary, but you might still lose some data. Even an
> >> unexpected power failure might cause trouble, unless the SSD is
> >> designed to handle it gracefully (this typically involves some sort
> >> of capacitor).
> > 
> > Just for the record: even without ZIL, you will most definitely lose
> > data at power outage. In most cases, this will not damage the ZFS
> > filesystem, but data will be lost. There is nothing that can prevent
> > this.
> > 
> > Therefore, with ZFS v28, adding ZIL does not introduce any more risk
> > to your data.
> 
> I might be wrong in my interpretation, but from what I remember, when
> the power goes down, an unprotected SSD is likely to lose _more_ data
> than simply its write buffers -- that's quite unlike a hard-drive. So
> much, in fact, that the whole ZIL might become corrupted (and that's
> potentially way more data than any device cache).
> 
> _If_ that's true, then isn't an array of only "conventional" HDDs, where
> the ZIL is interleaved with the zpool itself, at least a bit safer from
> power failures? Again, if we are taking the cheaper SSDs into account.

Please expand on the above, providing reference materials or links to
things you've read that help shed light on all of this.  More
specifically:

1) I would like a definition of what "unprotected SSD" means and what a
"protected SSD" is.

2) I would like an explanation as to what "SSDs are more likely than an
MHDD to lose data on a power outage" means exactly (on a technical
level, not something vague) and from where you got this interpretation.

Thanks!

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Sun Oct 16 18:40:13 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4D3ED106568A
	for <freebsd-fs@freebsd.org>; Sun, 16 Oct 2011 18:40:13 +0000 (UTC)
	(envelope-from luchesar.iliev@gmail.com)
Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com
	[209.85.214.54])
	by mx1.freebsd.org (Postfix) with ESMTP id C958E8FC1C
	for <freebsd-fs@freebsd.org>; Sun, 16 Oct 2011 18:40:12 +0000 (UTC)
Received: by bkbzu17 with SMTP id zu17so2891556bkb.13
	for <freebsd-fs@freebsd.org>; Sun, 16 Oct 2011 11:40:11 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=message-id:disposition-notification-to:date:from:user-agent
	:mime-version:to:cc:subject:references:in-reply-to
	:x-enigmail-version:openpgp:content-type:content-transfer-encoding;
	bh=wuTMRkXxvI/4uaWfKuWMj4VpFQ+HDun/d4xT26GERNk=;
	b=hAQF9yR6VgQieQnkRu5fjYjzfOaXXzSjm7eyf6ocRVNqKEdrBibOdNEdMzjfzLeVjI
	LRWMmWlHEXg9ZVwrwa+hJBym2uHfCM6046vj+a1lU/Z8DmFpBdYbJQDpj5lGryvWJZk5
	Q8EosHO4CTCFX8GT/AMdfSX2hyyS1nmiA0NNE=
Received: by 10.223.17.11 with SMTP id q11mr19300391faa.13.1318790411561;
	Sun, 16 Oct 2011 11:40:11 -0700 (PDT)
Received: from [79.124.93.41] ([79.124.93.41])
	by mx.google.com with ESMTPS id m26sm18703099fac.6.2011.10.16.11.40.10
	(version=SSLv3 cipher=OTHER); Sun, 16 Oct 2011 11:40:10 -0700 (PDT)
Message-ID: <4E9B2509.3030701@gmail.com>
Date: Sun, 16 Oct 2011 21:40:09 +0300
From: "Luchesar V. ILIEV" <luchesar.iliev@gmail.com>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:7.0.1) Gecko/20111002 Thunderbird/7.0.1
MIME-Version: 1.0
To: Daniel Kalchev <daniel@digsys.bg>
References: <CACh33Fpz=uAp8h0Bjsi1Be=ob_94jXtN51mAHvGPkReY5MpTcg@mail.gmail.com>
	<4E9AE725.4040001@gmail.com>
	<169E82FD-3B61-4CAB-B067-D380D69CDED5@digsys.bg>
	<4E9B1C1E.7090804@gmail.com>
	<14628DFB-AA3E-4D2D-9D4F-723B6327B6C0@digsys.bg>
In-Reply-To: <14628DFB-AA3E-4D2D-9D4F-723B6327B6C0@digsys.bg>
X-Enigmail-Version: undefined
OpenPGP: id=9A1FEEFF;
	url=https://cert.acad.bg/pgp-keys/
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Cc: freebsd-fs@freebsd.org
Subject: Re: [ZFS] Using SSD with partitions
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 16 Oct 2011 18:40:13 -0000

On 16/10/2011 21:09, Daniel Kalchev wrote:
> 
> On Oct 16, 2011, at 21:02 , Luchesar V. ILIEV wrote:
> 
>> On 16/10/2011 19:17, Daniel Kalchev wrote:
>>> Therefore, with ZFS v28, adding ZIL does not introduce any more 
>>> risk to your data.
>> 
>> I might be wrong in my interpretation, but from what I remember, 
>> when the power goes down, an unprotected SSD is likely to lose 
>> _more_ data than simply its write buffers -- that's quite unlike a 
>> hard-drive. So much, in fact, that the whole ZIL might become 
>> corrupted (and that's potentially way more data than any device 
>> cache).
> 
> The real risk with low-grade "unprotected" SSDs is that the SSD may 
> well become damaged, sometimes beyond repair.
> 
> It is the same risk with SSDs or with magnetic drives. If the drive 
> lies to the OS that it has safely written data -- then data will be 
> lost. Thing is, we know what a cheap HDD is. Most SSDs however lie, 
> because otherwise they will offer very poor write performance.

That's true, but my understanding is that the differences go further
beyond that. To quote one paper: "Our data show that flash memory’s
behavior under power failure is surprising in several ways. First,
operations that come closer to completion do not necessarily exhibit
fewer bit errors. Second, power failure not only results in failure of
the operation in progress, it can also corrupt data already present in
the flash device. Third, power failure can negatively impact the
integrity of future data written to the device."

http://cseweb.ucsd.edu/users/swanson/papers/DAC2011PowerCut.pdf

However, that's probably getting too academic (also beyond my own
qualifications), and I wouldn't like to hijack the thread.

> ZIL is not about RAM. ZIL is for low latency synchronous writing. It 
> does not matter how much RAM do you have -- it will not help if you 
> have heavy synchronous writing (of small records).

>From what I understand, Patrick is talking about a home system, which is
not very likely to be heavy on the synchronous writes, unless, of
course, he's using NFS or a database. On the other hand, most desktop
applications would happily use the additional memory, so it benefits not
just the storage subsystem. That's why I'm making the point about the
RAM upgrade, but, apart from that, you're absolutely correct about ZIL
and synchronous writes.

Cheers,
Luchesar

From owner-freebsd-fs@FreeBSD.ORG  Sun Oct 16 18:52:12 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B4A0B106564A
	for <freebsd-fs@freebsd.org>; Sun, 16 Oct 2011 18:52:12 +0000 (UTC)
	(envelope-from luchesar.iliev@gmail.com)
Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com
	[209.85.214.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 3D69F8FC12
	for <freebsd-fs@freebsd.org>; Sun, 16 Oct 2011 18:52:12 +0000 (UTC)
Received: by bkbzu17 with SMTP id zu17so2903311bkb.13
	for <freebsd-fs@freebsd.org>; Sun, 16 Oct 2011 11:52:11 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=message-id:disposition-notification-to:date:from:user-agent
	:mime-version:to:cc:subject:references:in-reply-to
	:x-enigmail-version:openpgp:content-type:content-transfer-encoding;
	bh=oksDqTP2HOnK8TrZNOj4iOET6qRsNAodX4TQu7I90Mo=;
	b=gJqexuC3douGb8aICmV4arg0YYE7DPiwV26mMD9rosCscMZOCF6WLtknXkq9gkhhcA
	QYyAfm7d7XXXb6kTHfTgoUme2ehY5ZW852t3jgYPHvHlcgEqWwHHRRNIUpe8oxS2Aa+V
	rOaBiFy38N4z9N27z8jjwrkA/mA5gTuCufbwk=
Received: by 10.223.77.26 with SMTP id e26mr9791168fak.37.1318791131198;
	Sun, 16 Oct 2011 11:52:11 -0700 (PDT)
Received: from [79.124.93.41] ([79.124.93.41])
	by mx.google.com with ESMTPS id f23sm4616464faf.0.2011.10.16.11.52.09
	(version=SSLv3 cipher=OTHER); Sun, 16 Oct 2011 11:52:10 -0700 (PDT)
Message-ID: <4E9B27D8.70106@gmail.com>
Date: Sun, 16 Oct 2011 21:52:08 +0300
From: "Luchesar V. ILIEV" <luchesar.iliev@gmail.com>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:7.0.1) Gecko/20111002 Thunderbird/7.0.1
MIME-Version: 1.0
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
References: <CACh33Fpz=uAp8h0Bjsi1Be=ob_94jXtN51mAHvGPkReY5MpTcg@mail.gmail.com>
	<4E9AE725.4040001@gmail.com>
	<169E82FD-3B61-4CAB-B067-D380D69CDED5@digsys.bg>
	<4E9B1C1E.7090804@gmail.com>
	<20111016183003.GA29466@icarus.home.lan>
In-Reply-To: <20111016183003.GA29466@icarus.home.lan>
X-Enigmail-Version: undefined
OpenPGP: id=9A1FEEFF;
	url=https://cert.acad.bg/pgp-keys/
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org
Subject: Re: [ZFS] Using SSD with partitions
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 16 Oct 2011 18:52:12 -0000

On 16/10/2011 21:30, Jeremy Chadwick wrote:
> On Sun, Oct 16, 2011 at 09:02:06PM +0300, Luchesar V. ILIEV wrote:
>> On 16/10/2011 19:17, Daniel Kalchev wrote:
>>>
>>> On Oct 16, 2011, at 17:16 , Luchesar V. ILIEV wrote:
>>>
>>>> 6. If, OTOH, you're running a reasonably recent -STABLE (8 or 9),
>>>> then your zpool version is likely 28 (thanks, pjd@), which means
>>>> ZIL is not that scary, but you might still lose some data. Even an
>>>> unexpected power failure might cause trouble, unless the SSD is
>>>> designed to handle it gracefully (this typically involves some sort
>>>> of capacitor).
>>>
>>> Just for the record: even without ZIL, you will most definitely lose
>>> data at power outage. In most cases, this will not damage the ZFS
>>> filesystem, but data will be lost. There is nothing that can prevent
>>> this.
>>>
>>> Therefore, with ZFS v28, adding ZIL does not introduce any more risk
>>> to your data.
>>
>> I might be wrong in my interpretation, but from what I remember, when
>> the power goes down, an unprotected SSD is likely to lose _more_ data
>> than simply its write buffers -- that's quite unlike a hard-drive. So
>> much, in fact, that the whole ZIL might become corrupted (and that's
>> potentially way more data than any device cache).
>>
>> _If_ that's true, then isn't an array of only "conventional" HDDs, where
>> the ZIL is interleaved with the zpool itself, at least a bit safer from
>> power failures? Again, if we are taking the cheaper SSDs into account.
> 
> Please expand on the above, providing reference materials or links to
> things you've read that help shed light on all of this.  More
> specifically:

I haven't really dug that much into that. Apart from general comments
(mostly on the OpenSolaris forums), the most technical (and academic)
source of information is the paper that I already quoted:

Hung-Wei Tseng, Laura M. Grupp, Steven Swanson, "Understanding the
Impact of Power Loss on Flash Memory", DCSE-UCSD.

http://cseweb.ucsd.edu/users/swanson/papers/DAC2011PowerCut.pdf

> 1) I would like a definition of what "unprotected SSD" means and what a
> "protected SSD" is.

Let me better quote again, "Many high-end SSDs have backup batteries or
capacitors to ensure that operations complete even if power fails. Our
results argue that these systems should provide power until the chip
signals that the operation is finished rather than until the data
appears to be correct. Low-end SSDs and embedded systems, however, often
do not contain backup power sources due to cost or space constraints,
and these systems must be extremely careful to prevent data loss and/or
reduced reliability after a power failure."

> 2) I would like an explanation as to what "SSDs are more likely than an
> MHDD to lose data on a power outage" means exactly (on a technical
> level, not something vague) and from where you got this interpretation.

Again, to quote "The flash memory devices we studied in this work
demonstrated unexpected behavior when power failure occurs. The error
rates do not always decrease as the operation proceeds, and power
failure can corrupt the data from operations that completed
successfully. We also found that relying on blocks that have been
programmed or erased during a power failure is unreliable, even if the
data appears to be intact."

I'd actually be interested to hear what the more experienced folks here
think about this; however, again, it's probably not right to hijack the
current thread.

Cheers,
Luchesar

From owner-freebsd-fs@FreeBSD.ORG  Sun Oct 16 19:10:41 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8CD33106566C
	for <freebsd-fs@freebsd.org>; Sun, 16 Oct 2011 19:10:41 +0000 (UTC)
	(envelope-from florian@wagner-flo.net)
Received: from umbracor.wagner-flo.net (umbracor.wagner-flo.net
	[213.165.81.202])
	by mx1.freebsd.org (Postfix) with ESMTP id 4BE9C8FC0C
	for <freebsd-fs@freebsd.org>; Sun, 16 Oct 2011 19:10:41 +0000 (UTC)
Received: from naclador.mos32.de (ppp-88-217-87-168.dynamic.mnet-online.de
	[88.217.87.168])
	by umbracor.wagner-flo.net (Postfix) with ESMTPSA id EFE9F3C06145
	for <freebsd-fs@freebsd.org>; Sun, 16 Oct 2011 21:10:42 +0200 (CEST)
Date: Sun, 16 Oct 2011 21:10:38 +0200
From: Florian Wagner <florian@wagner-flo.net>
To: freebsd-fs@freebsd.org
Message-ID: <20111016211038.05de98d2@naclador.mos32.de>
In-Reply-To: <4E9B27D8.70106@gmail.com>
References: <CACh33Fpz=uAp8h0Bjsi1Be=ob_94jXtN51mAHvGPkReY5MpTcg@mail.gmail.com>
	<4E9AE725.4040001@gmail.com>
	<169E82FD-3B61-4CAB-B067-D380D69CDED5@digsys.bg>
	<4E9B1C1E.7090804@gmail.com>
	<20111016183003.GA29466@icarus.home.lan> <4E9B27D8.70106@gmail.com>
X-Mailer: Claws Mail 3.7.9 (GTK+ 2.24.5; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=PGP-SHA1;
	boundary="Sig_//AIY9Dwzj2aC435CJeGIEvF";
	protocol="application/pgp-signature"
Subject: Re: [ZFS] Using SSD with partitions
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 16 Oct 2011 19:10:41 -0000

--Sig_//AIY9Dwzj2aC435CJeGIEvF
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable

> > 1) I would like a definition of what "unprotected SSD" means and
> > what a "protected SSD" is.
>=20
> Let me better quote again, "Many high-end SSDs have backup batteries
> or capacitors to ensure that operations complete even if power fails.
> Our results argue that these systems should provide power until the
> chip signals that the operation is finished rather than until the data
> appears to be correct. Low-end SSDs and embedded systems, however,
> often do not contain backup power sources due to cost or space
> constraints, and these systems must be extremely careful to prevent
> data loss and/or reduced reliability after a power failure."

I can provide a practical data point on this. I've tested power failure
with some Corsair Force SSD. I've used one of the tools available for
that. The process goes like that:

 1. Start some kind of server application which waits for messages.
 2. Start a client application which in a loop does:
    a. Write a block of data to disk.
    b. Call fsync/fdatasync to make sure the written data is on this.
       This systemcall sould block the application until all layers
       (including) the disk driver and thus the disk signal the write
       has completed.
    c. Send a message to the server which then displays the block
       number written.
 3. Cut power to the SSD.

A correctly behaving drive should have at least as many data blocks on
disk as are displayed in the server application. Sometimes even more.

For the tested SSD data blocks amounting to about 1 to 1.2 MB of data
were consistently missing even thought they were signaled to be on disk.

Care was taken to ensure that all involved OS subsystems were capable
of handling the fsync/fdatasync calls correctly and passing them
to lower layers (which is not the case for all filesystems in older
versions of Linux for example).

I've just recently repeated the test for a Intel 320 drive (the 120 GB
version, but should be the same for all models) which includes a set of
capacitors. These exhibit correct behavior. No missing data for about a
dozen trials.


Regards
Florian Wagner

--Sig_//AIY9Dwzj2aC435CJeGIEvF
Content-Type: application/pgp-signature; name=signature.asc
Content-Disposition: attachment; filename=signature.asc

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)

iEYEARECAAYFAk6bLC4ACgkQLvW/2gp2pPxruACfVVrvI1bUhcgdGidrTQsd66AW
Dp8AoIu4cYPoD5XMH/vQgllpc1uXglqG
=BCeM
-----END PGP SIGNATURE-----

--Sig_//AIY9Dwzj2aC435CJeGIEvF--

From owner-freebsd-fs@FreeBSD.ORG  Sun Oct 16 20:03:12 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 84F811065676
	for <freebsd-fs@freebsd.org>; Sun, 16 Oct 2011 20:03:12 +0000 (UTC)
	(envelope-from julian@freebsd.org)
Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16])
	by mx1.freebsd.org (Postfix) with ESMTP id 560748FC13
	for <freebsd-fs@freebsd.org>; Sun, 16 Oct 2011 20:03:12 +0000 (UTC)
Received: from julian-mac.elischer.org (home-nat.elischer.org [67.100.89.137])
	(authenticated bits=0)
	by vps1.elischer.org (8.14.4/8.14.4) with ESMTP id p9GK2ri7046326
	(version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO);
	Sun, 16 Oct 2011 13:02:58 -0700 (PDT)
	(envelope-from julian@freebsd.org)
Message-ID: <4E9B3876.2010809@freebsd.org>
Date: Sun, 16 Oct 2011 13:03:02 -0700
From: Julian Elischer <julian@freebsd.org>
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10.4; en-US;
	rv:1.9.2.23) Gecko/20110920 Thunderbird/3.1.15
MIME-Version: 1.0
To: "Luchesar V. ILIEV" <luchesar.iliev@gmail.com>
References: <CACh33Fpz=uAp8h0Bjsi1Be=ob_94jXtN51mAHvGPkReY5MpTcg@mail.gmail.com>	<4E9AE725.4040001@gmail.com>	<169E82FD-3B61-4CAB-B067-D380D69CDED5@digsys.bg>	<4E9B1C1E.7090804@gmail.com>	<14628DFB-AA3E-4D2D-9D4F-723B6327B6C0@digsys.bg>
	<4E9B2509.3030701@gmail.com>
In-Reply-To: <4E9B2509.3030701@gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Cc: freebsd-fs@freebsd.org
Subject: Re: [ZFS] Using SSD with partitions
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 16 Oct 2011 20:03:12 -0000

On 10/16/11 11:40 AM, Luchesar V. ILIEV wrote:
>
> That's true, but my understanding is that the differences go further
> beyond that. To quote one paper: "Our data show that flash memory’s
> behavior under power failure is surprising in several ways. First,
> operations that come closer to completion do not necessarily exhibit
> fewer bit errors. Second, power failure not only results in failure of
> the operation in progress, it can also corrupt data already present in
> the flash device. Third, power failure can negatively impact the
> integrity of future data written to the device."


However one must not confuse Flash memeory with drives using flash memory.
part of the added value brought to the tabel by SSD manufacturers is  
(SHOULD BE)
the addition of mechanisms to cope with unexpected power down. whether 
it be
hold-up caoacitors, battery backed up ram or any other mechanism they 
can think of.

Certainly Fusion-IO Flash cards will never lose data that they have
reported as having been written.

> http://cseweb.ucsd.edu/users/swanson/papers/DAC2011PowerCut.pdf
>
> However, that's probably getting too academic (also beyond my own
> qualifications), and I wouldn't like to hijack the thread.
>
> Cheers,
> Luchesar
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>
>


From owner-freebsd-fs@FreeBSD.ORG  Sun Oct 16 21:13:27 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id EA5DD106566B
	for <freebsd-fs@freebsd.org>; Sun, 16 Oct 2011 21:13:27 +0000 (UTC)
	(envelope-from bfriesen@simple.dallas.tx.us)
Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74])
	by mx1.freebsd.org (Postfix) with ESMTP id 9017E8FC14
	for <freebsd-fs@freebsd.org>; Sun, 16 Oct 2011 21:13:27 +0000 (UTC)
Received: from freddy.simplesystems.org (freddy.simplesystems.org
	[65.66.246.65])
	by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id
	p9GLDQ4A005157; Sun, 16 Oct 2011 16:13:26 -0500 (CDT)
Date: Sun, 16 Oct 2011 16:13:26 -0500 (CDT)
From: Bob Friesenhahn <bfriesen@simple.dallas.tx.us>
X-X-Sender: bfriesen@freddy.simplesystems.org
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
In-Reply-To: <20111016183003.GA29466@icarus.home.lan>
Message-ID: <alpine.GSO.2.01.1110161607120.4501@freddy.simplesystems.org>
References: <CACh33Fpz=uAp8h0Bjsi1Be=ob_94jXtN51mAHvGPkReY5MpTcg@mail.gmail.com>
	<4E9AE725.4040001@gmail.com>
	<169E82FD-3B61-4CAB-B067-D380D69CDED5@digsys.bg>
	<4E9B1C1E.7090804@gmail.com>
	<20111016183003.GA29466@icarus.home.lan>
User-Agent: Alpine 2.01 (GSO 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2
	(blade.simplesystems.org [65.66.246.90]);
	Sun, 16 Oct 2011 16:13:26 -0500 (CDT)
Cc: freebsd-fs@freebsd.org
Subject: Re: [ZFS] Using SSD with partitions
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 16 Oct 2011 21:13:28 -0000

On Sun, 16 Oct 2011, Jeremy Chadwick wrote:
>
> 2) I would like an explanation as to what "SSDs are more likely than an
> MHDD to lose data on a power outage" means exactly (on a technical
> level, not something vague) and from where you got this interpretation.

The reason is that normal operation of the SSD will move and/or 
rewrite existing data, which is also likely to be much older than the 
data currently being written.  Common reasons are wear leveling, 
garbage collection (compacting) and because the block written is not 
identically sized and aligned with the SSDs native underlying blocks.

While data is being re-written, moved, or copied, a copy resides in 
RAM.  A SSD which is more defensive about avoiding corrupting old data 
is also likely to be slower to synchronously write.

There are certainly algorithms (e.g. as used by zfs) which can help an 
SSD avoid issues.

Bob
-- 
Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

From owner-freebsd-fs@FreeBSD.ORG  Sun Oct 16 21:49:42 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E9D071065674
	for <freebsd-fs@freebsd.org>; Sun, 16 Oct 2011 21:49:41 +0000 (UTC)
	(envelope-from spork@bway.net)
Received: from xena.bway.net (xena.bway.net [216.220.96.26])
	by mx1.freebsd.org (Postfix) with ESMTP id AF8668FC0A
	for <freebsd-fs@freebsd.org>; Sun, 16 Oct 2011 21:49:41 +0000 (UTC)
Received: (qmail 13010 invoked by uid 0); 16 Oct 2011 21:23:00 -0000
Received: from smtp.bway.net (216.220.96.25)
	by xena.bway.net with (DHE-RSA-AES256-SHA encrypted) SMTP;
	16 Oct 2011 21:23:00 -0000
Received: (qmail 13002 invoked by uid 90); 16 Oct 2011 21:23:00 -0000
Received: from unknown (HELO ?10.3.2.40?) (spork@96.57.144.66)
	by smtp.bway.net with (AES128-SHA encrypted) SMTP;
	16 Oct 2011 21:23:00 -0000
Mime-Version: 1.0 (Apple Message framework v1084)
Content-Type: text/plain; charset=us-ascii
From: Charles Sprickman <spork@bway.net>
In-Reply-To: <20111016211038.05de98d2@naclador.mos32.de>
Date: Sun, 16 Oct 2011 17:22:59 -0400
Content-Transfer-Encoding: quoted-printable
Message-Id: <D11E1330-6FB5-462F-8878-BF70AD77B7B4@bway.net>
References: <CACh33Fpz=uAp8h0Bjsi1Be=ob_94jXtN51mAHvGPkReY5MpTcg@mail.gmail.com>
	<4E9AE725.4040001@gmail.com>
	<169E82FD-3B61-4CAB-B067-D380D69CDED5@digsys.bg>
	<4E9B1C1E.7090804@gmail.com>
	<20111016183003.GA29466@icarus.home.lan> <4E9B27D8.70106@gmail.com>
	<20111016211038.05de98d2@naclador.mos32.de>
To: Florian Wagner <florian@wagner-flo.net>
X-Mailer: Apple Mail (2.1084)
Cc: freebsd-fs@freebsd.org
Subject: Re: [ZFS] Using SSD with partitions
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 16 Oct 2011 21:49:42 -0000


On Oct 16, 2011, at 3:10 PM, Florian Wagner wrote:

>>> 1) I would like a definition of what "unprotected SSD" means and
>>> what a "protected SSD" is.
>>=20
>> Let me better quote again, "Many high-end SSDs have backup batteries
>> or capacitors to ensure that operations complete even if power fails.
>> Our results argue that these systems should provide power until the
>> chip signals that the operation is finished rather than until the =
data
>> appears to be correct. Low-end SSDs and embedded systems, however,
>> often do not contain backup power sources due to cost or space
>> constraints, and these systems must be extremely careful to prevent
>> data loss and/or reduced reliability after a power failure."
>=20
> I can provide a practical data point on this. I've tested power =
failure
> with some Corsair Force SSD. I've used one of the tools available for
> that. The process goes like that:
>=20
> 1. Start some kind of server application which waits for messages.
> 2. Start a client application which in a loop does:
>    a. Write a block of data to disk.
>    b. Call fsync/fdatasync to make sure the written data is on this.
>       This systemcall sould block the application until all layers
>       (including) the disk driver and thus the disk signal the write
>       has completed.
>    c. Send a message to the server which then displays the block
>       number written.
> 3. Cut power to the SSD.
>=20
> A correctly behaving drive should have at least as many data blocks on
> disk as are displayed in the server application. Sometimes even more.
>=20
> For the tested SSD data blocks amounting to about 1 to 1.2 MB of data
> were consistently missing even thought they were signaled to be on =
disk.
>=20
> Care was taken to ensure that all involved OS subsystems were capable
> of handling the fsync/fdatasync calls correctly and passing them
> to lower layers (which is not the case for all filesystems in older
> versions of Linux for example).
>=20
> I've just recently repeated the test for a Intel 320 drive (the 120 GB
> version, but should be the same for all models) which includes a set =
of
> capacitors. These exhibit correct behavior. No missing data for about =
a
> dozen trials.

This sounds like diskchecker:
http://brad.livejournal.com/2116715.html

There are finally some affordable drives (eg: $100 40GB for ZIL)
that incorporate a capacitor that allows the drive to flush cache
to flash.  My understanding is that the Intel 320 series have this
and supposedly some of the OCZ drives (Vertex Pro 2, Vertex Pro 3).
Intel is the only one I'm finding that has an explicit declaration
that data is safely flushed though:

=
http://newsroom.intel.com/servlet/JiveServlet/download/38-4324/Intel_SSD_3=
20_Series_Enhance_Power_Loss_Technology_Brief.pdf

The PostgreSQL lists have some interesting info, as they are pretty
conservative about the definition of "safe" writes.  Also one of
the devs posted this earlier this year:

=
http://blog.2ndquadrant.com/en/2011/04/intel-ssd-now-off-the-sherr-sh.html=


For ZFS ZIL use, it also seems like mirroring is generally
recommended.  Since ZIL doesn't benefit from large drives, the
lowest priced Intel 320 x2 is not a bank-breaker.

Charles


>=20
> Regards
> Florian Wagner


From owner-freebsd-fs@FreeBSD.ORG  Sun Oct 16 22:48:44 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 06F95106566B;
	Sun, 16 Oct 2011 22:48:44 +0000 (UTC)
	(envelope-from eadler@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id D32648FC0C;
	Sun, 16 Oct 2011 22:48:43 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p9GMmh1T062070;
	Sun, 16 Oct 2011 22:48:43 GMT
	(envelope-from eadler@freefall.freebsd.org)
Received: (from eadler@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p9GMmhqk062066;
	Sun, 16 Oct 2011 22:48:43 GMT (envelope-from eadler)
Date: Sun, 16 Oct 2011 22:48:43 GMT
Message-Id: <201110162248.p9GMmhqk062066@freefall.freebsd.org>
To: eadler@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org
From: eadler@FreeBSD.org
Cc: 
Subject: Re: kern/161674: [ufs] snapshot on journaled ufs doesn't work
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 16 Oct 2011 22:48:44 -0000

Old Synopsis: snapshot on journaled ufs doesn't work
New Synopsis: [ufs] snapshot on journaled ufs doesn't work

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: eadler
Responsible-Changed-When: Sun Oct 16 22:48:01 UTC 2011
Responsible-Changed-Why: 
over to maintainer

http://www.freebsd.org/cgi/query-pr.cgi?pr=161674

From owner-freebsd-fs@FreeBSD.ORG  Sun Oct 16 23:13:53 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id EA2C7106566C;
	Sun, 16 Oct 2011 23:13:53 +0000 (UTC)
	(envelope-from eadler@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id C24788FC12;
	Sun, 16 Oct 2011 23:13:53 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p9GNDr6W089436;
	Sun, 16 Oct 2011 23:13:53 GMT
	(envelope-from eadler@freefall.freebsd.org)
Received: (from eadler@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p9GNDr2D089432;
	Sun, 16 Oct 2011 23:13:53 GMT (envelope-from eadler)
Date: Sun, 16 Oct 2011 23:13:53 GMT
Message-Id: <201110162313.p9GNDr2D089432@freefall.freebsd.org>
To: eadler@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org
From: eadler@FreeBSD.org
Cc: 
Subject: Re: kern/161205: [nfs] [pfsync] [regression] [build] Bug report
	freebsd 9.0 beta 3 
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 16 Oct 2011 23:13:54 -0000

Old Synopsis: [build] Bug report freebsd 9.0 beta 3 [nfs] [pfsync] [regression]
New Synopsis: [nfs] [pfsync] [regression] [build] Bug report freebsd 9.0 beta 3 

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: eadler
Responsible-Changed-When: Sun Oct 16 23:12:53 UTC 2011
Responsible-Changed-Why: 
over to maintainer

http://www.freebsd.org/cgi/query-pr.cgi?pr=161205

From owner-freebsd-fs@FreeBSD.ORG  Mon Oct 17 08:24:59 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 97C481065673
	for <freebsd-fs@freebsd.org>; Mon, 17 Oct 2011 08:24:59 +0000 (UTC)
	(envelope-from maurizio.vairani@cloverinformatica.it)
Received: from smtplq03.aruba.it (smtplq-out18.aruba.it [62.149.158.38])
	by mx1.freebsd.org (Postfix) with SMTP id 081A88FC13
	for <freebsd-fs@freebsd.org>; Mon, 17 Oct 2011 08:24:57 +0000 (UTC)
Received: (qmail 20766 invoked by uid 89); 17 Oct 2011 08:24:55 -0000
Received: from unknown (HELO smtp7.aruba.it) (62.149.158.227)
	by smtplq03.aruba.it with SMTP; 17 Oct 2011 08:24:55 -0000
Received: (qmail 2057 invoked by uid 89); 17 Oct 2011 08:24:55 -0000
Received: from unknown (HELO clover.dyndns.biz)
	(info@cloverinformatica.it@151.55.127.210)
	by smtp7.ad.aruba.it with SMTP; 17 Oct 2011 08:24:54 -0000
Received: from [192.168.0.185] ([192.168.0.185]) by clover.dyndns.biz
	; Mon, 17 Oct 2011 10:24:53 +0200
Message-ID: <4E9BE653.7070008@cloverinformatica.it>
Date: Mon, 17 Oct 2011 10:24:51 +0200
From: Maurizio Vairani <maurizio.vairani@cloverinformatica.it>
User-Agent: Mozilla/5.0 (Windows NT 5.1;
	rv:7.0.1) Gecko/20110929 Thunderbird/7.0.1
MIME-Version: 1.0
To: Gleb Kurtsou <gleb.kurtsou@gmail.com>, freebsd-fs@freebsd.org
X-Spam-Rating: smtp7.ad.aruba.it 1.6.2 0/1000/N
X-Spam-Rating: smtplq03.aruba.it 1.6.2 0/1000/N
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: 
Subject: [TMPFS] patch for FreeBSD 8.2-RELEASE
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 17 Oct 2011 08:24:59 -0000

Hi list,

Gleb Kurtsou in this thread 
http://lists.freebsd.org/pipermail/freebsd-fs/2011-October/012650.html 
proposes a patch for solving the well known TMPSF problem: the free 
space drops down to zero when ZFS consumes the kernel memory and there 
isn't enough free swap space.

Unfortunately the patch is not directly applicable to FreeBSD 
8.2-RELEASE so I have modified the source code using the Gleb's patch as 
reference, recompiled and installed the new driver. I am testing it for 
a week on my AMD64 16G RAM server reducing the swap space from 28G to 
8G, 4G or none and seems the the problem is solved.

Regards
-Maurizio

/sys/fs/tmpfs/tmpfs.h
===================================================================
--- tmpfs.h.orig    2010-12-21 18:09:00.000000000 +0100 (v 1.17.2.2.2.1)
+++ tmpfs.h    2011-10-13 15:16:26.900043000 +0200         (working copy)
@@ -304,10 +304,30 @@

  #define TMPFS_NODE_LOCK(node) mtx_lock(&(node)->tn_interlock)
  #define TMPFS_NODE_UNLOCK(node) mtx_unlock(&(node)->tn_interlock)
-#define        TMPFS_NODE_MTX(node) (&(node)->tn_interlock)
+#define TMPFS_NODE_MTX(node) (&(node)->tn_interlock)
+
+#ifdef INVARIANTS
+#define TMPFS_ASSERT_LOCKED(node) do {                    \
+        MPASS(node != NULL);                    \
+        MPASS(node->tn_vnode != NULL);                \
+        if (!VOP_ISLOCKED(node->tn_vnode) &&            \
+            !mtx_owned(TMPFS_NODE_MTX(node)))            \
+            panic("tmpfs: node is not locked: %p", node);    \
+    } while (0)
+#define TMPFS_ASSERT_ELOCKED(node) do {                    \
+        MPASS((node) != NULL);                    \
+        MPASS((node)->tn_vnode != NULL);            \
+        mtx_assert(TMPFS_NODE_MTX(node), MA_OWNED);        \
+        ASSERT_VOP_LOCKED((node)->tn_vnode, "tmpfs");        \
+    } while (0)
+#else
+#define TMPFS_ASSERT_LOCKED(node) (void)0
+#define TMPFS_ASSERT_ELOCKED(node) (void)0
+#endif

  #define TMPFS_VNODE_ALLOCATING    1
  #define TMPFS_VNODE_WANT    2
+#define TMPFS_VNODE_DOOMED    4
  /* 
--------------------------------------------------------------------- */

  /*
@@ -467,65 +487,30 @@
   * Memory management stuff.
   */

-/* Amount of memory pages to reserve for the system (e.g., to not use by
- * tmpfs).
- * XXX: Should this be tunable through sysctl, for instance? */
-#define TMPFS_PAGES_RESERVED (4 * 1024 * 1024 / PAGE_SIZE)
-
  /*
- * Returns information about the number of available memory pages,
- * including physical and virtual ones.
- *
- * If 'total' is TRUE, the value returned is the total amount of memory
- * pages configured for the system (either in use or free).
- * If it is FALSE, the value returned is the amount of free memory pages.
- *
- * Remember to remove TMPFS_PAGES_RESERVED from the returned value to avoid
- * excessive memory usage.
- *
+ * Number of reserved swap pages should not be lower than
+ * swap_pager_almost_full high water mark.
   */
+#define TMPFS_SWAP_MINRESERVED        1024
+
  static __inline size_t
-tmpfs_mem_info(void)
+tmpfs_pages_max(struct tmpfs_mount *tmp)
  {
-    size_t size;
-
-    size = swap_pager_avail + cnt.v_free_count + cnt.v_inactive_count;
-    size -= size > cnt.v_wire_count ? cnt.v_wire_count : size;
-    return size;
+    return (tmp->tm_pages_max);
  }

-/* Returns the maximum size allowed for a tmpfs file system.  This macro
- * must be used instead of directly retrieving the value from tm_pages_max.
- * The reason is that the size of a tmpfs file system is dynamic: it lets
- * the user store files as long as there is enough free memory (including
- * physical memory and swap space).  Therefore, the amount of memory to be
- * used is either the limit imposed by the user during mount time or the
- * amount of available memory, whichever is lower.  To avoid consuming all
- * the memory for a given mount point, the system will always reserve a
- * minimum of TMPFS_PAGES_RESERVED pages, which is also taken into account
- * by this macro (see above). */
  static __inline size_t
-TMPFS_PAGES_MAX(struct tmpfs_mount *tmp)
+tmpfs_pages_used(struct tmpfs_mount *tmp)
  {
-    size_t freepages;
-
-    freepages = tmpfs_mem_info();
-    freepages -= freepages < TMPFS_PAGES_RESERVED ?
-        freepages : TMPFS_PAGES_RESERVED;
-
-    return MIN(tmp->tm_pages_max, freepages + tmp->tm_pages_used);
+    const size_t node_size = sizeof(struct tmpfs_node) +
+        sizeof(struct tmpfs_dirent);
+    size_t meta_pages;
+
+    meta_pages = howmany((uintmax_t)tmp->tm_nodes_inuse * node_size,
+        PAGE_SIZE);
+    return (meta_pages + tmp->tm_pages_used);
  }

-/* Returns the available space for the given file system. */
-#define TMPFS_META_PAGES(tmp) (howmany((tmp)->tm_nodes_inuse * 
(sizeof(struct tmpfs_node) \
-                + sizeof(struct tmpfs_dirent)), PAGE_SIZE))
-#define TMPFS_FILE_PAGES(tmp) ((tmp)->tm_pages_used)
-
-#define TMPFS_PAGES_AVAIL(tmp) (TMPFS_PAGES_MAX(tmp) > \
-            TMPFS_META_PAGES(tmp)+TMPFS_FILE_PAGES(tmp)? \
-            TMPFS_PAGES_MAX(tmp) - TMPFS_META_PAGES(tmp) \
-            - TMPFS_FILE_PAGES(tmp):0)
-
  #endif

  /* 
--------------------------------------------------------------------- */

/sys/fs/tmpfs/tmpfs_subr.c
===================================================================
--- tmpfs_subr.c.orig    2010-12-21 18:09:00.000000000 +0100 (v 
1.23.2.2.2.1)
+++ tmpfs_subr.c    2011-10-06 14:31:26.007163000 +0200     (working copy)
@@ -41,6 +41,7 @@
  #include <sys/priv.h>
  #include <sys/proc.h>
  #include <sys/stat.h>
+#include <sys/sysctl.h>
  #include <sys/systm.h>
  #include <sys/vnode.h>
  #include <sys/vmmeter.h>
@@ -55,6 +56,60 @@
  #include <fs/tmpfs/tmpfs_fifoops.h>
  #include <fs/tmpfs/tmpfs_vnops.h>

+static long tmpfs_swap_reserved = TMPFS_SWAP_MINRESERVED * 2;
+
+SYSCTL_NODE(_vfs, OID_AUTO, tmpfs, CTLFLAG_RW, 0, "tmpfs memory file 
system");
+
+static int
+sysctl_swap_reserved(SYSCTL_HANDLER_ARGS)
+{
+    int error;
+    long pages, bytes;
+
+    pages = *(long *)arg1;
+    bytes = pages * PAGE_SIZE;
+
+    error = sysctl_handle_long(oidp, &bytes, 0, req);
+    if (error || !req->newptr)
+        return (error);
+
+    pages = bytes / PAGE_SIZE;
+    if (pages < TMPFS_SWAP_MINRESERVED)
+        return (EINVAL);
+
+    *(long *)arg1 = pages;
+    return (0);
+}
+
+SYSCTL_PROC(_vfs_tmpfs, OID_AUTO, swap_reserved, CTLTYPE_LONG|CTLFLAG_RW,
+ &tmpfs_swap_reserved, 0, sysctl_swap_reserved, "L", "reserved swap 
space");
+
+static __inline size_t
+tmpfs_pages_avail(struct tmpfs_mount *tmp, size_t req_pages)
+{
+    vm_ooffset_t avail;
+
+    if (tmpfs_pages_max(tmp) < tmpfs_pages_used(tmp) + req_pages)
+        return (0);
+
+    if (!vm_page_count_target())
+        return (1);
+
+    /*
+     * Fail if pagedaemon wasn't able to free desired number of pages and
+     * we are running out of swap.
+     */
+    avail = swap_pager_avail - vm_paging_target() - req_pages;
+    if (avail < tmpfs_swap_reserved) {    /* avail is signed */
+        printf("tmpfs: low memory: available %jd, "
+            "paging target %d, requested %zd\n",
+            (intmax_t)swap_pager_avail, vm_paging_target(), req_pages);
+        return (0);
+    }
+
+    return (1);
+}
+
  /* 
--------------------------------------------------------------------- */

  /*
@@ -95,6 +150,8 @@

      if (tmp->tm_nodes_inuse > tmp->tm_nodes_max)
          return (ENOSPC);
+    if (tmpfs_pages_avail(tmp, 1) == 0)
+        return (ENOSPC);

      nnode = (struct tmpfs_node *)uma_zalloc_arg(
                  tmp->tm_node_pool, tmp, M_WAITOK);
@@ -882,7 +939,7 @@
      newpages = round_page(newsize) / PAGE_SIZE;

      if (newpages > oldpages &&
-        newpages - oldpages > TMPFS_PAGES_AVAIL(tmp)) {
+        tmpfs_pages_avail(tmp, newpages - oldpages) == 0) {
          error = ENOSPC;
          goto out;
      }


/sys/fs/tmpfs/tmpfs_vfsops.c
===================================================================
--- tmpfs_vfsops.c.orig    2010-12-21 18:09:00.000000000 +0100 (v 
1.21.2.1.6.1)
+++ tmpfs_vfsops.c    2011-10-07 14:10:15.137747000 +0200     (working copy)
@@ -85,53 +85,6 @@

  #define SWI_MAXMIB    3

-static u_int
-get_swpgtotal(void)
-{
-    struct xswdev xsd;
-    char *sname = "vm.swap_info";
-    int soid[SWI_MAXMIB], oid[2];
-    u_int unswdev, total, dmmax, nswapdev;
-    size_t mibi, len;
-
-    total = 0;
-
-    len = sizeof(dmmax);
-    if (kernel_sysctlbyname(curthread, "vm.dmmax", &dmmax, &len,
-                NULL, 0, NULL, 0) != 0)
-        return total;
-
-    len = sizeof(nswapdev);
-    if (kernel_sysctlbyname(curthread, "vm.nswapdev",
- &nswapdev, &len,
-                NULL, 0, NULL, 0) != 0)
-        return total;
-
-    mibi = (SWI_MAXMIB - 1) * sizeof(int);
-    oid[0] = 0;
-    oid[1] = 3;
-
-    if (kernel_sysctl(curthread, oid, 2,
-            soid, &mibi, (void *)sname, strlen(sname),
-            NULL, 0) != 0)
-        return total;
-
-    mibi = (SWI_MAXMIB - 1);
-    for (unswdev = 0; unswdev < nswapdev; ++unswdev) {
-        soid[mibi] = unswdev;
-        len = sizeof(struct xswdev);
-        if (kernel_sysctl(curthread,
-                soid, mibi + 1, &xsd, &len, NULL, 0,
-                NULL, 0) != 0)
-            return total;
-        if (len == sizeof(struct xswdev))
-            total += (xsd.xsw_nblks - dmmax);
-    }
-
-    /* Not Reached */
-    return total;
-}
-
  /* 
--------------------------------------------------------------------- */
  static int
  tmpfs_node_ctor(void *mem, int size, void *arg, int flags)
@@ -179,14 +132,13 @@
  static int
  tmpfs_mount(struct mount *mp)
  {
+    const size_t nodes_per_page = howmany(PAGE_SIZE,
+        sizeof(struct tmpfs_dirent) + sizeof(struct tmpfs_node));
      struct tmpfs_mount *tmp;
      struct tmpfs_node *root;
-    size_t pages, mem_size;
-    ino_t nodes;
+    u_quad_t pages;
+    u_quad_t nodes_max, size_max, maxfilesize;
      int error;
-    /* Size counters. */
-    ino_t    nodes_max;
-    size_t    size_max;

      /* Root node attributes. */
      uid_t    root_uid;
@@ -223,42 +175,55 @@
      if (mp->mnt_cred->cr_ruid != 0 ||
          vfs_scanopt(mp->mnt_optnew, "mode", "%ho", &root_mode) != 1)
          root_mode = va.va_mode;
-    if (vfs_scanopt(mp->mnt_optnew, "inodes", "%d", &nodes_max) != 1)
+    if (vfs_scanopt(mp->mnt_optnew, "inodes", "%qu", &nodes_max) != 1)
          nodes_max = 0;
      if (vfs_scanopt(mp->mnt_optnew, "size", "%qu", &size_max) != 1)
          size_max = 0;
-
-    /* Do not allow mounts if we do not have enough memory to preserve
-     * the minimum reserved pages. */
-    mem_size = cnt.v_free_count + cnt.v_inactive_count + get_swpgtotal();
-    mem_size -= mem_size > cnt.v_wire_count ? cnt.v_wire_count : mem_size;
-    if (mem_size < TMPFS_PAGES_RESERVED)
+    if (vfs_scanopt(mp->mnt_optnew, "maxfilesize", "%qu", &maxfilesize) 
!= 0)
+        maxfilesize = 0;
+    /*
+     * XXX Deny mounts if pagedaemon wasn't able to recovery desired
+     * number of pages.
+     */
+    if (vm_page_count_target())
          return ENOSPC;

      /* Get the maximum number of memory pages this file system is
       * allowed to use, based on the maximum size the user passed in
-     * the mount structure.  A value of zero is treated as if the
-     * maximum available space was requested. */
-    if (size_max < PAGE_SIZE || size_max >= SIZE_MAX)
-        pages = SIZE_MAX;
+     * the mount structure. Use half of RAM by default. */
+    if (size_max < PAGE_SIZE*4 || size_max > SIZE_MAX - PAGE_SIZE)
+        pages = cnt.v_page_count / 2;
      else
          pages = howmany(size_max, PAGE_SIZE);
      MPASS(pages > 0);
+    MPASS(pages < SIZE_MAX);

-    if (nodes_max <= 3)
-        nodes = 3 + pages * PAGE_SIZE / 1024;
+    if (pages < SIZE_MAX / PAGE_SIZE)
+        size_max = pages * PAGE_SIZE;
      else
-        nodes = nodes_max;
-    MPASS(nodes >= 3);
+        size_max = SIZE_MAX;
+
+    if (nodes_max <= 3) {
+        if (pages < UINT32_MAX / nodes_per_page)
+            nodes_max = pages * nodes_per_page;
+         else
+            nodes_max = UINT32_MAX;
+    }
+    if (nodes_max > UINT32_MAX)
+        nodes_max = UINT32_MAX;
+    MPASS(nodes_max >= 3);
+
+    if (maxfilesize < PAGE_SIZE || maxfilesize > size_max)
+        maxfilesize = size_max;

      /* Allocate the tmpfs mount structure and fill it. */
      tmp = (struct tmpfs_mount *)malloc(sizeof(struct tmpfs_mount),
          M_TMPFSMNT, M_WAITOK | M_ZERO);

      mtx_init(&tmp->allnode_lock, "tmpfs allnode lock", NULL, MTX_DEF);
-    tmp->tm_nodes_max = nodes;
+    tmp->tm_nodes_max = nodes_max;
      tmp->tm_nodes_inuse = 0;
-    tmp->tm_maxfilesize = (u_int64_t)(cnt.v_page_count + 
get_swpgtotal()) * PAGE_SIZE;
+    tmp->tm_maxfilesize = maxfilesize;
      LIST_INIT(&tmp->tm_nodes_used);

      tmp->tm_pages_max = pages;
@@ -427,22 +392,23 @@
  static int
  tmpfs_statfs(struct mount *mp, struct statfs *sbp)
  {
-    fsfilcnt_t freenodes;
      struct tmpfs_mount *tmp;
+    size_t used;

      tmp = VFS_TO_TMPFS(mp);

      sbp->f_iosize = PAGE_SIZE;
      sbp->f_bsize = PAGE_SIZE;

-    sbp->f_blocks = TMPFS_PAGES_MAX(tmp);
-    sbp->f_bavail = sbp->f_bfree = TMPFS_PAGES_AVAIL(tmp);
-
-    freenodes = MIN(tmp->tm_nodes_max - tmp->tm_nodes_inuse,
-        TMPFS_PAGES_AVAIL(tmp) * PAGE_SIZE / sizeof(struct tmpfs_node));
-
-    sbp->f_files = freenodes + tmp->tm_nodes_inuse;
-    sbp->f_ffree = freenodes;
+    sbp->f_blocks = tmpfs_pages_max(tmp);
+    used = tmpfs_pages_used(tmp);
+    if (tmpfs_pages_max(tmp) <= used)
+        sbp->f_bavail = 0;
+    else
+        sbp->f_bavail = tmpfs_pages_max(tmp) - used;
+    sbp->f_bfree = sbp->f_bavail;
+    sbp->f_files = tmp->tm_nodes_max;
+    sbp->f_ffree = tmp->tm_nodes_max - tmp->tm_nodes_inuse;
      /* sbp->f_owner = tmp->tn_uid; */

      return 0;


From owner-freebsd-fs@FreeBSD.ORG  Mon Oct 17 08:50:42 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9179A106566B
	for <freebsd-fs@freebsd.org>; Mon, 17 Oct 2011 08:50:42 +0000 (UTC)
	(envelope-from freebsd-fs@m.gmane.org)
Received: from lo.gmane.org (lo.gmane.org [80.91.229.12])
	by mx1.freebsd.org (Postfix) with ESMTP id 1C66E8FC0C
	for <freebsd-fs@freebsd.org>; Mon, 17 Oct 2011 08:50:40 +0000 (UTC)
Received: from list by lo.gmane.org with local (Exim 4.69)
	(envelope-from <freebsd-fs@m.gmane.org>) id 1RFiu7-0001HU-91
	for freebsd-fs@freebsd.org; Mon, 17 Oct 2011 10:50:39 +0200
Received: from lara.cc.fer.hr ([161.53.72.113])
	by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for <freebsd-fs@freebsd.org>; Mon, 17 Oct 2011 10:50:39 +0200
Received: from ivoras by lara.cc.fer.hr with local (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for <freebsd-fs@freebsd.org>; Mon, 17 Oct 2011 10:50:39 +0200
X-Injected-Via-Gmane: http://gmane.org/
To: freebsd-fs@freebsd.org
From: Ivan Voras <ivoras@freebsd.org>
Date: Mon, 17 Oct 2011 10:50:17 +0200
Lines: 50
Message-ID: <j7gq8f$uq3$1@dough.gmane.org>
References: <4E97FEDD.7060205@quip.cz> <j7938v$66s$1@dough.gmane.org>
	<4E982C0E.2060900@quip.cz>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature";
	boundary="------------enig77B17ED0F6D5B60FA9E29D3C"
X-Complaints-To: usenet@dough.gmane.org
X-Gmane-NNTP-Posting-Host: lara.cc.fer.hr
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:7.0.1) Gecko/20111004 Thunderbird/7.0.1
In-Reply-To: <4E982C0E.2060900@quip.cz>
X-Enigmail-Version: 1.1.2
Subject: Re: dirhash and dynamic memory allocation
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 17 Oct 2011 08:50:42 -0000

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enig77B17ED0F6D5B60FA9E29D3C
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On 14/10/2011 14:33, Miroslav Lachman wrote:
>=20
>=20
> Ivan Voras wrote:
>> On 14/10/2011 11:20, Miroslav Lachman wrote:
>>> Hi all,
>>>
>>> I tried some tuning of dirhash on our servers and after googlig a bit=
, I
>>> found an old GSoC project wiki page about Dynamic Memory Allocation f=
or
>>> Dirhash: http://wiki.freebsd.org/DirhashDynamicMemory
>>> Is there any reason not to use it / not commit it to HEAD?
>>
>> AFAIK it's sort-of already present. In 8-stable and recent kernels you=

>> can give huge amounts of memory to dirhash via vfs.ufs.dirhash_maxmem
>> (but except in really large edge cases I don't think you *need* more
>> than 32 MB), and the kernel will scale-down or free the memory if not
>> needed.
>=20
> Is this change documented somewhere? Maybe it could be noticed on
> DirhashDynamicMemory wiki page. Otherwise it seems as abandoned GSoC
> project.

I'm not touching the wiki page because I don't know really if the
functionality was committed from the GSoC project or from somewhere
else; what I did was much later and much smaller in scope.


--------------enig77B17ED0F6D5B60FA9E29D3C
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (FreeBSD)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk6b7E8ACgkQldnAQVacBcgHNQCgkhqum8SWEg2epPVLWg73sKvu
MVoAnjoQ6vkH5kDl/Ac/jvh41pLSzAak
=3uJw
-----END PGP SIGNATURE-----

--------------enig77B17ED0F6D5B60FA9E29D3C--


From owner-freebsd-fs@FreeBSD.ORG  Mon Oct 17 11:07:02 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 18A0A1065673
	for <freebsd-fs@FreeBSD.org>; Mon, 17 Oct 2011 11:07:02 +0000 (UTC)
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 06AE48FC1D
	for <freebsd-fs@FreeBSD.org>; Mon, 17 Oct 2011 11:07:02 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p9HB727M099209
	for <freebsd-fs@FreeBSD.org>; Mon, 17 Oct 2011 11:07:02 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p9HB71nh099207
	for freebsd-fs@FreeBSD.org; Mon, 17 Oct 2011 11:07:01 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 17 Oct 2011 11:07:01 GMT
Message-Id: <201110171107.p9HB71nh099207@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to
	owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster <bugmaster@FreeBSD.org>
To: freebsd-fs@FreeBSD.org
Cc: 
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 17 Oct 2011 11:07:02 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.


S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/161674  fs         [ufs] snapshot on journaled ufs doesn't work
o kern/161579  fs         [smbfs] FreeBSD sometimes panics when an smb share is 
o kern/161533  fs         [zfs] [panic] zfs receive panic: system ioctl returnin
o kern/161511  fs         [unionfs] Filesystem deadlocks when using multiple uni
o kern/161493  fs         [nfs] NFS v3 directory structure update slow
o kern/161438  fs         [zfs] [panic] recursed on non-recursive spa_namespace_
o kern/161424  fs         [nullfs] __getcwd() calls fail when used on nullfs mou
o kern/161280  fs         [zfs] Stack overflow in gptzfsboot
o kern/161205  fs         [nfs] [pfsync] [regression] [build] Bug report freebsd
o kern/161169  fs         [zfs] [panic] ZFS causes kernel panic in dbuf_dirty
o kern/161112  fs         [ufs] [lor] filesystem LOR in FreeBSD 9.0-BETA3
o kern/160893  fs         [zfs] [panic] 9.0-BETA2 kernel panic
o kern/160860  fs         Random UFS root filesystem corruption with SU+J [regre
o kern/160801  fs         [zfs] zfsboot on 8.2-RELEASE fails to boot from root-o
o kern/160790  fs         [fusefs] [panic] VPUTX: negative ref count with FUSE
o kern/160777  fs         [zfs] [hang] RAID-Z3 causes fatal hang upon scrub/impo
o kern/160706  fs         [zfs] zfs bootloader fails when a non-root vdev exists
o kern/160591  fs         [zfs] Fail to boot on zfs root with degraded raidz2 [r
o kern/160410  fs         [smbfs] [hang] smbfs hangs when transferring large fil
o kern/160283  fs         [zfs] [patch] 'zfs list' does abort in make_dataset_ha
o kern/159971  fs         [ffs] [panic] panic with soft updates journaling durin
o kern/159930  fs         [ufs] [panic] kernel core
o kern/159418  fs         [tmpfs] [panic] tmpfs kernel panic: recursing on non r
o kern/159402  fs         [zfs][loader] symlinks cause I/O errors
o kern/159357  fs         [zfs] ZFS MAXNAMELEN macro has confusing name (off-by-
o kern/159356  fs         [zfs] [patch] ZFS NAME_ERR_DISKLIKE check is Solaris-s
o kern/159351  fs         [nfs] [patch] - divide by zero in mountnfs()
o kern/159251  fs         [zfs] [request]: add FLETCHER4 as DEDUP hash option
o kern/159233  fs         [ext2fs] [patch] fs/ext2fs: finish reallocblk implemen
o kern/159232  fs         [ext2fs] [patch] fs/ext2fs: merge ext2_readwrite into 
o kern/159077  fs         [zfs] Can't cd .. with latest zfs version
o kern/159048  fs         [smbfs] smb mount corrupts large files
o kern/159045  fs         [zfs] [hang] ZFS scrub freezes system
o kern/158839  fs         [zfs] ZFS Bootloader Fails if there is a Dead Disk
o kern/158802  fs         [amd] amd(8) ICMP storm and unkillable process.
o kern/158711  fs         [ffs] [panic] panic in ffs_blkfree and ffs_valloc
o kern/158231  fs         [nullfs] panic on unmounting nullfs mounted over ufs o
f kern/157929  fs         [nfs] NFS slow read
o kern/157722  fs         [geli] unable to newfs a geli encrypted partition
o kern/157399  fs         [zfs] trouble with: mdconfig force delete && zfs strip
o kern/157179  fs         [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov
o kern/156797  fs         [zfs] [panic] Double panic with FreeBSD 9-CURRENT and 
o kern/156781  fs         [zfs] zfs is losing the snapshot directory,
p kern/156545  fs         [ufs] mv could break UFS on SMP systems
o kern/156193  fs         [ufs] [hang] UFS snapshot hangs && deadlocks processes
o kern/156168  fs         [nfs] [panic] Kernel panic under concurrent access ove
o kern/156039  fs         [nullfs] [unionfs] nullfs + unionfs do not compose, re
o kern/155615  fs         [zfs] zfs v28 broken on sparc64 -current
o kern/155587  fs         [zfs] [panic] kernel panic with zfs
o kern/155411  fs         [regression] [8.2-release] [tmpfs]: mount: tmpfs : No 
o kern/155199  fs         [ext2fs] ext3fs mounted as ext2fs gives I/O errors
o bin/155104   fs         [zfs][patch] use /dev prefix by default when importing
o kern/154930  fs         [zfs] cannot delete/unlink file from full volume -> EN
o kern/154828  fs         [msdosfs] Unable to create directories on external USB
o kern/154491  fs         [smbfs] smb_co_lock: recursive lock for object 1
o kern/154447  fs         [zfs] [panic] Occasional panics - solaris assert somew
p kern/154228  fs         [md] md getting stuck in wdrain state
o kern/153996  fs         [zfs] zfs root mount error while kernel is not located
o kern/153847  fs         [nfs] [panic] Kernel panic from incorrect m_free in nf
o kern/153753  fs         [zfs] ZFS v15 - grammatical error when attempting to u
o kern/153716  fs         [zfs] zpool scrub time remaining is incorrect
o kern/153695  fs         [patch] [zfs] Booting from zpool created on 4k-sector 
o kern/153680  fs         [xfs] 8.1 failing to mount XFS partitions
o kern/153520  fs         [zfs] Boot from GPT ZFS root on HP BL460c G1 unstable
o kern/153418  fs         [zfs] [panic] Kernel Panic occurred writing to zfs vol
o kern/153351  fs         [zfs] locking directories/files in ZFS
o bin/153258   fs         [patch][zfs] creating ZVOLs requires `refreservation' 
s kern/153173  fs         [zfs] booting from a gzip-compressed dataset doesn't w
o kern/153126  fs         [zfs] vdev failure, zpool=peegel type=vdev.too_small
p kern/152488  fs         [tmpfs] [patch] mtime of file updated when only inode 
o kern/152022  fs         [nfs] nfs service hangs with linux client [regression]
o kern/151942  fs         [zfs] panic during ls(1) zfs snapshot directory
o kern/151905  fs         [zfs] page fault under load in /sbin/zfs
o kern/151845  fs         [smbfs] [patch] smbfs should be upgraded to support Un
o bin/151713   fs         [patch] Bug in growfs(8) with respect to 32-bit overfl
o kern/151648  fs         [zfs] disk wait bug
o kern/151629  fs         [fs] [patch] Skip empty directory entries during name 
o kern/151330  fs         [zfs] will unshare all zfs filesystem after execute a 
o kern/151326  fs         [nfs] nfs exports fail if netgroups contain duplicate 
o kern/151251  fs         [ufs] Can not create files on filesystem with heavy us
o kern/151226  fs         [zfs] can't delete zfs snapshot
o kern/151111  fs         [zfs] vnodes leakage during zfs unmount
o kern/150503  fs         [zfs] ZFS disks are UNAVAIL and corrupted after reboot
o kern/150501  fs         [zfs] ZFS vdev failure vdev.bad_label on amd64
o kern/150390  fs         [zfs] zfs deadlock when arcmsr reports drive faulted
o kern/150336  fs         [nfs] mountd/nfsd became confused; refused to reload n
o kern/150207  fs         zpool(1): zpool import -d /dev tries to open weird dev
o kern/149208  fs         mksnap_ffs(8) hang/deadlock
o kern/149173  fs         [patch] [zfs] make OpenSolaris <sys/nvpair.h> installa
o kern/149015  fs         [zfs] [patch] misc fixes for ZFS code to build on Glib
o kern/149014  fs         [zfs] [patch] declarations in ZFS libraries/utilities 
o kern/149013  fs         [zfs] [patch] make ZFS makefiles use the libraries fro
o kern/148504  fs         [zfs] ZFS' zpool does not allow replacing drives to be
o kern/148490  fs         [zfs]: zpool attach - resilver bidirectionally, and re
o kern/148368  fs         [zfs] ZFS hanging forever on 8.1-PRERELEASE
o kern/148204  fs         [nfs] UDP NFS causes overload
o kern/148138  fs         [zfs] zfs raidz pool commands freeze
o kern/147903  fs         [zfs] [panic] Kernel panics on faulty zfs device
o kern/147881  fs         [zfs] [patch] ZFS "sharenfs" doesn't allow different "
o kern/147560  fs         [zfs] [boot] Booting 8.1-PRERELEASE raidz system take 
o kern/147420  fs         [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt 
o kern/146941  fs         [zfs] [panic] Kernel Double Fault - Happens constantly
o kern/146786  fs         [zfs] zpool import hangs with checksum errors
o kern/146708  fs         [ufs] [panic] Kernel panic in softdep_disk_write_compl
o kern/146528  fs         [zfs] Severe memory leak in ZFS on i386
o kern/146502  fs         [nfs] FreeBSD 8 NFS Client Connection to Server
s kern/145712  fs         [zfs] cannot offline two drives in a raidz2 configurat
o kern/145411  fs         [xfs] [panic] Kernel panics shortly after mounting an 
o bin/145309   fs         bsdlabel: Editing disk label invalidates the whole dev
o kern/145272  fs         [zfs] [panic] Panic during boot when accessing zfs on 
o kern/145246  fs         [ufs] dirhash in 7.3 gratuitously frees hashes when it
o kern/145238  fs         [zfs] [panic] kernel panic on zpool clear tank
o kern/145229  fs         [zfs] Vast differences in ZFS ARC behavior between 8.0
o kern/145189  fs         [nfs] nfsd performs abysmally under load
o kern/144929  fs         [ufs] [lor] vfs_bio.c + ufs_dirhash.c
p kern/144447  fs         [zfs] sharenfs fsunshare() & fsshare_main() non functi
o kern/144416  fs         [panic] Kernel panic on online filesystem optimization
s kern/144415  fs         [zfs] [panic] kernel panics on boot after zfs crash
o kern/144234  fs         [zfs] Cannot boot machine with recent gptzfsboot code 
o kern/143825  fs         [nfs] [panic] Kernel panic on NFS client
o bin/143572   fs         [zfs] zpool(1): [patch] The verbose output from iostat
o kern/143212  fs         [nfs] NFSv4 client strange work ...
o kern/143184  fs         [zfs] [lor] zfs/bufwait LOR
o kern/142878  fs         [zfs] [vfs] lock order reversal
o kern/142597  fs         [ext2fs] ext2fs does not work on filesystems with real
o kern/142489  fs         [zfs] [lor] allproc/zfs LOR
o kern/142466  fs         Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re
o kern/142306  fs         [zfs] [panic] ZFS drive (from OSX Leopard) causes two 
o kern/142068  fs         [ufs] BSD labels are got deleted spontaneously
o kern/141897  fs         [msdosfs] [panic] Kernel panic. msdofs: file name leng
o kern/141463  fs         [nfs] [panic] Frequent kernel panics after upgrade fro
o kern/141305  fs         [zfs] FreeBSD ZFS+sendfile severe performance issues (
o kern/141091  fs         [patch] [nullfs] fix panics with DIAGNOSTIC enabled
o kern/141086  fs         [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS
o kern/141010  fs         [zfs] "zfs scrub" fails when backed by files in UFS2
o kern/140888  fs         [zfs] boot fail from zfs root while the pool resilveri
o kern/140661  fs         [zfs] [patch] /boot/loader fails to work on a GPT/ZFS-
o kern/140640  fs         [zfs] snapshot crash
o kern/140068  fs         [smbfs] [patch] smbfs does not allow semicolon in file
o kern/139725  fs         [zfs] zdb(1) dumps core on i386 when examining zpool c
o kern/139715  fs         [zfs] vfs.numvnodes leak on busy zfs
p bin/139651   fs         [nfs] mount(8): read-only remount of NFS volume does n
o kern/139597  fs         [patch] [tmpfs] tmpfs initializes va_gen but doesn't u
o kern/139564  fs         [zfs] [panic] 8.0-RC1 - Fatal trap 12 at end of shutdo
o kern/139407  fs         [smbfs] [panic] smb mount causes system crash if remot
o kern/138662  fs         [panic] ffs_blkfree: freeing free block
o kern/138421  fs         [ufs] [patch] remove UFS label limitations
o kern/138202  fs         mount_msdosfs(1) see only 2Gb
o kern/136968  fs         [ufs] [lor] ufs/bufwait/ufs (open)
o kern/136945  fs         [ufs] [lor] filedesc structure/ufs (poll)
o kern/136944  fs         [ffs] [lor] bufwait/snaplk (fsync)
o kern/136873  fs         [ntfs] Missing directories/files on NTFS volume
o kern/136865  fs         [nfs] [patch] NFS exports atomic and on-the-fly atomic
p kern/136470  fs         [nfs] Cannot mount / in read-only, over NFS
o kern/135546  fs         [zfs] zfs.ko module doesn't ignore zpool.cache filenam
o kern/135469  fs         [ufs] [panic] kernel crash on md operation in ufs_dirb
o kern/135050  fs         [zfs] ZFS clears/hides disk errors on reboot
o kern/134491  fs         [zfs] Hot spares are rather cold...
o kern/133676  fs         [smbfs] [panic] umount -f'ing a vnode-based memory dis
o kern/133174  fs         [msdosfs] [patch] msdosfs must support multibyte inter
o kern/132960  fs         [ufs] [panic] panic:ffs_blkfree: freeing free frag
o kern/132397  fs         reboot causes filesystem corruption (failure to sync b
o kern/132331  fs         [ufs] [lor] LOR ufs and syncer
o kern/132237  fs         [msdosfs] msdosfs has problems to read MSDOS Floppy
o kern/132145  fs         [panic] File System Hard Crashes
o kern/131441  fs         [unionfs] [nullfs] unionfs and/or nullfs not combineab
o kern/131360  fs         [nfs] poor scaling behavior of the NFS server under lo
o kern/131342  fs         [nfs] mounting/unmounting of disks causes NFS to fail
o bin/131341   fs         makefs: error "Bad file descriptor"  on the mount poin
o kern/130920  fs         [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130210  fs         [nullfs] Error by check nullfs
o kern/129760  fs         [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129488  fs         [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: 
o kern/129231  fs         [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129152  fs         [panic] non-userfriendly panic when trying to mount(8)
o kern/127787  fs         [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs
f kern/127375  fs         [zfs] If vm.kmem_size_max>"1073741823" then write spee
o bin/127270   fs         fsck_msdosfs(8) may crash if BytesPerSec is zero
o kern/127029  fs         [panic] mount(8): trying to mount a write protected zi
o kern/126287  fs         [ufs] [panic] Kernel panics while mounting an UFS file
o kern/125895  fs         [ffs] [panic] kernel: panic: ffs_blkfree: freeing free
s kern/125738  fs         [zfs] [request] SHA256 acceleration in ZFS
o kern/123939  fs         [msdosfs] corrupts new files
f sparc/123566 fs         [zfs] zpool import issue: EOVERFLOW
o kern/122380  fs         [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash
o bin/122172   fs         [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121898   fs         [nullfs] pwd(1)/getcwd(2) fails with Permission denied
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha
o kern/120483  fs         [ntfs] [patch] NTFS filesystem locking changes
o kern/120482  fs         [ntfs] [patch] Sync style changes between NetBSD and F
o kern/118912  fs         [2tb] disk sizing/geometry problem with large array
o kern/118713  fs         [minidump] [patch] Display media size required for a k
o bin/118249   fs         [ufs] mv(1): moving a directory changes its mtime
o kern/118126  fs         [nfs] [patch] Poor NFS server write performance
o kern/118107  fs         [ntfs] [panic] Kernel panic when accessing a file at N
o kern/117954  fs         [ufs] dirhash on very large directories blocks the mac
o bin/117315   fs         [smbfs] mount_smbfs(8) and related options can't mount
o kern/117314  fs         [ntfs] Long-filename only NTFS fs'es cause kernel pani
o kern/117158  fs         [zfs] zpool scrub causes panic if geli vdevs detach on
o bin/116980   fs         [msdosfs] [patch] mount_msdosfs(8) resets some flags f
o conf/116931  fs         lack of fsck_cd9660 prevents mounting iso images with 
o kern/116583  fs         [ffs] [hang] System freezes for short time when using 
o bin/115361   fs         [zfs] mount(8) gets into a state where it won't set/un
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala 
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o kern/113852  fs         [smbfs] smbfs does not properly implement DFS referral
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show 
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/111843  fs         [msdosfs] Long Names of files are incorrectly created 
o kern/111782  fs         [ufs] dump(8) fails horribly for large filesystems
s bin/111146   fs         [2tb] fsck(8) fails on 6T filesystem
o kern/109024  fs         [msdosfs] [iconv] mount_msdosfs: msdosfs_iconv: Operat
o kern/109010  fs         [msdosfs] can't mv directory within fat32 file system
o bin/107829   fs         [2TB] fdisk(8): invalid boundary checking in fdisk / w
o kern/106107  fs         [ufs] left-over fsck_snapshot after unfinished backgro
o kern/104406  fs         [ufs] Processes get stuck in "ufs" state under persist
o kern/104133  fs         [ext2fs] EXT2FS module corrupts EXT2/3 filesystems
o kern/103035  fs         [ntfs] Directories in NTFS mounted disc images appear 
o kern/101324  fs         [smbfs] smbfs sometimes not case sensitive when it's s
o kern/99290   fs         [ntfs] mount_ntfs ignorant of cluster sizes
s bin/97498    fs         [request] newfs(8) has no option to clear the first 12
o kern/97377   fs         [ntfs] [patch] syntax cleanup for ntfs_ihash.c
o kern/95222   fs         [cd9660] File sections on ISO9660 level 3 CDs ignored
o kern/94849   fs         [ufs] rename on UFS filesystem is not atomic
o bin/94810    fs         fsck(8) incorrectly reports 'file system marked clean'
o kern/94769   fs         [ufs] Multiple file deletions on multi-snapshotted fil
o kern/94733   fs         [smbfs] smbfs may cause double unlock
o kern/93942   fs         [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/92272   fs         [ffs] [hang] Filling a filesystem while creating a sna
o kern/91134   fs         [smbfs] [patch] Preserve access and modification time 
a kern/90815   fs         [smbfs] [patch] SMBFS with character conversions somet
o kern/88657   fs         [smbfs] windows client hang when browsing a samba shar
o kern/88555   fs         [panic] ffs_blkfree: freeing free frag on AMD 64
o kern/88266   fs         [smbfs] smbfs does not implement UIO_NOCOPY and sendfi
o bin/87966    fs         [patch] newfs(8): introduce -A flag for newfs to enabl
o kern/87859   fs         [smbfs] System reboot while umount smbfs.
o kern/86587   fs         [msdosfs] rm -r /PATH fails with lots of small files
o bin/85494    fs         fsck_ffs: unchecked use of cg_inosused macro etc.
o kern/80088   fs         [smbfs] Incorrect file time setting on NTFS mounted vi
o bin/74779    fs         Background-fsck checks one filesystem twice and omits 
o kern/73484   fs         [ntfs] Kernel panic when doing `ls` from the client si
o bin/73019    fs         [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino
o kern/71774   fs         [ntfs] NTFS cannot "see" files on a WinXP filesystem
o bin/70600    fs         fsck(8) throws files away when it can't grow lost+foun
o kern/68978   fs         [panic] [ufs] crashes with failing hard disk, loose po
o kern/65920   fs         [nwfs] Mounted Netware filesystem behaves strange
o kern/65901   fs         [smbfs] [patch] smbfs fails fsx write/truncate-down/tr
o kern/61503   fs         [smbfs] mount_smbfs does not work as non-root
o kern/55617   fs         [smbfs] Accessing an nsmb-mounted drive via a smb expo
o kern/51685   fs         [hang] Unbounded inode allocation causes kernel to loc
o kern/51583   fs         [nullfs] [patch] allow to work with devices and socket
o kern/36566   fs         [smbfs] System reboot with dead smb mount and umount
o bin/27687    fs         fsck(8) wrapper is not properly passing options to fsc
o kern/18874   fs         [2TB] 32bit NFS servers export wrong negative values t

257 problems total.


From owner-freebsd-fs@FreeBSD.ORG  Mon Oct 17 12:25:30 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 29620106566B
	for <freebsd-fs@freebsd.org>; Mon, 17 Oct 2011 12:25:30 +0000 (UTC)
	(envelope-from bra@fsn.hu)
Received: from people.fsn.hu (people.fsn.hu [195.228.252.137])
	by mx1.freebsd.org (Postfix) with ESMTP id 986CC8FC0A
	for <freebsd-fs@freebsd.org>; Mon, 17 Oct 2011 12:25:29 +0000 (UTC)
Received: by people.fsn.hu (Postfix, from userid 1001)
	id 23A5CAADD9E; Mon, 17 Oct 2011 14:25:28 +0200 (CEST)
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.2
X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MF-ACE0E1EA [pR:
	27.1455]
X-CRM114-CacheID: sfid-20111017_14252_85334530 
X-CRM114-Status: Good  ( pR: 27.1455 )
X-DSPAM-Result: Whitelisted
X-DSPAM-Processed: Mon Oct 17 14:25:28 2011
X-DSPAM-Confidence: 0.9969
X-DSPAM-Probability: 0.0000
X-DSPAM-Signature: 4e9c1eb8873599527411137
X-DSPAM-Factors: 27, From*Attila Nagy <bra@fsn.hu>, 0.00010, >+On, 0.00099,
	the+>, 0.00134, the+>, 0.00134, >+the, 0.00134,
	>+the, 0.00134, wrote+>, 0.00178, conf, 0.00227,
	cache, 0.00256, cache, 0.00256, >+If, 0.00267,
	wrote+>>, 0.00267, )+>, 0.00279, >+>, 0.00286, >+>, 0.00286,
	in+>, 0.00307, I+>, 0.00341, you+>, 0.00361, >>+>>, 0.00396,
	>>+>>, 0.00396, >+You, 0.00409, wrote, 0.00490,
	wrote, 0.00490, adding, 0.00510, 2011+at, 0.00510,
	STABLE, 0.00556,
X-Spambayes-Classification: ham; 0.00
Received: from japan.t-online.private (japan.t-online.co.hu [195.228.243.99])
	by people.fsn.hu (Postfix) with ESMTPSA id B369CAADD91;
	Mon, 17 Oct 2011 14:25:27 +0200 (CEST)
Message-ID: <4E9C1EB7.60907@fsn.hu>
Date: Mon, 17 Oct 2011 14:25:27 +0200
From: Attila Nagy <bra@fsn.hu>
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US;
	rv:1.8.1.23) Gecko/20090817 Thunderbird/2.0.0.23 Mnenhy/0.7.6.0
MIME-Version: 1.0
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
References: <4E97F710.8000004@fsn.hu> <20111014090000.GA66602@icarus.home.lan>
In-Reply-To: <20111014090000.GA66602@icarus.home.lan>
X-Stationery: 0.7.5
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-fs@freebsd.org
Subject: Re: cache devices come up as dsk/original_device_name in zpools
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 17 Oct 2011 12:25:30 -0000

On 10/14/11 11:00, Jeremy Chadwick wrote:
> On Fri, Oct 14, 2011 at 10:47:12AM +0200, Attila Nagy wrote:
>> Hi,
>>
>> A have a zpool with cache devices on 8-STABLE (csuped and compiled
>> at Sep 14 15:01:25 CEST 2011). The problem is every time I reboot,
>> the cache devices turn to UNAVAIL (because device name changes to
>> dsk/daXX):
>> dsk/da37 UNAVAIL 0 0 0 cannot open
>> dsk/da38 UNAVAIL 0 0 0 cannot open
>>
>> After removing and re-adding them, everyting goes back to normal,
>> until the next reboot. I have no /boot/zfs/zpool.cache (because the
>> machine is netbooted), maybe this is the cause? In previous versions
>> everything was fine.
> Obviously at some point when you built this system you entered
> "dsk/da37" and "dsk/da38".  So the metadata on those drives probably
> contains references to those strings.  You need to clear/change that.
Pretty unlikely. And given that this happens on all machines, upgraded 
to a recent 8-STABLE (and never happened before), I would say something 
has been changed regarding this.
In the user space tools there are a lot of occurrences of /dev/dsk...
>
> I'm not sure how to go about doing that, especially on a system which
> lacks /boot/zfs/zpool.cache.  A one-time "zpool export" then a reboot, I
> imagine, would suffice, but I'm not sure if export actually changes the
> metadata on the disk itself or just updates the zpool.cache file.
>
> If you ran "zdb" on this system (the output will be HUGE given the
> number of vdevs and devices you have!), you should see some relevant
> information under each disk (child), specifically "path" vs.
> "phys_path".  Maybe these differ?
Well, zdb seems to be quite useless without zpool.cache...
But I found a spare machine where I could do an export. The zdb output 
does not contain the above disks (da37, da38, the cache devices).
I let the tool run for about 30 minutes only.
>
> You might also try tinkering about with the loader.conf(5) variables
> zpool_cache_*.  Depending on your setup, you might be able to move the
> zpool.cache file to a different location -- I realise you PXE boot, but
> if you have any sort of storage media on that system that isn't under
> ZFS that *is* available (e.g. a small UFS partition, etc.) then you
> might consider storing it there.  See /boot/defaults/loader.conf.
I have no UFS on these machines, and this is just fine. And was fine 
always, I hope this stays this way. :)
>
> Otherwise I'm not sure how to go about changing the actual strings in
> the disk metadata.  Maybe remove the cache devices entirely, zero out
> the first and last ~16MBytes of the da37 and da38 disks (using dd), then
> re-add them using their "daXX" name?  That might suffice.
>
I'm not sure whether this is in the on-disk metadata. How could I add a 
/dev/dsk/da38 disk with zpool? It does not exist.

From owner-freebsd-fs@FreeBSD.ORG  Tue Oct 18 00:36:03 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5D6CB106564A
	for <freebsd-fs@freebsd.org>; Tue, 18 Oct 2011 00:36:03 +0000 (UTC)
	(envelope-from haroldp@internal.org)
Received: from pluto.internal.org (mail.internal.org [64.191.53.117])
	by mx1.freebsd.org (Postfix) with ESMTP id 0785A8FC13
	for <freebsd-fs@freebsd.org>; Tue, 18 Oct 2011 00:36:02 +0000 (UTC)
Received: from [10.0.0.79] (99-46-24-87.lightspeed.renonv.sbcglobal.net
	[99.46.24.87])
	by pluto.internal.org (Postfix) with ESMTPA id 79A5DECBD4
	for <freebsd-fs@freebsd.org>; Mon, 17 Oct 2011 17:17:32 -0700 (PDT)
From: Harold Paulson <haroldp@internal.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
Date: Mon, 17 Oct 2011 17:17:31 -0700
Message-Id: <4D8047A6-930E-4DE8-BA55-051890585BFE@internal.org>
To: freebsd-fs@freebsd.org
Mime-Version: 1.0 (Apple Message framework v1084)
X-Mailer: Apple Mail (2.1084)
Subject: Damaged directory on ZFS
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Oct 2011 00:36:03 -0000

Hello,=20

I've had a server that boots from ZFS panicking for a couple days.  I =
have worked around the problem for now, but I hope someone can give me =
some insight into what's going on, and how I can solve it properly. =20

The server is running 8.2-STABLE (zfs v28) with 8G of ram and 4 SATA =
disks in a raid10 type arrangement:

# uname -a             =20
FreeBSD jane.sierraweb.com 8.2-STABLE-201105 FreeBSD 8.2-STABLE-201105 =
#0: Tue May 17 05:18:48 UTC 2011     =
root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64

And zpool status:=20

	NAME           STATE     READ WRITE CKSUM
	tank           ONLINE       0     0     0
	  mirror       ONLINE       0     0     0
	    gpt/disk0  ONLINE       0     0     0
	    gpt/disk1  ONLINE       0     0     0
	  mirror       ONLINE       0     0     0
	    gpt/disk2  ONLINE       0     0     0
	    gpt/disk3  ONLINE       0     0     0

It started panicking under load a couple days ago.  We replaced RAM and =
motherboard, but problems persisted.  I don't know if a hardware issue =
originally caused the problem or what.  When it panics, I get the usual =
panic message, but I don't get a core file, and it never reboots itself. =
=20

http://pastebin.com/F1J2AjSF

While I was trying to figure out the source of the problem, I notice =
stuck various stuck processes that peg a CPU and can't be killed, such =
as:

  PID JID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU =
COMMAND
48735   0 root        1  46    0 11972K   924K CPU3    3 415:14 100.00% =
find

They are not marked zombie, but I can't kill them, and restarting the =
jail they are in won't even get rid of them.  truss just hangs with no =
output on them.  On different occasions, I noticed pop3d processes for =
the same user getting stuck in this way.  On a hunch I ran a "find" =
through the files in the user's Maildir and got a panic.  I disabled =
this account and now the server is stable again.  At least until =
locate.updatedb walks through that directory, I suppose.   Evidentially, =
there is some kind of hole in the file system below that directory tree =
causing the panic. =20

I can move that directory out of the way, and carry on, but is there =
anything I can do to really *repair* the problem?

Thanks.

	- H


From owner-freebsd-fs@FreeBSD.ORG  Tue Oct 18 00:54:50 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7CBC01065674
	for <freebsd-fs@freebsd.org>; Tue, 18 Oct 2011 00:54:50 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta07.emeryville.ca.mail.comcast.net
	(qmta07.emeryville.ca.mail.comcast.net [76.96.30.64])
	by mx1.freebsd.org (Postfix) with ESMTP id 63A168FC12
	for <freebsd-fs@freebsd.org>; Tue, 18 Oct 2011 00:54:50 +0000 (UTC)
Received: from omta14.emeryville.ca.mail.comcast.net ([76.96.30.60])
	by qmta07.emeryville.ca.mail.comcast.net with comcast
	id m0ty1h0041HpZEsA70ujiE; Tue, 18 Oct 2011 00:54:43 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta14.emeryville.ca.mail.comcast.net with comcast
	id m0ti1h00k1t3BNj8a0tijs; Tue, 18 Oct 2011 00:53:43 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 545D1102C1C; Mon, 17 Oct 2011 17:54:48 -0700 (PDT)
Date: Mon, 17 Oct 2011 17:54:48 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Harold Paulson <haroldp@internal.org>
Message-ID: <20111018005448.GA2855@icarus.home.lan>
References: <4D8047A6-930E-4DE8-BA55-051890585BFE@internal.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4D8047A6-930E-4DE8-BA55-051890585BFE@internal.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org
Subject: Re: Damaged directory on ZFS
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Oct 2011 00:54:50 -0000

On Mon, Oct 17, 2011 at 05:17:31PM -0700, Harold Paulson wrote:
> I've had a server that boots from ZFS panicking for a couple days.  I have worked around the problem for now, but I hope someone can give me some insight into what's going on, and how I can solve it properly.  
> 
> The server is running 8.2-STABLE (zfs v28) with 8G of ram and 4 SATA disks in a raid10 type arrangement:
> 
> # uname -a              
> FreeBSD jane.sierraweb.com 8.2-STABLE-201105 FreeBSD 8.2-STABLE-201105 #0: Tue May 17 05:18:48 UTC 2011     root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64

First thing to do is to consider upgrading to a newer RELENG_8 date.
There have been *many* ZFS fixes since May.

> And zpool status: 
> 
> 	NAME           STATE     READ WRITE CKSUM
> 	tank           ONLINE       0     0     0
> 	  mirror       ONLINE       0     0     0
> 	    gpt/disk0  ONLINE       0     0     0
> 	    gpt/disk1  ONLINE       0     0     0
> 	  mirror       ONLINE       0     0     0
> 	    gpt/disk2  ONLINE       0     0     0
> 	    gpt/disk3  ONLINE       0     0     0
> 
> It started panicking under load a couple days ago.  We replaced RAM and motherboard, but problems persisted.  I don't know if a hardware issue originally caused the problem or what.  When it panics, I get the usual panic message, but I don't get a core file, and it never reboots itself.  
> 
> http://pastebin.com/F1J2AjSF

ZFS developers will need to comment on the state of the backtrace.  You
may be requested to examine the core using kgdb and be given some
commands to run on it.

> While I was trying to figure out the source of the problem, I notice stuck various stuck processes that peg a CPU and can't be killed, such as:
> 
>   PID JID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
> 48735   0 root        1  46    0 11972K   924K CPU3    3 415:14 100.00% find

Had you done procstat -k -k 48735 (the "double -k" is not a typo), you
probably would have seen that the process was "stuck" in a ZFS-related
thread.  These are processes which the kernel is hanging on to and will
not let go of, so even kill -9 won't kill these.

It would have also be worthwhile to get the "process tree" of what
spawned the PID.  (Solaris has ptree; I think we have something similar
under FreeBSD but I forget what)  The reason that matters is that it's
probably a periodic job that runs (there are many which use find),
traversing your ZFS filesystems, and tickling a bug/issue somewhere.
You even hint at this in your next paragraph, re: locate.updatedb.

> They are not marked zombie, but I can't kill them, and restarting the jail they are in won't even get rid of them.  truss just hangs with no output on them.  On different occasions, I noticed pop3d processes for the same user getting stuck in this way.  On a hunch I ran a "find" through the files in the user's Maildir and got a panic.  I disabled this account and now the server is stable again.  At least until locate.updatedb walks through that directory, I suppose.   Evidentially, there is some kind of hole in the file system below that directory tree causing the panic.  

The fact that jails are involved complicates things even more.

truss and ktrace won't show anything going on because of what I said
above: the kernel bits associated with the process are hung or spinning,
not the actual syscall/userland bits.

Furthermore, truss on FreeBSD is basically worthless; use ktrace.

> I can move that directory out of the way, and carry on, but is there anything I can do to really *repair* the problem?

I would recommend starting with "zpool scrub" on the pool which is
associated with the Maildir/ directory of the account you disable.  I
will not be surprised if it comes back 100% clean.

Given what the backtrace looks like, I would say the Maildir/ has a ton
of files in it.  Is that the case?  Does "echo *" say something about
argument list too long?

You should also be aware that Maildir on ZFS performs horribly.  I've
experienced this, and there are old discussions about it as well.  Here
are some of my findings.

http://koitsu.wordpress.com/2009/06/01/freebsd-and-zfs-horrible-raidz1-read-speed/
http://koitsu.wordpress.com/2009/06/01/freebsd-and-zfs-horrible-raidz1-speed-part-2/
http://koitsu.wordpress.com/2009/10/29/unix-mail-format-annoyances/

The state of mail spools on UNIX is a complete disgrace, and everyone
involved in it should feel ashamed.  MIX is probably the best solution
to this problem, but it's not being adopted by all the major players,
which is very sad.  I realise that doesn't solve your problem, but my
strong recommendation is to use classic UNIX mail spools (one file for
many messages) when the filesystem is ZFS-based.

However, someone familiar with the ZFS internals, as I said, should
investigate the crash you're experiencing regardless.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Tue Oct 18 03:28:14 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0EC66106564A
	for <freebsd-fs@freebsd.org>; Tue, 18 Oct 2011 03:28:14 +0000 (UTC)
	(envelope-from alexander@leidinger.net)
Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de
	[217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id C09BF8FC19
	for <freebsd-fs@freebsd.org>; Tue, 18 Oct 2011 03:28:13 +0000 (UTC)
Received: from outgoing.leidinger.net (p4FC42468.dip.t-dialin.net
	[79.196.36.104])
	by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id 2441F844016;
	Tue, 18 Oct 2011 05:27:55 +0200 (CEST)
Received: from unknown (IO.Leidinger.net [192.168.1.12])
	by outgoing.leidinger.net (Postfix) with ESMTP id 4B03E2929;
	Tue, 18 Oct 2011 05:27:52 +0200 (CEST)
Date: Tue, 18 Oct 2011 05:27:51 +0200
From: Alexander Leidinger <Alexander@Leidinger.net>
To: Patrick Donnelly <batrick@batbytes.com>
Message-ID: <20111018052751.0000273f@unknown>
In-Reply-To: <CACh33Fpz=uAp8h0Bjsi1Be=ob_94jXtN51mAHvGPkReY5MpTcg@mail.gmail.com>
References: <CACh33Fpz=uAp8h0Bjsi1Be=ob_94jXtN51mAHvGPkReY5MpTcg@mail.gmail.com>
X-Mailer: Claws Mail 3.7.10cvs7 (GTK+ 2.16.6; i586-pc-mingw32msvc)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-EBL-MailScanner-Information: Please contact the ISP for more information
X-EBL-MailScanner-ID: 2441F844016.A45C0
X-EBL-MailScanner: Found to be clean
X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN,
	SpamAssassin (not cached, score=-1, required 6, autolearn=disabled,
	ALL_TRUSTED -1.00)
X-EBL-MailScanner-From: alexander@leidinger.net
X-EBL-MailScanner-Watermark: 1319513278.37207@oxF/gA22yq8e7f9X9iW/og
X-EBL-Spam-Status: No
Cc: freebsd-fs@freebsd.org
Subject: Re: [ZFS] Using SSD with partitions
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Oct 2011 03:28:14 -0000

On Sun, 16 Oct 2011 00:45:50 -0400 Patrick Donnelly
<batrick@batbytes.com> wrote:


> Hi list,
> 
> I've got an array for home use where my boot drive (UFS) finally died.
> I've decided to upgrade to a SSD for a replacement but am looking to
> maybe simultaneously improving performance of my ZFS array. Naturally

What do you mean by "array"? How many disks and which type (HD or SSD)?
Which pool configuration you are aiming at?

> a FreeBSD install doesn't use much space so partitioning the drive to
> get maximum usage seems wise. I was thinking for a hypothetical 40GB
> drive:
> 
> 20GB -- FreeBSD / partition
> 2GB  -- ZFS ZIL
> 18GB -- ZFS Cache
> 
> What I'm wondering is if this will be a bad idea. I know that SSDs are
> not designed to be written to *a lot*, which a ZIL will experience. Is
> this a bad idea? I'm hoping for experiences from people in similar
> scenarios. As I'm not an enterprise IT person who can't simply choose
> to just throw more mon-- I mean SSDs -- at the problem, I need to be
> efficient. :) [I'm thinking the cache drive partition might be
> pointless as I don't think I'd benefit that much from it.]

ZIL and cache shall be on devices which are faster than the rest of the
pool (exception for an edge case: saturated I/O-path to the main
pool-disks, not saturated I/O-path to the ZIL/cache, and data in cache
or targeted for the ZIL).

If your ZIL/cache partitions on SSD are aimed for a pool which consists
of normal HD's, go for it. If you have only SSDs (or only one single
disk in total), it is most probably better (surely for a desktop system)
to keep it simple (no ZIL/cache partitions).

Bye,
Alexander.

-- 
http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137

From owner-freebsd-fs@FreeBSD.ORG  Tue Oct 18 03:53:45 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 08621106564A
	for <freebsd-fs@freebsd.org>; Tue, 18 Oct 2011 03:53:44 +0000 (UTC)
	(envelope-from batrick@batbytes.com)
Received: from mail-iy0-f182.google.com (mail-iy0-f182.google.com
	[209.85.210.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 98E408FC18
	for <freebsd-fs@freebsd.org>; Tue, 18 Oct 2011 03:53:44 +0000 (UTC)
Received: by iaky10 with SMTP id y10so255295iak.13
	for <freebsd-fs@freebsd.org>; Mon, 17 Oct 2011 20:53:44 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.231.20.227 with SMTP id g35mr277776ibb.32.1318910023866; Mon,
	17 Oct 2011 20:53:43 -0700 (PDT)
Received: by 10.231.19.66 with HTTP; Mon, 17 Oct 2011 20:53:43 -0700 (PDT)
In-Reply-To: <4E9AE725.4040001@gmail.com>
References: <CACh33Fpz=uAp8h0Bjsi1Be=ob_94jXtN51mAHvGPkReY5MpTcg@mail.gmail.com>
	<4E9AE725.4040001@gmail.com>
Date: Mon, 17 Oct 2011 23:53:43 -0400
Message-ID: <CACh33FpgTUEqsaTuSSOmRsGgk24K4+FuL90Zu-3v3F+RRtaOHw@mail.gmail.com>
From: Patrick Donnelly <batrick@batbytes.com>
To: "Luchesar V. ILIEV" <luchesar.iliev@gmail.com>
Content-Type: text/plain; charset=UTF-8
Cc: freebsd-fs@freebsd.org
Subject: Re: [ZFS] Using SSD with partitions
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Oct 2011 03:53:45 -0000

Since people have asked about more details for my system:

It uses old desktop hardware with 5 1TB WD Caviar Blues in a raidz
configuration with 1 of those drives being a hot spare (4 1TB drives
in the raidz). The system currently has 2 GB of RAM IIRC.

I've been using NFS to access the data on my home network which has
worked pretty well. Writing to NFS over my VPN from across the country
is really bad which is one of the reasons I wanted to use a SSD for a
ZIL. read/write performance overall tends to be bad though so I don't
really know how much it will help. After fiddling around with NFS
settings for a long time I soon gave up and instead use SSH when
outside a LAN. That's another matter though and off-topic. :)


On Sun, Oct 16, 2011 at 10:16 AM, Luchesar V. ILIEV
<luchesar.iliev@gmail.com> wrote:
> 1. If you can afford more RAM, it's (much) better for ZFS than L2ARC.

I think I may end up doing this as well. RAM seems to have gotten
extraordinarily large and cheap in the last few years.

> 5. Check the output of "zpool upgrade". If your zpool version is
> anything below 19 (likely 14 or 15), I'd strongly recommend that you
> avoid setting up a separate ZIL. Pools before v19 fail critically when
> the ZIL is removed or is corrupted, which means you lose them for good.
> You might mitigate the risk with a mirrored ZIL, but it's still likely
> not worth it in your case.

Yes, I plan to upgrade the pool to v28 and FreeBSD when I get the SSD.
Speaking of which, should there be any problems with installing the
SSD, putting FreeBSD 9.0-RELEASE (when it comes out) on it, and then
trying to import the pool?

> Again, I'm no expert in those things, so take all my comments with a
> grain of salt. Good luck!

Thank you for your advice!

-- 
- Patrick Donnelly

From owner-freebsd-fs@FreeBSD.ORG  Tue Oct 18 04:28:43 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B4ED9106566B
	for <freebsd-fs@freebsd.org>; Tue, 18 Oct 2011 04:28:43 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta15.emeryville.ca.mail.comcast.net
	(qmta15.emeryville.ca.mail.comcast.net [76.96.27.228])
	by mx1.freebsd.org (Postfix) with ESMTP id 9AC658FC0C
	for <freebsd-fs@freebsd.org>; Tue, 18 Oct 2011 04:28:43 +0000 (UTC)
Received: from omta19.emeryville.ca.mail.comcast.net ([76.96.30.76])
	by qmta15.emeryville.ca.mail.comcast.net with comcast
	id m4KR1h0051eYJf8AF4Uc0Q; Tue, 18 Oct 2011 04:28:36 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta19.emeryville.ca.mail.comcast.net with comcast
	id m4BW1h00U1t3BNj014BXbP; Tue, 18 Oct 2011 04:11:31 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 6D130102C1C; Mon, 17 Oct 2011 21:28:38 -0700 (PDT)
Date: Mon, 17 Oct 2011 21:28:38 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Patrick Donnelly <batrick@batbytes.com>
Message-ID: <20111018042838.GA6246@icarus.home.lan>
References: <CACh33Fpz=uAp8h0Bjsi1Be=ob_94jXtN51mAHvGPkReY5MpTcg@mail.gmail.com>
	<4E9AE725.4040001@gmail.com>
	<CACh33FpgTUEqsaTuSSOmRsGgk24K4+FuL90Zu-3v3F+RRtaOHw@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CACh33FpgTUEqsaTuSSOmRsGgk24K4+FuL90Zu-3v3F+RRtaOHw@mail.gmail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org
Subject: Re: [ZFS] Using SSD with partitions
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Oct 2011 04:28:43 -0000

On Mon, Oct 17, 2011 at 11:53:43PM -0400, Patrick Donnelly wrote:
> Since people have asked about more details for my system:
> 
> It uses old desktop hardware with 5 1TB WD Caviar Blues in a raidz
> configuration with 1 of those drives being a hot spare (4 1TB drives
> in the raidz). The system currently has 2 GB of RAM IIRC.
> 
> I've been using NFS to access the data on my home network which has
> worked pretty well. Writing to NFS over my VPN from across the country
> is really bad which is one of the reasons I wanted to use a SSD for a
> ZIL. read/write performance overall tends to be bad though so I don't
> really know how much it will help. After fiddling around with NFS
> settings for a long time I soon gave up and instead use SSH when
> outside a LAN. That's another matter though and off-topic. :)

I don't see how using an SSD for ZIL is going to address issues of
latency and underlying network filesystems (not NFS but the idea of a
networked filesystem).  So many people think you can just throw a VPN
in between two locations and everything work will perfectly fine with
only nominal delays -- that isn't what happens at all at a packet level.
Network I/O tuning when a VPN is involved is a completely separate
topic.

NFS, by the way, is mainly intended to be used in very low-latency
environments (read: LANs).  You can fiddle with NFS and TCP window
settings all day and accomplish nothing considering cross-country
latency (within the US anyway) is around ~75ms on a good day.

When it comes to networked filesystems on UNIX, we have very little
choice.  NFS is the main one.  Then there's Stanford's Coda filesystem
thing, or maybe that's now part of AFS, I don't know.  Then there's
sshfs, which sounds wonderful until you realise all the dependencies and
nuances involved (mainly due to use of fuse, which we know on FreeBSD is
not so great).  Then there's Samba (CIFS/SMB, and now with Samba 3.6
offering SMB2 for Windows 7 clients), but that gets into issues of
security and cannot be forwarded via SSH (e.g. VPN would be needed)
given all its protocol some of which are UDP (not sure what the state of
NetBIOS is).

And none of this even begins to touch base on resiliency/reliability.
I have to remind people on a weekly basis that the Internet *truly* is
broken 24x7x365.  It cannot be relied upon 100% of the time, or even 90%
of the time.  VPN, SSH, plain-text packets... none of it matters when
the backbone carriers don't take the Internet seriously (and most do
not; it's still a best-effort service, at least that's how it's treated
by NOCs and technicians).

So what I'm saying is: I wish you luck in your endeavour to find
something that works for you.  In general I find SSH to be the most
convenient -- easy to deal with on a firewall level, authentication that
makes sense, and is easily manageable in general.  You might try making
use of things like Compression=yes in SSH as well.

If you find an SSD that is able to solve all of these problems, let me
know and I'll invest in one.  It'll be the first SSD with an RJ45 port
I'm sure.  :-)

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Tue Oct 18 10:46:25 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C8C201065673;
	Tue, 18 Oct 2011 10:46:25 +0000 (UTC) (envelope-from hlh@restart.be)
Received: from tignes.restart.be (tignes.restart.be [94.23.211.191])
	by mx1.freebsd.org (Postfix) with ESMTP id 760178FC14;
	Tue, 18 Oct 2011 10:46:25 +0000 (UTC)
Received: from restart.be (avoriaz.restart.be [IPv6:2001:41d0:2:56bf:1:1::])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(Client CN "smtp.restart.be", Issuer "CA master" (verified OK))
	by tignes.restart.be (Postfix) with ESMTPS id C4EEC13F0B;
	Tue, 18 Oct 2011 12:35:28 +0200 (CEST)
Received: from morzine.restart.bel (morzine.restart.be
	[IPv6:2001:41d0:2:56bf:1:2::]) (authenticated bits=0)
	by restart.be (8.14.5/8.14.5) with ESMTP id p9IAZRbc002917;
	Tue, 18 Oct 2011 12:35:28 +0200 (CEST) (envelope-from hlh@restart.be)
X-DKIM: Sendmail DKIM Filter v2.8.3 restart.be p9IAZRbc002917
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=restart.be;
	s=avoriaz; t=1318934128;
	bh=tyrc5bxAewlcX03p8JZT8A6e3iZiGPpNkXtYgWBKlCY=;
	h=Message-ID:Date:From:MIME-Version:To:Subject:References:
	In-Reply-To:Content-Type:Content-Transfer-Encoding;
	b=AI5EMhhFe69cSc6qV9TMdpCFVd6xDHnKIUf3rZ/z5IBcd6q964eWesP7KgIOcfYNh
	HOLHSqStfs+xMe1R3vaXg==
X-DomainKeys: Sendmail DomainKeys Filter v1.0.2 restart.be p9IAZRbc002917
DomainKey-Signature: a=rsa-sha1; s=avoriaz; d=restart.be; c=nofws; q=dns;
	h=message-id:date:from:organization:user-agent:mime-version:to:
	subject:references:in-reply-to:content-type:content-transfer-encoding; 
	b=gmlZF1llrpXGXQVocZgcuja/AE25crFaXIgPrZmP6p9+WjxZBVBBOSakP/Sg+Lgu+
	LJTOOe6XlTYwHzkTjAaDA==
Message-ID: <4E9D566F.1040104@restart.be>
Date: Tue, 18 Oct 2011 12:35:27 +0200
From: Henri Hennebert <hlh@restart.be>
Organization: RestartSoft
User-Agent: Mozilla/5.0 (X11; FreeBSD i386;
	rv:7.0.1) Gecko/20111006 Thunderbird/7.0.1
MIME-Version: 1.0
To: freebsd-current@freebsd.org, freebsd-fs@freebsd.org, avg@freebsd.org
References: <4E8D7406.4090302@restart.be> <4E8D86A2.1040508@FreeBSD.org>
	<4E8D9F57.70506@restart.be> <4E8DAEE5.4020004@FreeBSD.org>
In-Reply-To: <4E8DAEE5.4020004@FreeBSD.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: 
Subject: Re: zfsloader 9.0 BETA3 r225759 - i/o error - all block copies
	unavailable
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Oct 2011 10:46:26 -0000

On 10/06/2011 15:36, Andriy Gapon wrote:
> on 06/10/2011 15:30 Henri Hennebert said the following:
>> The pool is a mirror:
>>
>> [root@morzine ~]# zpool status rpool
>>    pool: rpool
>>   state: ONLINE
>>   scan: scrub repaired 0 in 1h0m with 0 errors on Wed Aug 24 15:04:36 2011
>> config:
>>
>>      NAME                                            STATE     READ WRITE CKSUM
>>      rpool                                           ONLINE       0     0     0
>>        mirror-0                                      ONLINE       0     0     0
>>          gptid/e915c6a0-fc72-11de-aa21-00e081706b68  ONLINE       0     0     0
>>          gptid/eac8497d-fc72-11de-aa21-00e081706b68  ONLINE       0     0     0
>>
>> errors: No known data errors
>>
>> and rpool/root is not compressed:
>>
>> [root@morzine ~]# zfs get compression rpool/root
>> NAME        PROPERTY     VALUE     SOURCE
>> rpool/root  compression  off       inherited from rpool
>>
>> pool is v28 and filesystems are v5
>
> No particular recipes for this environment, just a general suggestion.
> If you run into a situation like this again, please try to use
> tools/tools/zfsboottest to diagnose where exactly an error originates.
>

I upgrade another system to 9.0-RC1 and encounter the same problem, this 
time zfsloader do not run.

After

mv /mnt/boot /mnt/Boot
mkdir /mnt/boot
cd /mnt/Boot
find . | cpio -pvdmu /mnt/boot

FreeBSD boot OK


[root@avoriaz zfsboottest]# ./zfsboottest /Boot/zfsloader /dev/ada0p2 
/dev/ada1p2
ZFS: SPA version 28
   pool: rpool
config:

         NAME STATE
         rpool ONLINE
           mirror ONLINE
             ada0p2 ONLINE
             ada1p2 ONLINE
ZFS: i/o error - all block copies unavailable
can't lookup

10 minutes later:

[root@avoriaz zfsboottest]# ./zfsboottest /Boot/zfsloader /dev/ada0p2 
/dev/ada1p2|less
ZFS: SPA version 28
   pool: rpool
config:

         NAME STATE
         rpool ONLINE
           mirror ONLINE
             ada0p2 ONLINE
             ada1p2 ONLINE
<blablabla>

it seems ok :-o

and a other time:
[root@avoriaz zfsboottest]# ./zfsboottest /Boot/zfsloader /dev/ada0p2
segmentation fault...

Strange isn't it.


Henri

From owner-freebsd-fs@FreeBSD.ORG  Tue Oct 18 11:27:41 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 630C2106566B
	for <freebsd-fs@freebsd.org>; Tue, 18 Oct 2011 11:27:41 +0000 (UTC)
	(envelope-from ml@my.gd)
Received: from mail-ww0-f50.google.com (mail-ww0-f50.google.com [74.125.82.50])
	by mx1.freebsd.org (Postfix) with ESMTP id 010098FC16
	for <freebsd-fs@freebsd.org>; Tue, 18 Oct 2011 11:27:40 +0000 (UTC)
Received: by wwi18 with SMTP id 18so656289wwi.31
	for <freebsd-fs@freebsd.org>; Tue, 18 Oct 2011 04:27:40 -0700 (PDT)
Received: by 10.227.59.147 with SMTP id l19mr705867wbh.38.1318935550684;
	Tue, 18 Oct 2011 03:59:10 -0700 (PDT)
Received: from dfleuriot-at-hi-media.com ([83.167.62.196])
	by mx.google.com with ESMTPS id eu16sm2779680wbb.7.2011.10.18.03.59.08
	(version=SSLv3 cipher=OTHER); Tue, 18 Oct 2011 03:59:09 -0700 (PDT)
Message-ID: <4E9D5BFB.5060609@my.gd>
Date: Tue, 18 Oct 2011 12:59:07 +0200
From: Damien Fleuriot <ml@my.gd>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6;
	rv:7.0.1) Gecko/20110929 Thunderbird/7.0.1
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
References: <CACh33Fpz=uAp8h0Bjsi1Be=ob_94jXtN51mAHvGPkReY5MpTcg@mail.gmail.com>
	<4E9AE725.4040001@gmail.com>
	<CACh33FpgTUEqsaTuSSOmRsGgk24K4+FuL90Zu-3v3F+RRtaOHw@mail.gmail.com>
In-Reply-To: <CACh33FpgTUEqsaTuSSOmRsGgk24K4+FuL90Zu-3v3F+RRtaOHw@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Subject: Re: [ZFS] Using SSD with partitions
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Oct 2011 11:27:41 -0000


On 10/18/11 5:53 AM, Patrick Donnelly wrote:
> Since people have asked about more details for my system:
> 
> It uses old desktop hardware with 5 1TB WD Caviar Blues in a raidz
> configuration with 1 of those drives being a hot spare (4 1TB drives
> in the raidz). The system currently has 2 GB of RAM IIRC.
> 
> I've been using NFS to access the data on my home network which has
> worked pretty well. Writing to NFS over my VPN from across the country
> is really bad which is one of the reasons I wanted to use a SSD for a
> ZIL. read/write performance overall tends to be bad though so I don't
> really know how much it will help. After fiddling around with NFS
> settings for a long time I soon gave up and instead use SSH when
> outside a LAN. That's another matter though and off-topic. :)
> 
> 

This is going to sound a bit rude, my preemptive apologies.

Just where did you get the notion that changing your bike's tires would
make your car run faster ?

In what world does your *storage* configuration affect your *network*
latency and performance ?

From owner-freebsd-fs@FreeBSD.ORG  Tue Oct 18 14:40:05 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7A15D106566C;
	Tue, 18 Oct 2011 14:40:05 +0000 (UTC) (envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 8CB008FC08;
	Tue, 18 Oct 2011 14:40:04 +0000 (UTC)
Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua
	[212.40.38.101])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id RAA04559;
	Tue, 18 Oct 2011 17:40:00 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Message-ID: <4E9D8FBF.3030502@FreeBSD.org>
Date: Tue, 18 Oct 2011 17:39:59 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:7.0.1) Gecko/20111003 Thunderbird/7.0.1
MIME-Version: 1.0
To: Henri Hennebert <hlh@restart.be>
References: <4E8D7406.4090302@restart.be> <4E8D86A2.1040508@FreeBSD.org>
	<4E8D9F57.70506@restart.be> <4E8DAEE5.4020004@FreeBSD.org>
	<4E9D566F.1040104@restart.be>
In-Reply-To: <4E9D566F.1040104@restart.be>
X-Enigmail-Version: undefined
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@FreeBSD.org, freebsd-current@FreeBSD.org
Subject: Re: zfsloader 9.0 BETA3 r225759 - i/o error - all block copies
	unavailable
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Oct 2011 14:40:05 -0000

on 18/10/2011 13:35 Henri Hennebert said the following:
> I upgrade another system to 9.0-RC1 and encounter the same problem, this time
> zfsloader do not run.
> 
> After
> 
> mv /mnt/boot /mnt/Boot
> mkdir /mnt/boot
> cd /mnt/Boot
> find . | cpio -pvdmu /mnt/boot
> 
> FreeBSD boot OK
> 
> 
> [root@avoriaz zfsboottest]# ./zfsboottest /Boot/zfsloader /dev/ada0p2 /dev/ada1p2
> ZFS: SPA version 28
>   pool: rpool
> config:
> 
>         NAME STATE
>         rpool ONLINE
>           mirror ONLINE
>             ada0p2 ONLINE
>             ada1p2 ONLINE
> ZFS: i/o error - all block copies unavailable
> can't lookup
> 
> 10 minutes later:
> 
> [root@avoriaz zfsboottest]# ./zfsboottest /Boot/zfsloader /dev/ada0p2
> /dev/ada1p2|less
> ZFS: SPA version 28
>   pool: rpool
> config:
> 
>         NAME STATE
>         rpool ONLINE
>           mirror ONLINE
>             ada0p2 ONLINE
>             ada1p2 ONLINE
> <blablabla>
> 
> it seems ok :-o
> 
> and a other time:
> [root@avoriaz zfsboottest]# ./zfsboottest /Boot/zfsloader /dev/ada0p2
> segmentation fault...
> 
> Strange isn't it.

I think that it would be smart to not do any filesystem modifications after the
problem is detected / reproduced.
Also, currently zfsboottest doesn't do much of a problem self-diagnostics, so
using gdb or/and adding some printfs in the code are required to understand a
nature of a problem.  Like what kind of block gives an I/O error, if it actual
reading that fails or checksum verification or etc, and so on.

-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG  Tue Oct 18 15:55:27 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id CBECC106564A
	for <freebsd-fs@freebsd.org>; Tue, 18 Oct 2011 15:55:27 +0000 (UTC)
	(envelope-from gpalmer@freebsd.org)
Received: from noop.in-addr.com (mail.in-addr.com [IPv6:2001:470:8:162::1])
	by mx1.freebsd.org (Postfix) with ESMTP id 9DAE78FC13
	for <freebsd-fs@freebsd.org>; Tue, 18 Oct 2011 15:55:27 +0000 (UTC)
Received: from gjp by noop.in-addr.com with local (Exim 4.76 (FreeBSD))
	(envelope-from <gpalmer@freebsd.org>)
	id 1RGC0j-000CSa-Ub; Tue, 18 Oct 2011 11:55:25 -0400
Date: Tue, 18 Oct 2011 11:55:25 -0400
From: Gary Palmer <gpalmer@freebsd.org>
To: Patrick Donnelly <batrick@batbytes.com>
Message-ID: <20111018155525.GH38162@in-addr.com>
References: <CACh33Fpz=uAp8h0Bjsi1Be=ob_94jXtN51mAHvGPkReY5MpTcg@mail.gmail.com>
	<4E9AE725.4040001@gmail.com>
	<CACh33FpgTUEqsaTuSSOmRsGgk24K4+FuL90Zu-3v3F+RRtaOHw@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CACh33FpgTUEqsaTuSSOmRsGgk24K4+FuL90Zu-3v3F+RRtaOHw@mail.gmail.com>
X-SA-Exim-Connect-IP: <locally generated>
X-SA-Exim-Mail-From: gpalmer@freebsd.org
X-SA-Exim-Scanned: No (on noop.in-addr.com); SAEximRunCond expanded to false
Cc: freebsd-fs@freebsd.org
Subject: Re: [ZFS] Using SSD with partitions
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Oct 2011 15:55:27 -0000

On Mon, Oct 17, 2011 at 11:53:43PM -0400, Patrick Donnelly wrote:
> Since people have asked about more details for my system:
> 
> It uses old desktop hardware with 5 1TB WD Caviar Blues in a raidz
> configuration with 1 of those drives being a hot spare (4 1TB drives
> in the raidz). The system currently has 2 GB of RAM IIRC.
> 
> I've been using NFS to access the data on my home network which has
> worked pretty well. Writing to NFS over my VPN from across the country
> is really bad which is one of the reasons I wanted to use a SSD for a
> ZIL. read/write performance overall tends to be bad though so I don't
> really know how much it will help. After fiddling around with NFS
> settings for a long time I soon gave up and instead use SSH when
> outside a LAN. That's another matter though and off-topic. :)

Block access protocols (NFS, CIFS) suck over anything other than a LAN.
If you think about it you basically have (and yes, this is WAY over
simplified but it illustrates the point)

Client: request block 0 of file 1
Server: block 0
Client: request block 1 of file 1
Server: block 1

So for each block (whether its 512 bytes, 4k or larger is irrelevant)
you spend most of your time waiting for the request to transit the
network.  You spend very little time actually sending or receiving data.

You're probably better off with something like WebDAV or the like as
they are less impacted by RTT issues as they request the entire file
at once.

Gary

From owner-freebsd-fs@FreeBSD.ORG  Tue Oct 18 17:46:58 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 176FD106566B;
	Tue, 18 Oct 2011 17:46:58 +0000 (UTC)
	(envelope-from olivier@gid0.org)
Received: from mail-qw0-f54.google.com (mail-qw0-f54.google.com
	[209.85.216.54])
	by mx1.freebsd.org (Postfix) with ESMTP id AD9858FC14;
	Tue, 18 Oct 2011 17:46:57 +0000 (UTC)
Received: by qadz30 with SMTP id z30so884629qad.13
	for <multiple recipients>; Tue, 18 Oct 2011 10:46:56 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.224.185.19 with SMTP id cm19mr3025416qab.8.1318960016655; Tue,
	18 Oct 2011 10:46:56 -0700 (PDT)
Received: by 10.224.60.206 with HTTP; Tue, 18 Oct 2011 10:46:56 -0700 (PDT)
In-Reply-To: <20111005092603.GA1874@tops>
References: <alpine.GSO.1.10.1110011122030.882@multics.mit.edu>
	<CADLo83-s_3H8PbbxOPPxbe0m10U0U5JW-feB294dFs+Q3iTWvg@mail.gmail.com>
	<CAGMYy3ssi+kAuufDTHA1z6u7jRrZwRRkCCkcO94GHNGF9Rku_w@mail.gmail.com>
	<20111002020231.GA70864@icarus.home.lan>
	<CAGMYy3sCCxOiVqeP4PVbvxnpcyoyQZoL+w3nY8STYnpUNfj6JQ@mail.gmail.com>
	<j6aorc$hf0$1@dough.gmane.org>
	<CAGMYy3tbMWU6JU1++5XmzjZTrV1=oAgRaaDE3-=MMT73F_ojQQ@mail.gmail.com>
	<CABzXLYNLhP+YFvT5Sw=hKVF6d_Yvmt+e_QjH7ghX-2MyzS0wWA@mail.gmail.com>
	<CAGMYy3s7RrP8oWC+JYgSG3hU1EXgKmnf+WQRiE2CDu4bHuz3UA@mail.gmail.com>
	<CABzXLYO-gRt6o6wrevEFwwtTneYiShYD9UbK=j0UUUBzVob5jA@mail.gmail.com>
	<20111005092603.GA1874@tops>
Date: Tue, 18 Oct 2011 19:46:56 +0200
Message-ID: <CABzXLYNU+397-scL5vGAq1rC+Dx76ceMMNt9HUag+a7iP8eL_w@mail.gmail.com>
From: Olivier Smedts <olivier@gid0.org>
To: Gleb Kurtsou <gleb.kurtsou@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>,
	Ivan Voras <ivoras@freebsd.org>
Subject: Re: is TMPFS still highly experimental?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Oct 2011 17:46:58 -0000

2011/10/5 Gleb Kurtsou <gleb.kurtsou@gmail.com>:
> Free RAM is a bit tricky with virtual memory and overcommit support all
> over the place. There are at least 3 memory hungry subsystems: buffer
> cache, ZFS ARC, tmpfs.
>
> For the first two there is defined maximum size and they can be shrunk
> in low memory situations. Tmpfs grows as much as it can trying to
> calculate "free" memory available. Another difference is that tmpfs
> can't be shrunk in low memory situation.
>
> I proposed a patch changing tmpfs memory allocation:
> - Define maximum file system size (RAM/2 by default)
> - Don't try to check if free memory available, check free swap
> =A0instead and allocate more aggressively, i.e. allocate until
> =A0swap or file system limit is reached.

Patch tested and approved ! I did not test the maximum tmpfs default
size because I allocated a max size in my fstab.

%cat /etc/fstab
none /tmp tmpfs rw,mode=3D1777,size=3D2147483648 0 0

%df -h /tmp
Filesystem    Size    Used   Avail Capacity  Mounted on
tmpfs         2.0G    124k      2G     0%    /tmp

Mem: 622M Active, 351M Inact, 6491M Wired, 4940K Cache, 2160K Buf, 385M Fre=
e
Swap: 2048M Total, 36M Used, 2012M Free, 1% Inuse

(ZFS is using all my wired memory, the ARC is now full, and I deleted
my nearly-never-touched 8G swap in favor of a 2G swap)

A little test now :
%dd if=3D/dev/zero of=3D/tmp/test bs=3D1M count=3D1500
1500+0 records in
1500+0 records out
1572864000 bytes transferred in 0.763368 secs (2060427243 bytes/sec)
%df -h /tmp
Filesystem    Size    Used   Avail Capacity  Mounted on
tmpfs         2.0G    1.5G    542M    74%    /tmp
% top
Mem: 2559M Active, 514M Inact, 4506M Wired, 1656K Cache, 2160K Buf, 274M Fr=
ee
Swap: 2048M Total, 39M Used, 2009M Free, 1% Inuse

So tmpfs made the ZFS ARC cache shrink, without swapping. I did not
test filling my active memory to see if the max tmpfs size was
shrinking.

Cheers !

>
> Patch:
> http://marc.info/?l=3Dfreebsd-fs&m=3D129747367322954&w=3D2
> https://github.com/glk/freebsd-head/tree/tmpfs
>
> Thanks,
> Gleb.
>


--=20
Olivier Smedts=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=A0 _
=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0=
 =A0 ASCII ribbon campaign ( )
e-mail: olivier@gid0.org=A0 =A0 =A0 =A0 - against HTML email & vCards=A0 X
www: http://www.gid0.org=A0 =A0 - against proprietary attachments / \

=A0 "Il y a seulement 10 sortes de gens dans le monde :
=A0 ceux qui comprennent le binaire,
=A0 et ceux qui ne le comprennent pas."

From owner-freebsd-fs@FreeBSD.ORG  Tue Oct 18 19:30:10 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D2C12106566B
	for <freebsd-fs@hub.freebsd.org>; Tue, 18 Oct 2011 19:30:10 +0000 (UTC)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id BCBFF8FC12
	for <freebsd-fs@hub.freebsd.org>; Tue, 18 Oct 2011 19:30:10 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p9IJUAtc060289
	for <freebsd-fs@freefall.freebsd.org>; Tue, 18 Oct 2011 19:30:10 GMT
	(envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p9IJUAO6060285;
	Tue, 18 Oct 2011 19:30:10 GMT (envelope-from gnats)
Date: Tue, 18 Oct 2011 19:30:10 GMT
Message-Id: <201110181930.p9IJUAO6060285@freefall.freebsd.org>
To: freebsd-fs@FreeBSD.org
From: 3zstbn24xn@snkmail.com
Cc: 
Subject: Re: kern/160777: [zfs] [hang] RAID-Z3 causes fatal hang upon
 scrub/import on 9.0-BETA2/amd64
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: 3zstbn24xn@snkmail.com
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Oct 2011 19:30:10 -0000

The following reply was made to PR kern/160777; it has been noted by GNATS.

From: 3zstbn24xn@snkmail.com
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/160777: [zfs] [hang] RAID-Z3 causes fatal hang upon
 scrub/import on 9.0-BETA2/amd64
Date: Tue, 18 Oct 2011 19:00:58 +0000

 The following kernel trace may be of relevance (console output shown below).  I receive this on 9.0-BETA3.
 
 lock order reversal:
  1st 0xfffffe000656c278 zfs (zfs) @ /usr/src/sys/kern/vfs_vnops.c:618
  2nd 0xfffffe017795e098 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2134
 KDB: stack backtrace:
 db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
 kdb_backtrace() at kdb_backtrace+0x37
 _witness_debugger() at _witness_debugger+0x2e
 witness_checkorder() at witness_checkorder+0x807
 __lockmgr_args() at __lockmgr_args+0x109c
 ffs_lock() at ffs_lock+0x8c
 VOP_LOCK1_APV() at VOP_LOCK1_APV+0x9b
 _vn_lock() at _vn_lock+0x47
 vget() at vget+0x7b
 vm_fault_hold() at vm_fault_hold+0x1976
 trap_pfault() at trap_pfault+0x118
 trap() at trap+0x39b
 calltrap() at calltrap+0x8
 --- trap 0xc, rip = 0xffffffff80b0aa8d, rsp = 0xffffff82331b5640, rbp = 0xffffff82331b56a0 ---
 copyin() at copyin+0x3d
 zfs_freebsd_write() at zfs_freebsd_write+0x46f
 VOP_WRITE_APV() at VOP_WRITE_APV+0x103
 vn_write() at vn_write+0x2a2
 dofilewrite() at dofilewrite+0x85
 kern_writev() at kern_writev+0x6c
 sys_write() at sys_write+0x55
 amd64_syscall() at amd64_syscall+0x3ba
 Xfast_syscall() at Xfast_syscall+0xf7
 --- syscall (4, FreeBSD ELF64, sys_write), rip = 0x80094533c, rsp = 0x7fffffffd9d8, rbp = 0x80065b000 ---

From owner-freebsd-fs@FreeBSD.ORG  Tue Oct 18 21:49:10 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8BFEC106564A;
	Tue, 18 Oct 2011 21:49:10 +0000 (UTC)
	(envelope-from gleb.kurtsou@gmail.com)
Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com
	[209.85.214.54])
	by mx1.freebsd.org (Postfix) with ESMTP id DDE608FC15;
	Tue, 18 Oct 2011 21:49:09 +0000 (UTC)
Received: by bkbzu17 with SMTP id zu17so1701178bkb.13
	for <multiple recipients>; Tue, 18 Oct 2011 14:49:08 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=date:from:to:cc:subject:message-id:references:mime-version
	:content-type:content-disposition:content-transfer-encoding
	:in-reply-to:user-agent;
	bh=3RSAw/uH7jgEnEyW5qUid/L+zTWTbwHYuROS800fAAs=;
	b=KNpkrgga9aIq1jY6K20WTc8W48dgukZ6qRY5M+yjGBEbL7jFyg3jXmRFGOk5s+IYCK
	rvLBoQNQL5jGCTfHQXOhKHJbPIFsnYnBF/V9rXnqn+rXOAD1sIazk/dnGXURHI11R9vU
	7Wo9MkefqOSxXFVdrNbWeHjhZJv9x5YOqoj38=
Received: by 10.204.157.142 with SMTP id b14mr3103164bkx.44.1318974547777;
	Tue, 18 Oct 2011 14:49:07 -0700 (PDT)
Received: from localhost ([78.157.92.5])
	by mx.google.com with ESMTPS id z9sm3661323bkn.7.2011.10.18.14.49.05
	(version=SSLv3 cipher=OTHER); Tue, 18 Oct 2011 14:49:06 -0700 (PDT)
Date: Wed, 19 Oct 2011 00:46:34 +0300
From: Gleb Kurtsou <gleb.kurtsou@gmail.com>
To: Olivier Smedts <olivier@gid0.org>
Message-ID: <20111018214634.GA55276@tops>
References: <CAGMYy3ssi+kAuufDTHA1z6u7jRrZwRRkCCkcO94GHNGF9Rku_w@mail.gmail.com>
	<20111002020231.GA70864@icarus.home.lan>
	<CAGMYy3sCCxOiVqeP4PVbvxnpcyoyQZoL+w3nY8STYnpUNfj6JQ@mail.gmail.com>
	<j6aorc$hf0$1@dough.gmane.org>
	<CAGMYy3tbMWU6JU1++5XmzjZTrV1=oAgRaaDE3-=MMT73F_ojQQ@mail.gmail.com>
	<CABzXLYNLhP+YFvT5Sw=hKVF6d_Yvmt+e_QjH7ghX-2MyzS0wWA@mail.gmail.com>
	<CAGMYy3s7RrP8oWC+JYgSG3hU1EXgKmnf+WQRiE2CDu4bHuz3UA@mail.gmail.com>
	<CABzXLYO-gRt6o6wrevEFwwtTneYiShYD9UbK=j0UUUBzVob5jA@mail.gmail.com>
	<20111005092603.GA1874@tops>
	<CABzXLYNU+397-scL5vGAq1rC+Dx76ceMMNt9HUag+a7iP8eL_w@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <CABzXLYNU+397-scL5vGAq1rC+Dx76ceMMNt9HUag+a7iP8eL_w@mail.gmail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>,
	Ivan Voras <ivoras@freebsd.org>
Subject: Re: is TMPFS still highly experimental?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 18 Oct 2011 21:49:10 -0000

On (18/10/2011 19:46), Olivier Smedts wrote:
> 2011/10/5 Gleb Kurtsou <gleb.kurtsou@gmail.com>:
> > Free RAM is a bit tricky with virtual memory and overcommit support all
> > over the place. There are at least 3 memory hungry subsystems: buffer
> > cache, ZFS ARC, tmpfs.
> >
> > For the first two there is defined maximum size and they can be shrunk
> > in low memory situations. Tmpfs grows as much as it can trying to
> > calculate "free" memory available. Another difference is that tmpfs
> > can't be shrunk in low memory situation.
> >
> > I proposed a patch changing tmpfs memory allocation:
> > - Define maximum file system size (RAM/2 by default)
> > - Don't try to check if free memory available, check free swap
> >  instead and allocate more aggressively, i.e. allocate until
> >  swap or file system limit is reached.
> 
> Patch tested and approved ! I did not test the maximum tmpfs default
> size because I allocated a max size in my fstab.
> 
> %cat /etc/fstab
> none /tmp tmpfs rw,mode=1777,size=2147483648 0 0
You may specify human friendly size, e.g. size=2G.

I'll add live tmpfs resize once decision is made that patch can be
committed.

> 
> %df -h /tmp
> Filesystem    Size    Used   Avail Capacity  Mounted on
> tmpfs         2.0G    124k      2G     0%    /tmp
> 
> Mem: 622M Active, 351M Inact, 6491M Wired, 4940K Cache, 2160K Buf, 385M Free
> Swap: 2048M Total, 36M Used, 2012M Free, 1% Inuse
> 
> (ZFS is using all my wired memory, the ARC is now full, and I deleted
> my nearly-never-touched 8G swap in favor of a 2G swap)
> 
> A little test now :
> %dd if=/dev/zero of=/tmp/test bs=1M count=1500
> 1500+0 records in
> 1500+0 records out
> 1572864000 bytes transferred in 0.763368 secs (2060427243 bytes/sec)
> %df -h /tmp
> Filesystem    Size    Used   Avail Capacity  Mounted on
> tmpfs         2.0G    1.5G    542M    74%    /tmp
> % top
> Mem: 2559M Active, 514M Inact, 4506M Wired, 1656K Cache, 2160K Buf, 274M Free
> Swap: 2048M Total, 39M Used, 2009M Free, 1% Inuse
> 
> So tmpfs made the ZFS ARC cache shrink, without swapping. I did not
> test filling my active memory to see if the max tmpfs size was
> shrinking.
tmpfs size won't change. You'll be able to write to tmpfs until either
filesystem size or swap limit reached. It's for administrator to decide
how large tmpfs can grow. Simply put, there is no way to compete with
ZFS and buffer cache in trying to use all "free" memory, unlike those
tmpfs data can't be freed when needed.

> 
> Cheers !
> 
> >
> > Patch:
> > http://marc.info/?l=freebsd-fs&m=129747367322954&w=2
> > https://github.com/glk/freebsd-head/tree/tmpfs
> >
> > Thanks,
> > Gleb.
> >
> 
> 
> -- 
> Olivier Smedts                                                 _
>                                         ASCII ribbon campaign ( )
> e-mail: olivier@gid0.org        - against HTML email & vCards  X
> www: http://www.gid0.org    - against proprietary attachments / \
> 
>   "Il y a seulement 10 sortes de gens dans le monde :
>   ceux qui comprennent le binaire,
>   et ceux qui ne le comprennent pas."

From owner-freebsd-fs@FreeBSD.ORG  Wed Oct 19 02:23:22 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DF2BA106566B
	for <freebsd-fs@freebsd.org>; Wed, 19 Oct 2011 02:23:22 +0000 (UTC)
	(envelope-from kaduk@mit.edu)
Received: from dmz-mailsec-scanner-7.mit.edu (DMZ-MAILSEC-SCANNER-7.MIT.EDU
	[18.7.68.36]) by mx1.freebsd.org (Postfix) with ESMTP id 7DBB08FC0A
	for <freebsd-fs@freebsd.org>; Wed, 19 Oct 2011 02:23:22 +0000 (UTC)
X-AuditID: 12074424-b7ef76d0000008dc-21-4e9e34997345
Received: from mailhub-auth-1.mit.edu ( [18.9.21.35])
	by dmz-mailsec-scanner-7.mit.edu (Symantec Messaging Gateway) with SMTP
	id 0B.01.02268.9943E9E4; Tue, 18 Oct 2011 22:23:21 -0400 (EDT)
Received: from outgoing.mit.edu (OUTGOING-AUTH.MIT.EDU [18.7.22.103])
	by mailhub-auth-1.mit.edu (8.13.8/8.9.2) with ESMTP id p9J2NLOg007413; 
	Tue, 18 Oct 2011 22:23:21 -0400
Received: from multics.mit.edu (MULTICS.MIT.EDU [18.187.1.73])
	(authenticated bits=56) (User authenticated as kaduk@ATHENA.MIT.EDU)
	by outgoing.mit.edu (8.13.6/8.12.4) with ESMTP id p9J2NJcC015843
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT);
	Tue, 18 Oct 2011 22:23:20 -0400 (EDT)
Received: (from kaduk@localhost) by multics.mit.edu (8.12.9.20060308)
	id p9J2NIOR005263; Tue, 18 Oct 2011 22:23:18 -0400 (EDT)
Date: Tue, 18 Oct 2011 22:23:18 -0400 (EDT)
From: Benjamin Kaduk <kaduk@MIT.EDU>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
In-Reply-To: <20111018042838.GA6246@icarus.home.lan>
Message-ID: <alpine.GSO.1.10.1110182221500.882@multics.mit.edu>
References: <CACh33Fpz=uAp8h0Bjsi1Be=ob_94jXtN51mAHvGPkReY5MpTcg@mail.gmail.com>
	<4E9AE725.4040001@gmail.com>
	<CACh33FpgTUEqsaTuSSOmRsGgk24K4+FuL90Zu-3v3F+RRtaOHw@mail.gmail.com>
	<20111018042838.GA6246@icarus.home.lan>
User-Agent: Alpine 1.10 (GSO 962 2008-03-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFnrNIsWRmVeSWpSXmKPExsUixCmqrDvTZJ6fwfz/+hbHHv9ks2j8cZrd
	gcljxqf5LB5rrl5lDWCK4rJJSc3JLEst0rdL4Mp42/2fveACe8W3m6uYGhib2LoYOTkkBEwk
	1lztYoGwxSQu3FsPFOfiEBLYxyjxadZ1sISQwAZGiZP7giESB5gk3m/bwwzhNDBKHP26gRWk
	ikVAW+L9k4WMIDabgIrEzDcbwVaICOhJrF21A8jm4GAWkJK4s7YCJCws4Cbx98RFFpAwJ9AV
	f7/6gJi8AvYSyz44QEy/xSixouslO0i5qICOxOr9U8Du4RUQlDg58wmYzSxgKXHuz3W2CYyC
	s5CkZiFJLWBkWsUom5JbpZubmJlTnJqsW5ycmJeXWqRrrpebWaKXmlK6iREUpuwuKjsYmw8p
	HWIU4GBU4uHdITfPT4g1say4MvcQoyQHk5IorxAwyIX4kvJTKjMSizPii0pzUosPMUpwMCuJ
	8N7hAMrxpiRWVqUW5cOkpDlYlMR5bXY6+AkJpCeWpGanphakFsFkZTg4lCR4OUGGChalpqdW
	pGXmlCCkmTg4QYbzAA0/ZwwyvLggMbc4Mx0if4pRUUqc9xZIQgAkkVGaB9cLSyOvGMWBXhHm
	FQRZwQNMQXDdr4AGMwENPqo4F2RwSSJCSqqBUV6hvCH5IO/Dxv8HGnVfKv7yStbzdqnYWHzj
	cVnP7FmHV4q8LHWccmXDhVPF5YbiTNdTam9Jz45dmXU9RtizydaS3yfKqadC7K79cqMHkwJK
	E1flLZrH9eXvbyn/PVO0D08Qzq/g/BZSu6AtViY/fom78r5p+uaZeSZzQn3+pZ/fsFr5dWil
	EktxRqKhFnNRcSIAJ1iAlf4CAAA=
Cc: freebsd-fs@freebsd.org
Subject: network filesystems (was Re: [ZFS] Using SSD with partitions)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 19 Oct 2011 02:23:22 -0000

On Mon, 17 Oct 2011, Jeremy Chadwick wrote:

>
> When it comes to networked filesystems on UNIX, we have very little
> choice.  NFS is the main one.  Then there's Stanford's Coda filesystem
> thing, or maybe that's now part of AFS, I don't know.  Then there's

Coda and AFS are different codebases, but implemenent similar sorts of 
things.  I believe that Coda is still "research-grade", and I know that 
OpenAFS is not ready for production deployment on FreeBSD.  (But I'm 
working on it.)

-Ben Kaduk

> sshfs, which sounds wonderful until you realise all the dependencies and
> nuances involved (mainly due to use of fuse, which we know on FreeBSD is
> not so great).  Then there's Samba (CIFS/SMB, and now with Samba 3.6
> offering SMB2 for Windows 7 clients), but that gets into issues of
> security and cannot be forwarded via SSH (e.g. VPN would be needed)
> given all its protocol some of which are UDP (not sure what the state of
> NetBIOS is).

From owner-freebsd-fs@FreeBSD.ORG  Wed Oct 19 03:49:56 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id CFFAD1065675
	for <freebsd-fs@freebsd.org>; Wed, 19 Oct 2011 03:49:56 +0000 (UTC)
	(envelope-from jwd@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id BFBC28FC17
	for <freebsd-fs@freebsd.org>; Wed, 19 Oct 2011 03:49:56 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p9J3nuai032929
	for <freebsd-fs@freebsd.org>; Wed, 19 Oct 2011 03:49:56 GMT
	(envelope-from jwd@freefall.freebsd.org)
Received: (from jwd@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p9J3nuv8032928
	for freebsd-fs@freebsd.org; Wed, 19 Oct 2011 03:49:56 GMT
	(envelope-from jwd)
Date: Wed, 19 Oct 2011 03:49:56 +0000
From: John <jwd@freebsd.org>
To: freebsd-fs@freebsd.org
Message-ID: <20111019034956.GA8345@FreeBSD.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.4.2.3i
Subject: nfsstats for new nfsserver
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 19 Oct 2011 03:49:56 -0000

Hi Folks,

   I've been looking into different performance aspects of running
the new nfsserver servering out zfs filesystems with 9.

   I've run into a nfsstat question I thought I would ask about. 
>From nfsstat on a system that's been up for a few hours:

# nfsstat
Client Info:

 ... deleted ... all 0.

Server Info:
  Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
1014376791     95502 1815135267     10181   8613463   6005951         0         0
   Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
      240         0         0         0         0     47964         0    547155
    Mknod    Fsstat    Fsinfo  PathConf    Commit
        0    595932        45         0     74154
Server Ret-Failed
                0
Server Faults
            0
Server Cache Stats:
   Inprog      Idem  Non-idem    Misses
     6308         0      3852 -1448802368
Server Write Gathering:
 WriteOps  WriteRPC   Opsaved
  6005951   6005951         0


   The 'Misses' value is very large. When looking at the source, if I'm
following the code correctly (and I might not be),  would it make sense
to try increasing the size of the cache, or simply disabling it? Can do
either - looking for opinions.

   The Opsaved value being 0, would it make sense to simply disable
gathering also?

   Last, just more of a comment, would it make sense to go ahead and treat
these values as unsigned?  They'll still wrap, but they would stay positive.

Thanks,
John


From owner-freebsd-fs@FreeBSD.ORG  Wed Oct 19 06:07:21 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 88BDA106564A
	for <freebsd-fs@freebsd.org>; Wed, 19 Oct 2011 06:07:21 +0000 (UTC)
	(envelope-from kaduk@mit.edu)
Received: from dmz-mailsec-scanner-2.mit.edu (DMZ-MAILSEC-SCANNER-2.MIT.EDU
	[18.9.25.13]) by mx1.freebsd.org (Postfix) with ESMTP id 2B1C78FC0C
	for <freebsd-fs@freebsd.org>; Wed, 19 Oct 2011 06:07:20 +0000 (UTC)
X-AuditID: 1209190d-b7f726d0000008d1-a3-4e9e6918e402
Received: from mailhub-auth-4.mit.edu ( [18.7.62.39])
	by dmz-mailsec-scanner-2.mit.edu (Symantec Messaging Gateway) with SMTP
	id 6F.7C.02257.8196E9E4; Wed, 19 Oct 2011 02:07:20 -0400 (EDT)
Received: from outgoing.mit.edu (OUTGOING-AUTH.MIT.EDU [18.7.22.103])
	by mailhub-auth-4.mit.edu (8.13.8/8.9.2) with ESMTP id p9J67Ko2008830; 
	Wed, 19 Oct 2011 02:07:20 -0400
Received: from multics.mit.edu (MULTICS.MIT.EDU [18.187.1.73])
	(authenticated bits=56) (User authenticated as kaduk@ATHENA.MIT.EDU)
	by outgoing.mit.edu (8.13.6/8.12.4) with ESMTP id p9J67Ieh008769
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT);
	Wed, 19 Oct 2011 02:07:19 -0400 (EDT)
Received: (from kaduk@localhost) by multics.mit.edu (8.12.9.20060308)
	id p9J67IUA010311; Wed, 19 Oct 2011 02:07:18 -0400 (EDT)
Date: Wed, 19 Oct 2011 02:07:17 -0400 (EDT)
From: Benjamin Kaduk <kaduk@MIT.EDU>
To: rmacklem@freebsd.org
Message-ID: <alpine.GSO.1.10.1110190139070.882@multics.mit.edu>
User-Agent: Alpine 1.10 (GSO 962 2008-03-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII
X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFjrDIsWRmVeSWpSXmKPExsUixG6nriuROc/P4EwXv8Wxxz/ZLOb+3c/o
	wOQx49N8lgDGKC6blNSczLLUIn27BK6MB7eOMhc0i1ccvuzRwHiRp4uRg0NCwETi6BrVLkZO
	IFNM4sK99WxdjFwcQgL7GCUmHtvDCOFsYJSY/HcKO4RzgEli2d+9jCAtQgINjBLTjweB2CwC
	2hIvjj9lArHZBFQkZr7ZyAZiiwhISJy8d4wZZBuzgJTEnbUVIKawgLHE1DYLkApeAXuJa0eX
	s4LYogI6Eqv3T2GBiAtKnJz5BMxmFrCU+Lf2F+sERv5ZSFKzkKQWMDKtYpRNya3SzU3MzClO
	TdYtTk7My0st0jXSy80s0UtNKd3ECAo0TkneHYzvDiodYhTgYFTi4d0hN89PiDWxrLgy9xCj
	JAeTkiivWgZQiC8pP6UyI7E4I76oNCe1+BCjBAezkgjv4TSgHG9KYmVValE+TEqag0VJnLdw
	h4OfkEB6YklqdmpqQWoRTFaGg0NJgnc2yFDBotT01Iq0zJwShDQTByfIcB6g4bEgNbzFBYm5
	xZnpEPlTjIpSQKNBEgIgiYzSPLheWCJ4xSgO9IowbwJIFQ8wicB1vwIazAQ0+KjiXJDBJYkI
	KakGRkeH9d8+756/PX7nX5V/BSlFZ2a9mM6+ijVpgmjvcwl92/fpiavOVM2XPL8g++iRR+vX
	xSQEV7MZ3wxhXL32gl+hz9dA3mf3r/ctsVn1j+vjuQhhJ+nlSRM2VWwVzmsSmXZl71LJ5Haf
	7eZC4VLbGupPP1+zv1Nr7e/v8+4E5JhtbzFN1Zuxk1eJpTgj0VCLuag4EQCPPn543wIAAA==
Cc: freebsd-fs@freebsd.org
Subject: lock status of dvp in lookup error return?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 19 Oct 2011 06:07:21 -0000

Hi Rick,

In tracking down a panic trying to recursively lock a vnode in openafs, I 
started questioning my behavior in the ISDOTDOT case, in particular 
whether to drop the dvp lock before the actual call over the network; this 
naturally led me to look at the NFS code as a reference.
Unfortunately, this left me more confused than when I began ...

sys/fs/nfs_clvnops.c, in nfs_lookup():
   1211          if (flags & ISDOTDOT) {
   1212                  ltype = NFSVOPISLOCKED(dvp);
   1213                  error = vfs_busy(mp, MBF_NOWAIT);
   1214                  if (error != 0) {
   1215                      vfs_ref(mp);
   1216                      NFSVOPUNLOCK(dvp, 0);
   1217                      error = vfs_busy(mp, 0);
   1218                      NFSVOPLOCK(dvp, ltype | LK_RETRY);

If we fail to busy the mountpoint, drop the directory lock and try again, 
then relock dvp afterward.

   1219                      vfs_rel(mp);
   1220                      if (error == 0 && (dvp->v_iflag & VI_DOOMED)) {
   1221                          vfs_unbusy(mp);
   1222                          error = ENOENT;
   1223                      }
   1224                      if (error != 0)
   1225                          return (error);

But if the second vfs_busy failed, or dvp is DOOMED, return with dvp 
locked.

   1226                  }
   1227                  NFSVOPUNLOCK(dvp, 0);

But now we always unlock dvp.

   1228                  error = nfscl_nget(mp, dvp, nfhp, cnp, td, &np, NULL,
   1229                      cnp->cn_lkflags);

The call to the network (?)

   1230                  if (error == 0)
   1231                      newvp = NFSTOV(np);
   1232                  vfs_unbusy(mp);
   1233                  if (newvp != dvp)
   1234                      NFSVOPLOCK(dvp, ltype | LK_RETRY);
   1235                  if (dvp->v_iflag & VI_DOOMED) {
   1236                      if (error == 0) {
   1237                          if (newvp == dvp)
   1238                                  vrele(newvp);
   1239                          else
   1240                                  vput(newvp);
   1241                      }
   1242                      error = ENOENT;
   1243                  }
   1244                  if (error != 0)
   1245                      return (error);

And here if there was an error hearing from the network, we return with 
dvp still unlocked.

   1246                  if (attrflag)
   1247                      (void) nfscl_loadattrcache(&newvp, &nfsva, NULL, NULL,
   1248                              0, 1);


So, I'm still confused about whether I should be unlocking dvp in the 
error case for ISDOTDOT (though presumably looking at other filesystems 
would help).  This inconsistency in the NFS client looks like a bug at my 
current level of understanding -- what do you think?

Thanks,

Ben Kaduk

From owner-freebsd-fs@FreeBSD.ORG  Wed Oct 19 06:21:45 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6B4B41065673
	for <freebsd-fs@FreeBSD.org>; Wed, 19 Oct 2011 06:21:45 +0000 (UTC)
	(envelope-from florian@wagner-flo.net)
Received: from umbracor.wagner-flo.net (umbracor.wagner-flo.net
	[213.165.81.202])
	by mx1.freebsd.org (Postfix) with ESMTP id 09A8D8FC08
	for <freebsd-fs@FreeBSD.org>; Wed, 19 Oct 2011 06:21:44 +0000 (UTC)
Received: from auedv3.syscomp.de (umbracor.wagner-flo.net [127.0.0.1])
	by umbracor.wagner-flo.net (Postfix) with ESMTPSA id D6B373C06C30;
	Wed, 19 Oct 2011 08:21:46 +0200 (CEST)
Date: Wed, 19 Oct 2011 08:21:39 +0200
From: Florian Wagner <florian@wagner-flo.net>
To: Andriy Gapon <avg@FreeBSD.org>
Message-ID: <20111019082139.1661868e@auedv3.syscomp.de>
In-Reply-To: <4E9ACA9F.5090308@FreeBSD.org>
References: <20111015214347.09f68e4e@naclador.mos32.de>
	<4E9ACA9F.5090308@FreeBSD.org>
X-Mailer: Claws Mail 3.7.6 (GTK+ 2.20.1; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=PGP-SHA1;
	boundary="Sig_/MqGgbxzbrnCGjY92aocJCb=";
	protocol="application/pgp-signature"
Cc: freebsd-fs@FreeBSD.org
Subject: Re: Extending zfsboot.c to allow selecting filesystem from
 boot.config
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 19 Oct 2011 06:21:45 -0000

--Sig_/MqGgbxzbrnCGjY92aocJCb=
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable

> on 15/10/2011 22:43 Florian Wagner said the following:
> > Hi,
> >=20
> > from looking at the code in sys/boot/i386/zfsboot/zfsboot.c the ZFS
> > aware boot block already allows to select pool to load the kernel
> > from by adding <POOL>:<FILE TO BOOT> to the boot.config. As this
> > code calls the zfs_mount_pool function it will look for the bootfs
> > property on the new pool or use its root dataset to get the file
> > from there.
> >=20
> > How much work would it be to extend the loader to also allow
> > selecting a ZFS filesystem?
> >=20
> > What I'd like to do is place a boot.config on the (otherwise empty)
> > root of my system pool and then tell it to get the loader from
> > another filesystem by putting
> > "rpool/root/stable-8-r226381:/boot/zfsloader" in there.
>=20
> Please check out the following changes:
> https://gitorious.org/~avg/freebsd/avgbsd/commit/8c3808c4bb2a2cd746db3e9c=
46871c9bdf943ef6
> https://gitorious.org/~avg/freebsd/avgbsd/commit/0b4279c0d366d9f2b5bb9d4c=
0dd3229d8936d92b
> https://gitorious.org/~avg/freebsd/avgbsd/commit/b29ab78b079f27918de1683e=
88bcb1817a0e5969
> https://gitorious.org/~avg/freebsd/avgbsd/commit/f49add15516dfd582258b682=
0b8f0254cf9419a3
> https://gitorious.org/~avg/freebsd/avgbsd/commit/e072b443b0f59fe1ff54a70d=
2437d63698bbf597
> https://gitorious.org/~avg/freebsd/avgbsd/commit/f701760c10812c5b6925352f=
b003408c19170063

Looks great!

I've applied the patches to my checkout of Stable 8 and gave the
resulting gptzfsboot and zfsloader a cursory try in a virtual machine.

Commit f701760c10812c5b6925352fb003408c19170063 breaks the build of the
non-ZFS-enabled bootcode. The syntax is wrong in the following snippet
if LOADER_ZFS_SUPPORT is not defined. Moving the closing bracket ("};")
right after the second #endif into the preprocessor conditional fixes
that.

@@ -52,14 +52,21 @@
     u_int32_t	howto;
     u_int32_t	bootdev;
     u_int32_t	bootflags;
+#ifdef LOADER_ZFS_SUPPORT
     union {
+#endif
 	struct {
 	    u_int32_t	pxeinfo;
 	    u_int32_t	res2;
 	};
+#ifdef LOADER_ZFS_SUPPORT
 	uint64_t	zfspool;
+#endif
     };
     u_int32_t	bootinfo;
+#ifdef LOADER_ZFS_SUPPORT
+    uint64_t	zfsroot;
+#endif
 } *kargs;


The only thing I was a bit confused by is that on the boot prompt only
the pool and filename to be booted are printed.

Apart from that it worked as expected. Not having to set
vfs.root.mountfrom in the loader is nice.


Regards and thanks
Florian Wagner

--Sig_/MqGgbxzbrnCGjY92aocJCb=
Content-Type: application/pgp-signature; name=signature.asc
Content-Disposition: attachment; filename=signature.asc

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iEYEARECAAYFAk6ebHMACgkQLvW/2gp2pPw0rQCeN81YhLpkyZtw+KyMScOOSl1s
bxgAoILoMmdsz1lWUC9ex6wunDl+rPRA
=F9Av
-----END PGP SIGNATURE-----

--Sig_/MqGgbxzbrnCGjY92aocJCb=--

From owner-freebsd-fs@FreeBSD.ORG  Wed Oct 19 11:18:04 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E51A71065670
	for <freebsd-fs@freebsd.org>; Wed, 19 Oct 2011 11:18:04 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
	by mx1.freebsd.org (Postfix) with ESMTP id 7F6FD8FC15
	for <freebsd-fs@freebsd.org>; Wed, 19 Oct 2011 11:18:03 +0000 (UTC)
Received: from alf.home (alf.kiev.zoral.com.ua [10.1.1.177])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id p9JBHkVe036547
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Wed, 19 Oct 2011 14:17:47 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: from alf.home (kostik@localhost [127.0.0.1])
	by alf.home (8.14.5/8.14.5) with ESMTP id p9JBHkDG063304;
	Wed, 19 Oct 2011 14:17:46 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by alf.home (8.14.5/8.14.5/Submit) id p9JBHksA063303;
	Wed, 19 Oct 2011 14:17:46 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: alf.home: kostik set sender to kostikbel@gmail.com
	using -f
Date: Wed, 19 Oct 2011 14:17:46 +0300
From: Kostik Belousov <kostikbel@gmail.com>
To: Benjamin Kaduk <kaduk@MIT.EDU>
Message-ID: <20111019111746.GP50300@deviant.kiev.zoral.com.ua>
References: <alpine.GSO.1.10.1110190139070.882@multics.mit.edu>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="rQ7Ovc9/RBrrr0/1"
Content-Disposition: inline
In-Reply-To: <alpine.GSO.1.10.1110190139070.882@multics.mit.edu>
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-3.3 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00,
	DNS_FROM_OPENWHOIS autolearn=no version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	skuns.kiev.zoral.com.ua
Cc: freebsd-fs@freebsd.org, rmacklem@freebsd.org
Subject: Re: lock status of dvp in lookup error return?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 19 Oct 2011 11:18:05 -0000


--rQ7Ovc9/RBrrr0/1
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, Oct 19, 2011 at 02:07:17AM -0400, Benjamin Kaduk wrote:
> Hi Rick,
>=20
> In tracking down a panic trying to recursively lock a vnode in openafs, I=
=20
> started questioning my behavior in the ISDOTDOT case, in particular=20
> whether to drop the dvp lock before the actual call over the network; thi=
s=20
> naturally led me to look at the NFS code as a reference.
> Unfortunately, this left me more confused than when I began ...
>=20
> sys/fs/nfs_clvnops.c, in nfs_lookup():
>   1211          if (flags & ISDOTDOT) {
>   1212                  ltype =3D NFSVOPISLOCKED(dvp);
>   1213                  error =3D vfs_busy(mp, MBF_NOWAIT);
>   1214                  if (error !=3D 0) {
>   1215                      vfs_ref(mp);
>   1216                      NFSVOPUNLOCK(dvp, 0);
>   1217                      error =3D vfs_busy(mp, 0);
>   1218                      NFSVOPLOCK(dvp, ltype | LK_RETRY);
>=20
> If we fail to busy the mountpoint, drop the directory lock and try again,=
=20
> then relock dvp afterward.
>=20
>   1219                      vfs_rel(mp);
>   1220                      if (error =3D=3D 0 && (dvp->v_iflag & VI_DOOM=
ED)) {
>   1221                          vfs_unbusy(mp);
>   1222                          error =3D ENOENT;
>   1223                      }
>   1224                      if (error !=3D 0)
>   1225                          return (error);
>=20
> But if the second vfs_busy failed, or dvp is DOOMED, return with dvp=20
> locked.
>=20
>   1226                  }
>   1227                  NFSVOPUNLOCK(dvp, 0);
>=20
> But now we always unlock dvp.
>=20
>   1228                  error =3D nfscl_nget(mp, dvp, nfhp, cnp, td, &np,=
=20
>   NULL,
>   1229                      cnp->cn_lkflags);
>=20
> The call to the network (?)
>=20
>   1230                  if (error =3D=3D 0)
>   1231                      newvp =3D NFSTOV(np);
>   1232                  vfs_unbusy(mp);
>   1233                  if (newvp !=3D dvp)
>   1234                      NFSVOPLOCK(dvp, ltype | LK_RETRY);
Did you missed line 1234 ?

The code is the copy of the vn_vget_ino(). The logic in the function
might be slightly easier to follow.

>   1235                  if (dvp->v_iflag & VI_DOOMED) {
>   1236                      if (error =3D=3D 0) {
>   1237                          if (newvp =3D=3D dvp)
>   1238                                  vrele(newvp);
>   1239                          else
>   1240                                  vput(newvp);
>   1241                      }
>   1242                      error =3D ENOENT;
>   1243                  }
>   1244                  if (error !=3D 0)
>   1245                      return (error);
>=20
> And here if there was an error hearing from the network, we return with=
=20
> dvp still unlocked.
>=20
>   1246                  if (attrflag)
>   1247                      (void) nfscl_loadattrcache(&newvp, &nfsva,=20
>   NULL, NULL,
>   1248                              0, 1);
>=20
>=20
> So, I'm still confused about whether I should be unlocking dvp in the=20
> error case for ISDOTDOT (though presumably looking at other filesystems=
=20
> would help).  This inconsistency in the NFS client looks like a bug at my=
=20
> current level of understanding -- what do you think?
>=20
> Thanks,
>=20
> Ben Kaduk
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

--rQ7Ovc9/RBrrr0/1
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (FreeBSD)

iEYEARECAAYFAk6esdoACgkQC3+MBN1Mb4gc7wCg45DKU1WoaUBlwI1FrJl1HnPy
pmsAnRkGuubGe5QT55xcNtTpq69GoSFQ
=5Bav
-----END PGP SIGNATURE-----

--rQ7Ovc9/RBrrr0/1--

From owner-freebsd-fs@FreeBSD.ORG  Wed Oct 19 14:53:52 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A7A0C1065673;
	Wed, 19 Oct 2011 14:53:52 +0000 (UTC) (envelope-from kaduk@mit.edu)
Received: from dmz-mailsec-scanner-1.mit.edu (DMZ-MAILSEC-SCANNER-1.MIT.EDU
	[18.9.25.12]) by mx1.freebsd.org (Postfix) with ESMTP id 32B868FC12;
	Wed, 19 Oct 2011 14:53:51 +0000 (UTC)
X-AuditID: 1209190c-b7fd26d0000008df-52-4e9ee47f8e7b
Received: from mailhub-auth-3.mit.edu ( [18.9.21.43])
	by dmz-mailsec-scanner-1.mit.edu (Symantec Messaging Gateway) with SMTP
	id 48.C4.02271.F74EE9E4; Wed, 19 Oct 2011 10:53:51 -0400 (EDT)
Received: from outgoing.mit.edu (OUTGOING-AUTH.MIT.EDU [18.7.22.103])
	by mailhub-auth-3.mit.edu (8.13.8/8.9.2) with ESMTP id p9JErpeG023501; 
	Wed, 19 Oct 2011 10:53:51 -0400
Received: from multics.mit.edu (MULTICS.MIT.EDU [18.187.1.73])
	(authenticated bits=56) (User authenticated as kaduk@ATHENA.MIT.EDU)
	by outgoing.mit.edu (8.13.6/8.12.4) with ESMTP id p9JErnOD026397
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT);
	Wed, 19 Oct 2011 10:53:50 -0400 (EDT)
Received: (from kaduk@localhost) by multics.mit.edu (8.12.9.20060308)
	id p9JErnEL016061; Wed, 19 Oct 2011 10:53:49 -0400 (EDT)
Date: Wed, 19 Oct 2011 10:53:48 -0400 (EDT)
From: Benjamin Kaduk <kaduk@MIT.EDU>
To: Kostik Belousov <kostikbel@gmail.com>
In-Reply-To: <20111019111746.GP50300@deviant.kiev.zoral.com.ua>
Message-ID: <alpine.GSO.1.10.1110191052540.882@multics.mit.edu>
References: <alpine.GSO.1.10.1110190139070.882@multics.mit.edu>
	<20111019111746.GP50300@deviant.kiev.zoral.com.ua>
User-Agent: Alpine 1.10 (GSO 962 2008-03-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFvrMIsWRmVeSWpSXmKPExsUixCmqrVv/ZJ6fwYOr8hbHHv9ks2iY9pjN
	Yu7f/YwOzB4zPs1n8dg56y57AFMUl01Kak5mWWqRvl0CV8afu00sBfOFKp4fbmJrYJzE3cXI
	ySEhYCJxdO1RNghbTOLCvfVANheHkMA+Rok9Ta1MEM4GRom3jd9YIJwDTBLbvh1kh3AaGCV6
	N3eyg/SzCGhLLF+9E8xmE1CRmPlmI9hcEQFNiWub7jOB2MwCBhIz2uYzgtjCAuYS69ftYwWx
	OQXsJWZtPgdWwwtkb1r+CCwuJFAkMevNfLC4qICOxOr9U1ggagQlTs58wgIx01Li3J/rbBMY
	BWchSc1CklrAyLSKUTYlt0o3NzEzpzg1Wbc4OTEvL7VI11AvN7NELzWldBMjOGwleXYwvjmo
	dIhRgINRiYd3h9w8PyHWxLLiytxDjJIcTEqivM2PgUJ8SfkplRmJxRnxRaU5qcWHGCU4mJVE
	eF/dAMrxpiRWVqUW5cOkpDlYlMR5D+5w8BMSSE8sSc1OTS1ILYLJynBwKEnw/gMZKliUmp5a
	kZaZU4KQZuLgBBnOAzT8O0gNb3FBYm5xZjpE/hSjopQ473mQhABIIqM0D64XllZeMYoDvSLM
	+xukigeYkuC6XwENZgIafFRxLsjgkkSElFQDo2LU+tgmwcvBsz/M/qbSI7u5snL9vs02c5dE
	BJ/6n6Eur6L4eZLEjv9xIo/OZz2arN/PeSP1McuLD7I54X2pTko2KR5Pzy+a0TxlG29a+Cu2
	dFGOyLJXk7nXryh6GRig9r/SmFvS2u6VTffcAHbuqP+qtTP7L8z3eNr1L+74S95zgg/6AwqL
	lFiKMxINtZiLihMBdIwNewYDAAA=
Cc: freebsd-fs@freebsd.org, rmacklem@freebsd.org
Subject: Re: lock status of dvp in lookup error return?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 19 Oct 2011 14:53:52 -0000

On Wed, 19 Oct 2011, Kostik Belousov wrote:

> On Wed, Oct 19, 2011 at 02:07:17AM -0400, Benjamin Kaduk wrote:
>> Hi Rick,
>>
>> In tracking down a panic trying to recursively lock a vnode in openafs, I
>> started questioning my behavior in the ISDOTDOT case, in particular
>> whether to drop the dvp lock before the actual call over the network; this
>> naturally led me to look at the NFS code as a reference.
>> Unfortunately, this left me more confused than when I began ...
>>
>> sys/fs/nfs_clvnops.c, in nfs_lookup():
>>   1211          if (flags & ISDOTDOT) {
>>   1212                  ltype = NFSVOPISLOCKED(dvp);
>>   1213                  error = vfs_busy(mp, MBF_NOWAIT);
>>   1214                  if (error != 0) {
>>   1215                      vfs_ref(mp);
>>   1216                      NFSVOPUNLOCK(dvp, 0);
>>   1217                      error = vfs_busy(mp, 0);
>>   1218                      NFSVOPLOCK(dvp, ltype | LK_RETRY);
>>
>> If we fail to busy the mountpoint, drop the directory lock and try again,
>> then relock dvp afterward.
>>
>>   1219                      vfs_rel(mp);
>>   1220                      if (error == 0 && (dvp->v_iflag & VI_DOOMED)) {
>>   1221                          vfs_unbusy(mp);
>>   1222                          error = ENOENT;
>>   1223                      }
>>   1224                      if (error != 0)
>>   1225                          return (error);
>>
>> But if the second vfs_busy failed, or dvp is DOOMED, return with dvp
>> locked.
>>
>>   1226                  }
>>   1227                  NFSVOPUNLOCK(dvp, 0);
>>
>> But now we always unlock dvp.
>>
>>   1228                  error = nfscl_nget(mp, dvp, nfhp, cnp, td, &np,
>>   NULL,
>>   1229                      cnp->cn_lkflags);
>>
>> The call to the network (?)
>>
>>   1230                  if (error == 0)
>>   1231                      newvp = NFSTOV(np);
>>   1232                  vfs_unbusy(mp);
>>   1233                  if (newvp != dvp)
>>   1234                      NFSVOPLOCK(dvp, ltype | LK_RETRY);
> Did you missed line 1234 ?
>
> The code is the copy of the vn_vget_ino(). The logic in the function
> might be slightly easier to follow.

Ah, I did miss that, thanks.
Maybe 0200h is not the best time to do these things ...

-Ben

From owner-freebsd-fs@FreeBSD.ORG  Wed Oct 19 15:39:53 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C5F851065687
	for <freebsd-fs@FreeBSD.org>; Wed, 19 Oct 2011 15:39:53 +0000 (UTC)
	(envelope-from avg@FreeBSD.org)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140])
	by mx1.freebsd.org (Postfix) with ESMTP id 171338FC1D
	for <freebsd-fs@FreeBSD.org>; Wed, 19 Oct 2011 15:39:52 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua
	[212.40.38.100])
	by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id SAA29079;
	Wed, 19 Oct 2011 18:39:50 +0300 (EEST)
	(envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1])
	by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD))
	id 1RGYFC-000GOA-33; Wed, 19 Oct 2011 18:39:50 +0300
Message-ID: <4E9EEF45.9020404@FreeBSD.org>
Date: Wed, 19 Oct 2011 18:39:49 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:7.0.1) Gecko/20111002 Thunderbird/7.0.1
MIME-Version: 1.0
To: Florian Wagner <florian@wagner-flo.net>
References: <20111015214347.09f68e4e@naclador.mos32.de>
	<4E9ACA9F.5090308@FreeBSD.org>
	<20111019082139.1661868e@auedv3.syscomp.de>
In-Reply-To: <20111019082139.1661868e@auedv3.syscomp.de>
X-Enigmail-Version: undefined
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@FreeBSD.org
Subject: Re: Extending zfsboot.c to allow selecting filesystem from
	boot.config
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 19 Oct 2011 15:39:53 -0000

on 19/10/2011 09:21 Florian Wagner said the following:

Somewhere about this line an attribution of my text was expected :-)

>> on 15/10/2011 22:43 Florian Wagner said the following:
>>> Hi,
>>> 
>>> from looking at the code in sys/boot/i386/zfsboot/zfsboot.c the ZFS 
>>> aware boot block already allows to select pool to load the kernel from
>>> by adding <POOL>:<FILE TO BOOT> to the boot.config. As this code calls
>>> the zfs_mount_pool function it will look for the bootfs property on the
>>> new pool or use its root dataset to get the file from there.
>>> 
>>> How much work would it be to extend the loader to also allow selecting
>>> a ZFS filesystem?
>>> 
>>> What I'd like to do is place a boot.config on the (otherwise empty) 
>>> root of my system pool and then tell it to get the loader from another
>>> filesystem by putting "rpool/root/stable-8-r226381:/boot/zfsloader" in
>>> there.
>> 
>> Please check out the following changes: 
>> https://gitorious.org/~avg/freebsd/avgbsd/commit/8c3808c4bb2a2cd746db3e9c46871c9bdf943ef6
>>
>> 
https://gitorious.org/~avg/freebsd/avgbsd/commit/0b4279c0d366d9f2b5bb9d4c0dd3229d8936d92b
>> https://gitorious.org/~avg/freebsd/avgbsd/commit/b29ab78b079f27918de1683e88bcb1817a0e5969
>>
>> 
https://gitorious.org/~avg/freebsd/avgbsd/commit/f49add15516dfd582258b6820b8f0254cf9419a3
>> https://gitorious.org/~avg/freebsd/avgbsd/commit/e072b443b0f59fe1ff54a70d2437d63698bbf597
>>
>> 
https://gitorious.org/~avg/freebsd/avgbsd/commit/f701760c10812c5b6925352fb003408c19170063
> 
> Looks great!
> 
> I've applied the patches to my checkout of Stable 8 and gave the resulting
> gptzfsboot and zfsloader a cursory try in a virtual machine.

Thank you for testing!

> Commit f701760c10812c5b6925352fb003408c19170063 breaks the build of the 
> non-ZFS-enabled bootcode. The syntax is wrong in the following snippet if
> LOADER_ZFS_SUPPORT is not defined. Moving the closing bracket ("};") right
> after the second #endif into the preprocessor conditional fixes that.

Thank you for reporting this!

> @@ -52,14 +52,21 @@ u_int32_t	howto; u_int32_t	bootdev; u_int32_t
> bootflags; +#ifdef LOADER_ZFS_SUPPORT union { +#endif struct { u_int32_t
> pxeinfo; u_int32_t	res2; }; +#ifdef LOADER_ZFS_SUPPORT uint64_t	zfspool; 
> +#endif }; u_int32_t	bootinfo; +#ifdef LOADER_ZFS_SUPPORT +    uint64_t
> zfsroot; +#endif } *kargs;
> 
> 
> The only thing I was a bit confused by is that on the boot prompt only the
> pool and filename to be booted are printed.

Do you mean the (gpt)zfsboot prompt?

> Apart from that it worked as expected. Not having to set vfs.root.mountfrom
> in the loader is nice.

Thanks!
-- 
Andriy Gapon

From owner-freebsd-fs@FreeBSD.ORG  Wed Oct 19 15:55:41 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 84010106564A
	for <freebsd-fs@freebsd.org>; Wed, 19 Oct 2011 15:55:41 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca
	[131.104.91.36])
	by mx1.freebsd.org (Postfix) with ESMTP id 17CBA8FC0C
	for <freebsd-fs@freebsd.org>; Wed, 19 Oct 2011 15:55:39 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: Ap8EAMrynk6DaFvO/2dsb2JhbABEhHWlBIFuAQEBAwEBAQEgBCcgCwUWGAICDRkCKQEJJgYIBwQBHASHXwijcJIBgTCFV4EUBJFkghqRcw
X-IronPort-AV: E=Sophos;i="4.69,373,1315195200"; d="scan'208";a="140488433"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-annu-pri.mail.uoguelph.ca with ESMTP; 19 Oct 2011 11:55:32 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id A07ACB3F7E;
	Wed, 19 Oct 2011 11:55:32 -0400 (EDT)
Date: Wed, 19 Oct 2011 11:55:32 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: John <jwd@freebsd.org>
Message-ID: <1951661642.102471.1319039732646.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <20111019034956.GA8345@FreeBSD.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.202]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692)
Cc: freebsd-fs@freebsd.org
Subject: Re: nfsstats for new nfsserver
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 19 Oct 2011 15:55:41 -0000

John wrote:
> Hi Folks,
> 
> I've been looking into different performance aspects of running
> the new nfsserver servering out zfs filesystems with 9.
> 
> I've run into a nfsstat question I thought I would ask about.
> >From nfsstat on a system that's been up for a few hours:
> 
> # nfsstat
> Client Info:
> 
> ... deleted ... all 0.
> 
> Server Info:
> Getattr Setattr Lookup Readlink Read Write Create Remove
> 1014376791 95502 1815135267 10181 8613463 6005951 0 0
> Rename Link Symlink Mkdir Rmdir Readdir RdirPlus Access
> 240 0 0 0 0 47964 0 547155
> Mknod Fsstat Fsinfo PathConf Commit
> 0 595932 45 0 74154
> Server Ret-Failed
> 0
> Server Faults
> 0
> Server Cache Stats:
> Inprog Idem Non-idem Misses
> 6308 0 3852 -1448802368
> Server Write Gathering:
> WriteOps WriteRPC Opsaved
> 6005951 6005951 0
> 
> 
> The 'Misses' value is very large. When looking at the source, if I'm
> following the code correctly (and I might not be), would it make sense
> to try increasing the size of the cache, or simply disabling it? Can
> do
> either - looking for opinions.
> 
Well, if everything is working well, you should always have misses and
nothing else. A hit means that an RPC has been retried. On TCP, this
implies that there should have been a network partitioning for a
significant period of time (or the client is retrying too agressively,
since the TCP layer should take care of packet loss, etc).

I'm actually surprised you see any hits. Are you using UDP? (In that
case there probably will be spurious RPC retries and the cache avoids
re-doing the RPC for those cases.)

The DRC is weird, in that it does not improve performance, but
correctness. Since the overhead should be minimal, I wouldn't
disable it, especially if you are using UDP.

> The Opsaved value being 0, would it make sense to simply disable
> gathering also?
> 
The new server doesn't do write gathering. It was mainly useful for
NFSv2, where all writes were synchronous.

--> The Opsaved field will always be 0. It's there for the default
    "nfsstat" output for backwards compatibility only.

If you use "nfsstat -e -s" you'll get the newer style of server stats.

> Last, just more of a comment, would it make sense to go ahead and
> treat
> these values as unsigned? They'll still wrap, but they would stay
> positive.
> 
Yea, I suppose the printfs should be changed someday, rick

> Thanks,
> John
> 
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

From owner-freebsd-fs@FreeBSD.ORG  Wed Oct 19 16:04:16 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 44213106564A
	for <freebsd-fs@freebsd.org>; Wed, 19 Oct 2011 16:04:16 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca
	[131.104.91.36])
	by mx1.freebsd.org (Postfix) with ESMTP id E36CA8FC0A
	for <freebsd-fs@freebsd.org>; Wed, 19 Oct 2011 16:04:15 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: At8EANTznk6DaFvO/2dsb2JhbABEhHWhRYM/gW4BAQEEAQEBIAQnIAsbDgMDAQIBERkCBB8GAQkeCAYIBwQBHASHZ6NukgGDMINXgRQEkWSCGoowh0M
X-IronPort-AV: E=Sophos;i="4.69,373,1315195200"; d="scan'208";a="140490268"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-annu-pri.mail.uoguelph.ca with ESMTP; 19 Oct 2011 12:04:14 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 09A30B3F6A;
	Wed, 19 Oct 2011 12:04:15 -0400 (EDT)
Date: Wed, 19 Oct 2011 12:04:15 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Mark Saad <nonesuch@longcount.org>
Message-ID: <436946680.103454.1319040255028.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <201109291600.p8TG0OI4040954@freefall.freebsd.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; 
	boundary="----=_Part_103453_1537264991.1319040255027"
X-Originating-IP: [172.17.91.203]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692)
Cc: freebsd-fs@FreeBSD.org
Subject: Re: kern/156168: [nfs] [panic] Kernel panic under concurrent access
 over NFS
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 19 Oct 2011 16:04:16 -0000

------=_Part_103453_1537264991.1319040255027
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

Mark Saad wrote:
> The following reply was made to PR kern/156168; it has been noted by
> GNATS.
> 
> From: Mark Saad <nonesuch@longcount.org>
> To: bug-followup@FreeBSD.org, niakrisn@gmail.com
> Cc:
> Subject: Re: kern/156168: [nfs] [panic] Kernel panic under concurrent
> access
> over NFS
> Date: Thu, 29 Sep 2011 11:32:12 -0400
> 
> All
> I am seeing a similar crash on 7.3-RELEASE-p2 amd64 when using
> apache-1.3.34 with accf_httpd and a nfs docroot
> The servers that have crashed are all FreeBSD 7.3-RELEASE amd64.
> Hardware is HP Dl145 g2
> They have 2G of ram and 2G swap with one single core opteron cpu.
> 
> 
> We are using the following sysctls .
> 
> kern.ipc.maxsockbuf=2097152
> kern.ipc.nmbclusters=32768
> kern.ipc.somaxconn=1024
> kern.maxfiles=131072
> kern.maxfilesperproc=32768
> net.inet.tcp.inflight.enable=0
> net.inet.tcp.path_mtu_discovery=0
> net.inet.tcp.recvbuf_inc=524288
> net.inet.tcp.recvbuf_max=8388608
> net.inet.tcp.recvspace=32768
> net.inet.tcp.sendbuf_inc=16384
> net.inet.tcp.sendbuf_max=8388608
> net.inet.tcp.sendspace=32768
> net.inet.udp.recvspace=42080
> net.isr.direct=1
> vm.pmap.shpgperproc=600
> 
> 
> Up time prior to the crash was not the other system was up for 11 days
> this one was 6 days.
> 
> Here is the contents of my crash
> 
> 
> [root@web29 /var/crash]# kgdb /boot/kernel/kernel /var/crash/vmcore.0
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and
> you are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for
> details.
> This GDB was configured as "amd64-marcel-freebsd"...
> 
> Unread portion of the kernel message buffer:
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address = 0x258
> fault code = supervisor read data, page not present
> instruction pointer = 0x8:0xffffffff8051a66d
> stack pointer = 0x10:0xffffff803e69b1c0
> frame pointer = 0x10:0xffffff0001b50ae0
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 9336 (libhttpd.ep)
> trap number = 12
> panic: page fault
> cpuid = 0
> Uptime: 6d5h18m39s
> Physical memory: 2034 MB
> Dumping 1451 MB: 1436 1420 1404 1388 1372 1356 1340 1324 1308 1292
> 1276 1260 1244 1228 1212 1196 1180 1164 1148 1132 1116 1100 1084 1068
> 1052 1036 1020 1004 988 972 956 940 924 908 892 876 860 844 828 812
> 796 780 764 748 732 716 700 684 668 652 636 620 604 588 572 556 540
> 524 508 492 476 460 444 428 412 396 380 364 348 332 316 300 284 268
> 252 236 220 204 188 172 156 140 124 108 92 76 60 44 28 12
> 
> Reading symbols from /boot/kernel/accf_http.ko...Reading symbols from
> /boot/kernel/accf_http.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/accf_http.ko
> #0 doadump () at pcpu.h:195
> 195 pcpu.h: No such file or directory.
> in pcpu.h
> (kgdb) bt
> #0 doadump () at pcpu.h:195
> #1 0x0000000000000004 in ?? ()
> #2 0xffffffff805285f9 in boot (howto=260) at
> /usr/src/sys/kern/kern_shutdown.c:418
> #3 0xffffffff80528a02 in panic (fmt=0x104 <Address 0x104 out of
> bounds>) at /usr/src/sys/kern/kern_shutdown.c:574
> #4 0xffffffff807ec813 in trap_fatal (frame=0xffffff0001b50ae0,
> eva=Variable "eva" is not available.
> ) at /usr/src/sys/amd64/amd64/trap.c:777
> #5 0xffffffff807ecbe5 in trap_pfault (frame=0xffffff803e69b110,
> usermode=0) at /usr/src/sys/amd64/amd64/trap.c:693
> #6 0xffffffff807ed50c in trap (frame=0xffffff803e69b110) at
> /usr/src/sys/amd64/amd64/trap.c:464
> #7 0xffffffff807d614e in calltrap () at
> /usr/src/sys/amd64/amd64/exception.S:218
> #8 0xffffffff8051a66d in _mtx_lock_sleep (m=0xffffff002f3d7a80,
> tid=18446742974226565856, opts=Variable "opts" is not available.
> )
> at /usr/src/sys/kern/kern_mutex.c:339
> #9 0xffffffff80701f60 in clnt_dg_create (so=0xffffff00017755a0,
> svcaddr=0xffffff803e69b310, program=100000, version=4, sendsz=Variable
> "sendsz" is not available.
> )
> at /usr/src/sys/rpc/clnt_dg.c:259
> #10 0xffffffff806e97c9 in nlm_get_rpc (sa=Variable "sa" is not
> available.
> ) at /usr/src/sys/nlm/nlm_prot_impl.c:327
> #11 0xffffffff806e9d39 in nlm_host_get_rpc (host=0xffffff0001705000)
> at /usr/src/sys/nlm/nlm_prot_impl.c:1199
> #12 0xffffffff806e680f in nlm_clearlock (host=0xffffff0001705000,
> ext=0xffffff803e69b9a0, vers=4, timo=0xffffff803e69b9d0,
> retries=2147483647, vp=0xffffff004881edc8, op=2,
> fl=0xffffff803e69bac0, flags=64, svid=9336, fhlen=32,
> fh=0xffffff803e69b750,
> size=689) at /usr/src/sys/nlm/nlm_advlock.c:943
> #13 0xffffffff806e7801 in nlm_advlock_internal (vp=0xffffff004881edc8,
> id=Variable "id" is not available.
> ) at /usr/src/sys/nlm/nlm_advlock.c:355
> #14 0xffffffff806e8166 in nlm_advlock (ap=Variable "ap" is not
> available.
> ) at /usr/src/sys/nlm/nlm_advlock.c:392
> #15 0xffffffff806ced28 in nfs_advlock (ap=0xffffff803e69ba90) at
> /usr/src/sys/nfsclient/nfs_vnops.c:3153
> #16 0xffffffff804f40e2 in closef (fp=0xffffff0073716d80,
> td=0xffffff0001b50ae0) at vnode_if.h:1036
> #17 0xffffffff804f462b in kern_close (td=0xffffff0001b50ae0,
> fd=Variable "fd" is not available.
> ) at /usr/src/sys/kern/kern_descrip.c:1125
> #18 0xffffffff807ece67 in syscall (frame=0xffffff803e69bc80) at
> /usr/src/sys/amd64/amd64/trap.c:920
> #19 0xffffffff807d635b in Xfast_syscall () at
> /usr/src/sys/amd64/amd64/exception.S:339
> #20 0x00000008009c5b1c in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> 
You could try the attached patch, which contains some of the changes
in the newer versions of clnt_dg.c. (There have been many changes, so
carrying them all across isn't practical, for me at least.)

I have no way of testing this patch at this time, so all I did was
compile it, rick

> --
> mark saad | nonesuch@longcount.org
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

------=_Part_103453_1537264991.1319040255027
Content-Type: text/x-patch; name=nlmdg7.patch
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename=nlmdg7.patch

LS0tIHJwYy9jbG50X2RnLmMuc2F2CTIwMTEtMTAtMTkgMDk6Mzk6MjguMDAwMDAwMDAwIC0wNDAw
CisrKyBycGMvY2xudF9kZy5jCTIwMTEtMTAtMTkgMDk6Mzk6MzkuMDAwMDAwMDAwIC0wNDAwCkBA
IC0xMjAsOSArMTIwLDExIEBAIHN0cnVjdCBjdV9zb2NrZXQgewogCXN0cnVjdCBtdHgJCWNzX2xv
Y2s7CiAJaW50CQkJY3NfcmVmczsJLyogQ291bnQgb2YgY2xpZW50cyAqLwogCXN0cnVjdCBjdV9y
ZXF1ZXN0X2xpc3QJY3NfcGVuZGluZzsJLyogUmVxdWVzdHMgYXdhaXRpbmcgcmVwbGllcyAqLwot
CQorCWludAkJCWNzX3VwY2FsbHJlZnM7CS8qIFJlZmNudCBvZiB1cGNhbGxzIGluIHByb2cuKi8K
IH07CiAKK3N0YXRpYyB2b2lkIGNsbnRfZGdfdXBjYWxsc2RvbmUoc3RydWN0IHNvY2tldCAqLCBz
dHJ1Y3QgY3Vfc29ja2V0ICopOworCiAvKgogICogUHJpdmF0ZSBkYXRhIGtlcHQgcGVyIGNsaWVu
dCBoYW5kbGUKICAqLwpAQCAtMjc2LDYgKzI3OCw3IEBAIHJlY2hlY2tfc29ja2V0OgogCQl9CiAJ
CW10eF9pbml0KCZjcy0+Y3NfbG9jaywgImNzLT5jc19sb2NrIiwgTlVMTCwgTVRYX0RFRik7CiAJ
CWNzLT5jc19yZWZzID0gMTsKKwkJY3MtPmNzX3VwY2FsbHJlZnMgPSAwOwogCQlUQUlMUV9JTklU
KCZjcy0+Y3NfcGVuZGluZyk7CiAJCXNvLT5zb191cGNhbGxhcmcgPSBjczsKIAkJc28tPnNvX3Vw
Y2FsbCA9IGNsbnRfZGdfc291cGNhbGw7CkBAIC04MTEsMTggKzgxNCwyMyBAQCBjbG50X2RnX2Rl
c3Ryb3koQ0xJRU5UICpjbCkKIAl3aGlsZSAoY3UtPmN1X3RocmVhZHMpCiAJCW1zbGVlcChjdSwg
JmNzLT5jc19sb2NrLCAwLCAicnBjY2xvc2UiLCAwKTsKIAorCW10eF91bmxvY2soJmNzLT5jc19s
b2NrKTsJCS8qIFRvIGF2b2lkIGEgTE9SLiAqLworCVNPQ0tCVUZfTE9DSygmY3UtPmN1X3NvY2tl
dC0+c29fcmN2KTsKKwltdHhfbG9jaygmY3MtPmNzX2xvY2spOwogCWNzLT5jc19yZWZzLS07CiAJ
aWYgKGNzLT5jc19yZWZzID09IDApIHsKLQkJbXR4X2Rlc3Ryb3koJmNzLT5jc19sb2NrKTsKLQkJ
U09DS0JVRl9MT0NLKCZjdS0+Y3Vfc29ja2V0LT5zb19yY3YpOworCQltdHhfdW5sb2NrKCZjcy0+
Y3NfbG9jayk7CiAJCWN1LT5jdV9zb2NrZXQtPnNvX3VwY2FsbGFyZyA9IE5VTEw7CiAJCWN1LT5j
dV9zb2NrZXQtPnNvX3VwY2FsbCA9IE5VTEw7CiAJCWN1LT5jdV9zb2NrZXQtPnNvX3Jjdi5zYl9m
bGFncyAmPSB+U0JfVVBDQUxMOworCQljbG50X2RnX3VwY2FsbHNkb25lKGN1LT5jdV9zb2NrZXQs
IGNzKTsKIAkJU09DS0JVRl9VTkxPQ0soJmN1LT5jdV9zb2NrZXQtPnNvX3Jjdik7CisJCW10eF9k
ZXN0cm95KCZjcy0+Y3NfbG9jayk7CiAJCW1lbV9mcmVlKGNzLCBzaXplb2YoKmNzKSk7CiAJCWxh
c3Rzb2NrZXRyZWYgPSBUUlVFOwogCX0gZWxzZSB7CiAJCW10eF91bmxvY2soJmNzLT5jc19sb2Nr
KTsKKwkJU09DS0JVRl9VTkxPQ0soJmN1LT5jdV9zb2NrZXQtPnNvX3Jjdik7CiAJCWxhc3Rzb2Nr
ZXRyZWYgPSBGQUxTRTsKIAl9CiAKQEAgLTg2Myw2ICs4NzEsOSBAQCBjbG50X2RnX3NvdXBjYWxs
KHN0cnVjdCBzb2NrZXQgKnNvLCB2b2lkCiAJaW50IGVycm9yLCByY3ZmbGFnLCBmb3VuZHJlcTsK
IAl1aW50MzJfdCB4aWQ7CiAKKwltdHhfbG9jaygmY3MtPmNzX2xvY2spOworCWNzLT5jc191cGNh
bGxyZWZzKys7CisJbXR4X3VubG9jaygmY3MtPmNzX2xvY2spOwogCXVpby51aW9fcmVzaWQgPSAx
MDAwMDAwMDAwOwogCXVpby51aW9fdGQgPSBjdXJ0aHJlYWQ7CiAJZG8gewpAQCAtOTM1LDUgKzk0
NiwyMiBAQCBjbG50X2RnX3NvdXBjYWxsKHN0cnVjdCBzb2NrZXQgKnNvLCB2b2lkCiAJCWlmICgh
Zm91bmRyZXEpCiAJCQltX2ZyZWVtKG0pOwogCX0gd2hpbGUgKG0pOworCW10eF9sb2NrKCZjcy0+
Y3NfbG9jayk7CisJY3MtPmNzX3VwY2FsbHJlZnMtLTsKKwltdHhfdW5sb2NrKCZjcy0+Y3NfbG9j
ayk7CisJd2FrZXVwKCZjcy0+Y3NfdXBjYWxscmVmcyk7CiB9CiAKKy8qCisgKiBXYWl0IGZvciBh
bGwgdXBjYWxscyBpbiBwcm9ncmVzcyB0byBjb21wbGV0ZS4KKyAqLworc3RhdGljIHZvaWQKK2Ns
bnRfZGdfdXBjYWxsc2RvbmUoc3RydWN0IHNvY2tldCAqc28sIHN0cnVjdCBjdV9zb2NrZXQgKmNz
KQoreworCisJU09DS0JVRl9MT0NLX0FTU0VSVCgmc28tPnNvX3Jjdik7CisKKwl3aGlsZSAoY3Mt
PmNzX3VwY2FsbHJlZnMgPiAwKQorCQkodm9pZCkgbXNsZWVwKCZjcy0+Y3NfdXBjYWxscmVmcywg
U09DS0JVRl9NVFgoJnNvLT5zb19yY3YpLCAwLAorCQkgICAgInJwY2RndXAiLCAwKTsKK30K
------=_Part_103453_1537264991.1319040255027--

From owner-freebsd-fs@FreeBSD.ORG  Wed Oct 19 16:21:33 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8DC32106564A;
	Wed, 19 Oct 2011 16:21:33 +0000 (UTC)
	(envelope-from florian@wagner-flo.net)
Received: from umbracor.wagner-flo.net (umbracor.wagner-flo.net
	[213.165.81.202])
	by mx1.freebsd.org (Postfix) with ESMTP id 4B20F8FC16;
	Wed, 19 Oct 2011 16:21:33 +0000 (UTC)
Received: from naclador.mos32.de (ppp-188-174-59-72.dynamic.mnet-online.de
	[188.174.59.72])
	by umbracor.wagner-flo.net (Postfix) with ESMTPSA id 700D53C06C30;
	Wed, 19 Oct 2011 18:21:35 +0200 (CEST)
Date: Wed, 19 Oct 2011 18:21:30 +0200
From: Florian Wagner <florian@wagner-flo.net>
To: Andriy Gapon <avg@FreeBSD.org>
Message-ID: <20111019182130.27446750@naclador.mos32.de>
In-Reply-To: <4E9EEF45.9020404@FreeBSD.org>
References: <20111015214347.09f68e4e@naclador.mos32.de>
	<4E9ACA9F.5090308@FreeBSD.org>
	<20111019082139.1661868e@auedv3.syscomp.de>
	<4E9EEF45.9020404@FreeBSD.org>
X-Mailer: Claws Mail 3.7.9 (GTK+ 2.24.5; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=PGP-SHA1;
	boundary="Sig_/2Ss+PquXy3E6/Gv5vI3TZ8D";
	protocol="application/pgp-signature"
Cc: freebsd-fs@FreeBSD.org
Subject: Re: Extending zfsboot.c to allow selecting filesystem from
 boot.config
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 19 Oct 2011 16:21:33 -0000

--Sig_/2Ss+PquXy3E6/Gv5vI3TZ8D
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable

> [...]
>
> > The only thing I was a bit confused by is that on the boot prompt
> > only the pool and filename to be booted are printed.
>=20
> Do you mean the (gpt)zfsboot prompt?

Yes. For a boot.config with "rpool:root/something:" it prints:

  Booting from Hard Disk...
  /boot.config: rpool
  FreeBSD/x86 boot
  Default: rpool:/boot/zfsloader
  boot: <CURSOR>


Regards
Florian Wagner

--Sig_/2Ss+PquXy3E6/Gv5vI3TZ8D
Content-Type: application/pgp-signature; name=signature.asc
Content-Disposition: attachment; filename=signature.asc

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)

iEYEARECAAYFAk6e+QsACgkQLvW/2gp2pPyrVwCgh0KS8z/cfwXpHujMgy5VWHqr
OmUAoI0GZnF12/M1SJ3xPrNJRa7UAtlR
=4KjI
-----END PGP SIGNATURE-----

--Sig_/2Ss+PquXy3E6/Gv5vI3TZ8D--

From owner-freebsd-fs@FreeBSD.ORG  Thu Oct 20 05:26:01 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id F04121065673;
	Thu, 20 Oct 2011 05:26:01 +0000 (UTC)
	(envelope-from linimon@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id C7DFC8FC08;
	Thu, 20 Oct 2011 05:26:01 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p9K5Q1lO007001;
	Thu, 20 Oct 2011 05:26:01 GMT
	(envelope-from linimon@freefall.freebsd.org)
Received: (from linimon@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p9K5Q1ad006997;
	Thu, 20 Oct 2011 05:26:01 GMT (envelope-from linimon)
Date: Thu, 20 Oct 2011 05:26:01 GMT
Message-Id: <201110200526.p9K5Q1ad006997@freefall.freebsd.org>
To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org
From: linimon@FreeBSD.org
Cc: 
Subject: Re: bin/161807: [patch] add option for explicitly specifying
	metadata version to geli(8)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 20 Oct 2011 05:26:02 -0000

Old Synopsis: [patch] add option for explicitly specifying metadata version to geli
New Synopsis: [patch] add option for explicitly specifying metadata version to geli(8)

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Thu Oct 20 05:25:18 UTC 2011
Responsible-Changed-Why: 
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=161807

From owner-freebsd-fs@FreeBSD.ORG  Thu Oct 20 11:55:44 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id F19F2106566C
	for <freebsd-fs@freebsd.org>; Thu, 20 Oct 2011 11:55:44 +0000 (UTC)
	(envelope-from unix.co@gmail.com)
Received: from sam.nabble.com (sam.nabble.com [216.139.236.26])
	by mx1.freebsd.org (Postfix) with ESMTP id D18978FC12
	for <freebsd-fs@freebsd.org>; Thu, 20 Oct 2011 11:55:44 +0000 (UTC)
Received: from [192.168.236.26] (helo=sam.nabble.com)
	by sam.nabble.com with esmtp (Exim 4.72)
	(envelope-from <unix.co@gmail.com>) id 1RGqvq-00083f-Ii
	for freebsd-fs@freebsd.org; Thu, 20 Oct 2011 04:37:06 -0700
Date: Thu, 20 Oct 2011 04:37:06 -0700 (PDT)
From: umar <unix.co@gmail.com>
To: freebsd-fs@freebsd.org
Message-ID: <1319110626566-4921174.post@n5.nabble.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Subject: 8TB Partition Problem
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 20 Oct 2011 11:55:45 -0000

Hi Member,

Recently I have buy Dell Power Vault NX3100, with 4 3 TB hardrive after
creating the raid it give me 8.xx TB partition. There is another 2 HD with
140GB attached with Raid 1. I have install FreeBSD 8.2 64bit on Raid1
partition.

After the installation I have tried to create new partition on 8TB drive
through sysinstall but its not working. Then I tried bsdlable but its also
failed below is the error message of bsdlable.

bsdlabel: disks with more than 2^32-1 sectors are not supported

Would you please help me how i can solve this problem. If freebsd is not
supported 8 TB then which Linux is supporting so I can move to Linux

Best Regards,

Umar


--
View this message in context: http://freebsd.1045724.n5.nabble.com/8TB-Partition-Problem-tp4921174p4921174.html
Sent from the freebsd-fs mailing list archive at Nabble.com.

From owner-freebsd-fs@FreeBSD.ORG  Thu Oct 20 12:01:42 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E957C10656D6
	for <freebsd-fs@freebsd.org>; Thu, 20 Oct 2011 12:01:42 +0000 (UTC)
	(envelope-from nowakpl@platinum.linux.pl)
Received: from platinum.linux.pl (platinum.edu.pl [81.161.192.4])
	by mx1.freebsd.org (Postfix) with ESMTP id A278F8FC23
	for <freebsd-fs@freebsd.org>; Thu, 20 Oct 2011 12:01:42 +0000 (UTC)
Received: by platinum.linux.pl (Postfix, from userid 87)
	id 1880447E15; Thu, 20 Oct 2011 14:01:40 +0200 (CEST)
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on platinum.linux.pl
X-Spam-Level: 
X-Spam-Status: No, score=-0.4 required=3.0 tests=ALL_TRUSTED,AWL,URI_HEX
	autolearn=disabled version=3.3.2
Received: from [172.19.191.2] (078088011125.bialystok.vectranet.pl
	[78.88.11.125])
	by platinum.linux.pl (Postfix) with ESMTPA id 0DD5F47E11
	for <freebsd-fs@freebsd.org>; Thu, 20 Oct 2011 14:01:35 +0200 (CEST)
Message-ID: <4EA00D97.7000005@platinum.linux.pl>
Date: Thu, 20 Oct 2011 14:01:27 +0200
From: Adam Nowacki <nowakpl@platinum.linux.pl>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB;
	rv:1.9.2.23) Gecko/20110920 Thunderbird/3.1.15
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
References: <1319110626566-4921174.post@n5.nabble.com>
In-Reply-To: <1319110626566-4921174.post@n5.nabble.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: 8TB Partition Problem
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 20 Oct 2011 12:01:43 -0000

On 2011-10-20 13:37, umar wrote:
> Hi Member,
>
> Recently I have buy Dell Power Vault NX3100, with 4 3 TB hardrive after
> creating the raid it give me 8.xx TB partition. There is another 2 HD with
> 140GB attached with Raid 1. I have install FreeBSD 8.2 64bit on Raid1
> partition.
>
> After the installation I have tried to create new partition on 8TB drive
> through sysinstall but its not working. Then I tried bsdlable but its also
> failed below is the error message of bsdlable.
>
> bsdlabel: disks with more than 2^32-1 sectors are not supported
>
> Would you please help me how i can solve this problem. If freebsd is not
> supported 8 TB then which Linux is supporting so I can move to Linux

Use GPT, gpart create -s GPT /dev/device then add partitions with gpart, 
see 
http://www.freebsd.org/cgi/man.cgi?query=gpart&apropos=0&sektion=0&manpath=FreeBSD+8.2-RELEASE&arch=default&format=html

>
> Best Regards,
>
> Umar
>
>
> --
> View this message in context: http://freebsd.1045724.n5.nabble.com/8TB-Partition-Problem-tp4921174p4921174.html
> Sent from the freebsd-fs mailing list archive at Nabble.com.
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"


From owner-freebsd-fs@FreeBSD.ORG  Thu Oct 20 12:23:55 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 745051065673
	for <freebsd-fs@freebsd.org>; Thu, 20 Oct 2011 12:23:55 +0000 (UTC)
	(envelope-from numisemis@gmail.com)
Received: from mail-gx0-f182.google.com (mail-gx0-f182.google.com
	[209.85.161.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 347F98FC13
	for <freebsd-fs@freebsd.org>; Thu, 20 Oct 2011 12:23:54 +0000 (UTC)
Received: by ggnq2 with SMTP id q2so1845817ggn.13
	for <freebsd-fs@freebsd.org>; Thu, 20 Oct 2011 05:23:54 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=subject:mime-version:content-type:from:in-reply-to:date:cc
	:content-transfer-encoding:message-id:references:to:x-mailer;
	bh=PMpfrxfJVo/gZHEHGUZ3ZS0ICbxB8QkYgKAjJ3wwIPw=;
	b=etc99BVK7+jShDFsdgnurSJJfNnjdoA3KpAllnH9D3Y51e1kw09ZKG+AEEX9hpy8hG
	jdzvLK+uZ8aSx5y5rL9O1CP21DHGONroyVEBAo9r+Z5zwAzjVM4CdsWNMmzMHx7U9Ju7
	BMH6YDgB8IlzBm174rIk5Z+rITvI8jh+vlc18=
Received: by 10.223.77.77 with SMTP id f13mr10159615fak.19.1319112048967;
	Thu, 20 Oct 2011 05:00:48 -0700 (PDT)
Received: from sime-imac.logos.hr ([213.147.110.159])
	by mx.google.com with ESMTPS id y8sm15472870faj.10.2011.10.20.05.00.47
	(version=TLSv1/SSLv3 cipher=OTHER);
	Thu, 20 Oct 2011 05:00:47 -0700 (PDT)
Mime-Version: 1.0 (Apple Message framework v1084)
Content-Type: text/plain; charset=us-ascii
From: =?iso-8859-2?Q?=A9imun_Mikecin?= <numisemis@gmail.com>
In-Reply-To: <1319110626566-4921174.post@n5.nabble.com>
Date: Thu, 20 Oct 2011 14:00:44 +0200
Content-Transfer-Encoding: quoted-printable
Message-Id: <667290D4-6F62-41C4-A690-28FC11C2AD5F@gmail.com>
References: <1319110626566-4921174.post@n5.nabble.com>
To: umar <unix.co@gmail.com>
X-Mailer: Apple Mail (2.1084)
Cc: freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: 8TB Partition Problem
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 20 Oct 2011 12:23:55 -0000


On 20. lis. 2011., at 13:37, umar wrote:

>=20
> After the installation I have tried to create new partition on 8TB =
drive
> through sysinstall but its not working. Then I tried bsdlable but its =
also
> failed below is the error message of bsdlable.
>=20
> bsdlabel: disks with more than 2^32-1 sectors are not supported
>=20
> Would you please help me how i can solve this problem. If freebsd is =
not
> supported 8 TB then which Linux is supporting so I can move to Linux


Disks larger than 2TB should be partitioned using GPT instead of MBR. It =
doesn't matter if you are using FreeBSD or Linux.

Instead of using sysinstall and/or bsdlabel for partitioning, you should =
partition your disk using GPT, see gpart(8).


From owner-freebsd-fs@FreeBSD.ORG  Thu Oct 20 15:09:11 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E2877106566B
	for <freebsd-fs@freebsd.org>; Thu, 20 Oct 2011 15:09:10 +0000 (UTC)
	(envelope-from bgold@simons-rock.edu)
Received: from hedwig.simons-rock.edu (hedwig.simons-rock.edu [208.81.88.14])
	by mx1.freebsd.org (Postfix) with ESMTP id AAC5B8FC17
	for <freebsd-fs@freebsd.org>; Thu, 20 Oct 2011 15:09:10 +0000 (UTC)
Received: from hp6000new (behemoth.simons-rock.edu [10.30.2.44])
	by hedwig.simons-rock.edu (Postfix) with ESMTP id 833402BB35B
	for <freebsd-fs@freebsd.org>; Thu, 20 Oct 2011 10:40:37 -0400 (EDT)
From: "Brian Gold" <bgold@simons-rock.edu>
To: <freebsd-fs@freebsd.org>
Date: Thu, 20 Oct 2011 10:40:33 -0400
Message-ID: <05b701cc8f36$3934cf70$ab9e6e50$@simons-rock.edu>
MIME-Version: 1.0
X-Mailer: Microsoft Outlook 14.0
Thread-Index: AcyPNjkccBdkHrdhSMKChgRJOdOxbQ==
Content-Language: en-us
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Subject: system hangs using zfs28 after enabiling nfsshare on a new dataset
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 20 Oct 2011 15:09:11 -0000

Hello all,

 
I've been using the 8.2-RELEASE w/ ZFSv28 from mfsBSD (http://mfsbsd.vx.sk/)
for a while now without any issues. Today I decided to try out sharenfs to
see how it works. I got everything configured and successfully shared a new
dataset (backup/vmimages) to another system on my network. After an hour or
so however, the nfs host locked up. I tried rebooting it and it hung shortly
after mounting my zfsroot pool (see screenshot
http://i.imgur.com/qjJO2.jpg). I rebooted into single-user mode and
attempted to run a "zfs set sharenfs=off backup/vmimages", but that hung the
system as well. I rebooted again into single-user mode and ran "zfs destroy
backup/vmimages" (nothing important on there yet) and even that hangs the
system. Any thoughts as to what I can do to troubleshoot this further?


From owner-freebsd-fs@FreeBSD.ORG  Thu Oct 20 22:44:42 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5C68C1065674;
	Thu, 20 Oct 2011 22:44:42 +0000 (UTC)
	(envelope-from rmacklem@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 33DC68FC18;
	Thu, 20 Oct 2011 22:44:42 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p9KMigAl001888;
	Thu, 20 Oct 2011 22:44:42 GMT
	(envelope-from rmacklem@freefall.freebsd.org)
Received: (from rmacklem@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p9KMifS1001884;
	Thu, 20 Oct 2011 22:44:41 GMT (envelope-from rmacklem)
Date: Thu, 20 Oct 2011 22:44:41 GMT
Message-Id: <201110202244.p9KMifS1001884@freefall.freebsd.org>
To: niakrisn@gmail.com, rmacklem@FreeBSD.org, freebsd-fs@FreeBSD.org,
	rmacklem@FreeBSD.org
From: rmacklem@FreeBSD.org
Cc: 
Subject: Re: kern/156168: [nfs] [panic] Kernel panic under concurrent access
	over NFS
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 20 Oct 2011 22:44:42 -0000

Synopsis: [nfs] [panic] Kernel panic under concurrent access over NFS

State-Changed-From-To: open->feedback
State-Changed-By: rmacklem
State-Changed-When: Thu Oct 20 22:42:03 UTC 2011
State-Changed-Why: 

I have sent the person that reported this a patch to test
and am waiting for feedback. I've taken responsibility for this.


Responsible-Changed-From-To: freebsd-fs->rmacklem
Responsible-Changed-By: rmacklem
Responsible-Changed-When: Thu Oct 20 22:42:03 UTC 2011
Responsible-Changed-Why: 

I have sent the person that reported this a patch for testing
and will update the status when I hear back from them.

http://www.freebsd.org/cgi/query-pr.cgi?pr=156168

From owner-freebsd-fs@FreeBSD.ORG  Fri Oct 21 06:49:46 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E7B131065670
	for <freebsd-fs@freebsd.org>; Fri, 21 Oct 2011 06:49:46 +0000 (UTC)
	(envelope-from unix.co@gmail.com)
Received: from sam.nabble.com (sam.nabble.com [216.139.236.26])
	by mx1.freebsd.org (Postfix) with ESMTP id C463A8FC08
	for <freebsd-fs@freebsd.org>; Fri, 21 Oct 2011 06:49:46 +0000 (UTC)
Received: from [192.168.236.26] (helo=sam.nabble.com)
	by sam.nabble.com with esmtp (Exim 4.72)
	(envelope-from <unix.co@gmail.com>) id 1RH8vK-0000Xd-1e
	for freebsd-fs@freebsd.org; Thu, 20 Oct 2011 23:49:46 -0700
Date: Thu, 20 Oct 2011 23:49:46 -0700 (PDT)
From: umar <unix.co@gmail.com>
To: freebsd-fs@freebsd.org
Message-ID: <1319179786044-4923804.post@n5.nabble.com>
In-Reply-To: <4EA00D97.7000005@platinum.linux.pl>
References: <1319110626566-4921174.post@n5.nabble.com>
	<4EA00D97.7000005@platinum.linux.pl>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Subject: Re: 8TB Partition Problem
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 Oct 2011 06:49:47 -0000

Hi,

Thanks it works.

Best Regards,

Umar

--
View this message in context: http://freebsd.1045724.n5.nabble.com/8TB-Partition-Problem-tp4921174p4923804.html
Sent from the freebsd-fs mailing list archive at Nabble.com.

From owner-freebsd-fs@FreeBSD.ORG  Fri Oct 21 09:19:36 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2AD251065670
	for <freebsd-fs@freebsd.org>; Fri, 21 Oct 2011 09:19:36 +0000 (UTC)
	(envelope-from shuriku@shurik.kiev.ua)
Received: from it-profi.org.ua (graal.shurik.kiev.ua [193.239.74.7])
	by mx1.freebsd.org (Postfix) with ESMTP id CBC618FC12
	for <freebsd-fs@freebsd.org>; Fri, 21 Oct 2011 09:19:35 +0000 (UTC)
Received: from [93.183.237.246] (helo=lenovo-b570.it-profi.org.ua)
	by it-profi.org.ua with esmtpsa (TLSv1:CAMELLIA256-SHA:256)
	(Exim 4.76 (FreeBSD)) (envelope-from <shuriku@shurik.kiev.ua>)
	id 1RHAYu-0003IS-5F
	for freebsd-fs@freebsd.org; Fri, 21 Oct 2011 11:34:44 +0300
Message-ID: <4EA12EC0.2040907@shurik.kiev.ua>
Date: Fri, 21 Oct 2011 11:35:12 +0300
From: Alexandr <shuriku@shurik.kiev.ua>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:6.0.2) Gecko/20110926 Thunderbird/6.0.2
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Score: -1.0 (-)
X-Spam-Report: Spam detection software,
	running on the system "graal.it-profi.org.ua", has
	identified this incoming email as possible spam. The original message
	has been attached to this so you can view it (if it isn't spam) or
	label similar future email.  If you have any questions, see
	The administrator of that system for details.
	Content preview: Hello! A few weeks ago I have migrated to a new laptop
	Lenovo
	B570 and I cannot boot from my HDD. At the start of the boot process
	HDD
	led blinks some times and boot continues from Network PXE. Bootloader
	from 8-RELEASE, 9-STABLE,
	and 10-CURRENT connot solve this issue. I am using a
	GPT scheme on my laptop: [...] 
	Content analysis details:   (-1.0 points, 5.0 required)
	pts rule name              description
	---- ----------------------
	--------------------------------------------------
	-1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP
X-SA-Exim-Connect-IP: 93.183.237.246
X-SA-Exim-Mail-From: shuriku@shurik.kiev.ua
X-SA-Exim-Scanned: No (on it-profi.org.ua); SAEximRunCond expanded to false
Subject: cannot boot from HDD
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 Oct 2011 09:19:36 -0000

Hello!

A few weeks ago I have migrated to a new laptop Lenovo B570 and I cannot 
boot from my HDD. At the start of the boot process HDD led blinks some 
times and boot continues from Network PXE. Bootloader from 8-RELEASE, 
9-STABLE, and 10-CURRENT connot solve this issue.
I am using a GPT scheme on my laptop:

lenovo-b570# gpart show
=>       34  976773101  ada0  GPT  (465G)
          34        128     1  freebsd-boot  (64k)
         162  976772973     2  freebsd-zfs  (465G)

Now to boot my system I am using boottable usb-flash with Grub2 
installed. Choosing boot from HDD mbr starts boot my system.
I discussed this problem in our local mailing list, but no success.
The only way I see to resolve my problem at this time is to use such scheme:

bios-boot (with Grub2)
freebsd-boot
freebsd-zfs

From owner-freebsd-fs@FreeBSD.ORG  Fri Oct 21 15:38:47 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4AFC5106566B;
	Fri, 21 Oct 2011 15:38:47 +0000 (UTC)
	(envelope-from 000.fbsd@quip.cz)
Received: from elsa.codelab.cz (elsa.codelab.cz [94.124.105.4])
	by mx1.freebsd.org (Postfix) with ESMTP id 0B0D88FC18;
	Fri, 21 Oct 2011 15:38:46 +0000 (UTC)
Received: from elsa.codelab.cz (localhost [127.0.0.1])
	by elsa.codelab.cz (Postfix) with ESMTP id 36DEF28424;
	Fri, 21 Oct 2011 17:38:45 +0200 (CEST)
Received: from [192.168.1.2] (ip-86-49-61-235.net.upcbroadband.cz
	[86.49.61.235])
	(using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
	(No client certificate requested)
	by elsa.codelab.cz (Postfix) with ESMTPSA id DF63D28422;
	Fri, 21 Oct 2011 17:38:43 +0200 (CEST)
Message-ID: <4EA19203.5050503@quip.cz>
Date: Fri, 21 Oct 2011 17:38:43 +0200
From: Miroslav Lachman <000.fbsd@quip.cz>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US;
	rv:1.9.1.19) Gecko/20110420 Lightning/1.0b1 SeaMonkey/2.0.14
MIME-Version: 1.0
To: Ivan Voras <ivoras@freebsd.org>
References: <4E97FEDD.7060205@quip.cz> <j7938v$66s$1@dough.gmane.org>
In-Reply-To: <j7938v$66s$1@dough.gmane.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org
Subject: Re: dirhash and dynamic memory allocation
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 Oct 2011 15:38:47 -0000

Hi, I am back on this topic...

Ivan Voras wrote:
> On 14/10/2011 11:20, Miroslav Lachman wrote:
>> Hi all,
>>
>> I tried some tuning of dirhash on our servers and after googlig a bit, I
>> found an old GSoC project wiki page about Dynamic Memory Allocation for
>> Dirhash: http://wiki.freebsd.org/DirhashDynamicMemory
>> Is there any reason not to use it / not commit it to HEAD?
>
> AFAIK it's sort-of already present. In 8-stable and recent kernels you
> can give huge amounts of memory to dirhash via vfs.ufs.dirhash_maxmem
> (but except in really large edge cases I don't think you *need* more
> than 32 MB), and the kernel will scale-down or free the memory if not
> needed.
>
> In effect, vfs.ufs.dirhash_maxmem is the upper limit - the kernel will
> use less and will free the allocated memory in low memory situations
> (which I've tried and it works).

So the current behavior is that on 7.3+ and 8.x we have smaller average 
dirhash buffer (by default) than it was initialy 10 years ago. Because 
it starts as 2MB fixed size and now we have 2MB max, which is lowered in 
low mem situations... and sometimes it is set to 0MB!

I caught this 2 days ago:

root@rip ~/# sysctl vfs.ufs
vfs.ufs.dirhash_reclaimage: 5
vfs.ufs.dirhash_lowmemcount: 36953
vfs.ufs.dirhash_docheck: 0
vfs.ufs.dirhash_mem: 0
vfs.ufs.dirhash_maxmem: 8388608
vfs.ufs.dirhash_minsize: 2560

I set maxmem to 8MB in sysctl.conf to increase performance and 
dirhash_mem 0 is really bad surprise!

I am worrying about bad performance in situation where dirhash is 
emptied in situations, where server is already running at maximum 
performance (there is some memory hungry process and system can start 
swapping to disk + dirhash is efectively disabled)

I found a PR kern/145246
http://www.freebsd.org/cgi/query-pr.cgi?pr=145246

Is it possible to add some dirhash_minmem limit to not clear all the 
dirhash memory?
So I can set dirhash_minmem=2MB dirhash_maxmem=16MB and then dirhash_mem 
will be allways between these two limits?

>> And second question - is there any negative impact with higher
>> vfs.ufs.dirhash_maxmem? It stil defaults to 2MB (on FreeBSD 8.2) after
>
> Not that I know of.
>
>> 10 years, but I think we all are using bigger FS in these days with lot
>> of files and directories and 2MB is not enough.
>
> AFAIK I've changed it to autotune so it's configured to approximately 4
> MB on a 4 GB machine (and scales up) in 9.

I didn't tried 9 yet. Does it mean dirhash_maxmem is initially set to 
approximately 1% of physical RAM and then it can be set higher by sysctl 
as in older versions?

Miroslav Lachman

From owner-freebsd-fs@FreeBSD.ORG  Fri Oct 21 16:20:27 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D881B1065673
	for <freebsd-fs@freebsd.org>; Fri, 21 Oct 2011 16:20:27 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta13.westchester.pa.mail.comcast.net
	(qmta13.westchester.pa.mail.comcast.net [76.96.59.243])
	by mx1.freebsd.org (Postfix) with ESMTP id 852EF8FC13
	for <freebsd-fs@freebsd.org>; Fri, 21 Oct 2011 16:20:27 +0000 (UTC)
Received: from omta19.westchester.pa.mail.comcast.net ([76.96.62.98])
	by qmta13.westchester.pa.mail.comcast.net with comcast
	id nU2J1h00927AodY5DULT8E; Fri, 21 Oct 2011 16:20:27 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta19.westchester.pa.mail.comcast.net with comcast
	id nULS1h00k1t3BNj3fULTcD; Fri, 21 Oct 2011 16:20:27 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 30DE3102C1C; Fri, 21 Oct 2011 09:20:25 -0700 (PDT)
Date: Fri, 21 Oct 2011 09:20:25 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Miroslav Lachman <000.fbsd@quip.cz>
Message-ID: <20111021162025.GA89885@icarus.home.lan>
References: <4E97FEDD.7060205@quip.cz> <j7938v$66s$1@dough.gmane.org>
	<4EA19203.5050503@quip.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4EA19203.5050503@quip.cz>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org, Ivan Voras <ivoras@freebsd.org>
Subject: Re: dirhash and dynamic memory allocation
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 Oct 2011 16:20:27 -0000

On Fri, Oct 21, 2011 at 05:38:43PM +0200, Miroslav Lachman wrote:
> Hi, I am back on this topic...
> 
> Ivan Voras wrote:
> >On 14/10/2011 11:20, Miroslav Lachman wrote:
> >>Hi all,
> >>
> >>I tried some tuning of dirhash on our servers and after googlig a bit, I
> >>found an old GSoC project wiki page about Dynamic Memory Allocation for
> >>Dirhash: http://wiki.freebsd.org/DirhashDynamicMemory
> >>Is there any reason not to use it / not commit it to HEAD?
> >
> >AFAIK it's sort-of already present. In 8-stable and recent kernels you
> >can give huge amounts of memory to dirhash via vfs.ufs.dirhash_maxmem
> >(but except in really large edge cases I don't think you *need* more
> >than 32 MB), and the kernel will scale-down or free the memory if not
> >needed.
> >
> >In effect, vfs.ufs.dirhash_maxmem is the upper limit - the kernel will
> >use less and will free the allocated memory in low memory situations
> >(which I've tried and it works).
> 
> So the current behavior is that on 7.3+ and 8.x we have smaller
> average dirhash buffer (by default) than it was initialy 10 years
> ago. Because it starts as 2MB fixed size and now we have 2MB max,
> which is lowered in low mem situations... and sometimes it is set to
> 0MB!
> 
> I caught this 2 days ago:
> 
> root@rip ~/# sysctl vfs.ufs
> vfs.ufs.dirhash_reclaimage: 5
> vfs.ufs.dirhash_lowmemcount: 36953
> vfs.ufs.dirhash_docheck: 0
> vfs.ufs.dirhash_mem: 0
> vfs.ufs.dirhash_maxmem: 8388608
> vfs.ufs.dirhash_minsize: 2560
> 
> I set maxmem to 8MB in sysctl.conf to increase performance and
> dirhash_mem 0 is really bad surprise!

Actually, the "bad surprise" is dirhash_lowmemcount of 36953.  You
increasing dirhash_maxmem is fine -- what you're seeing is that your
machine keeps running out of memory, or that your directories are filled
with so many files that you're exhausting the dirhash repetitively.

I'm going to be blunt and just ask it: why does that happen?  Or do you
have a filesystem that has an absurdly high number of files in a single
directory?  If the former, ignore the next paragraph

I've harped on this before on the mailing list: one of the first things
I learned as a system administrator was that you Do Not(tm) fill
directories with tens of thousands of files.  Split them up into
subdirs.  Even caching daemons (squid, etc.) support this kind of thing;
filename "aj1j11hsfkqXaj21" should really be aj/1j/11hsfkqXaj21.  You
get the idea.  DNS/BIND administrators of systems which have tens of
thousands of domains are quite familiar with this scenario too.

> I am worrying about bad performance in situation where dirhash is
> emptied in situations, where server is already running at maximum
> performance (there is some memory hungry process and system can
> start swapping to disk + dirhash is efectively disabled)
>
> I found a PR kern/145246
> http://www.freebsd.org/cgi/query-pr.cgi?pr=145246
> 
> Is it possible to add some dirhash_minmem limit to not clear all the
> dirhash memory?
> So I can set dirhash_minmem=2MB dirhash_maxmem=16MB and then
> dirhash_mem will be allways between these two limits?

dirhash shouldn't be "disabled", it's that memory pressure from other
things has priority over the dirhash, but I understand what you mean.
This is quite evident from dirhash_lowmemcount being so high.

I understand what you want, and maybe there is a way to get what you
want (with little effort), but I am strongly inclined to say you need to
figure out on your system what is causing such memory pressure and solve
that.  Honest: try to solve the real problem rather than dancing around
it.  If you have a process that skyrockets in RSS/RES usage due to a
memory leak or out-of-control design (such as a daemonised perl script
which blindly uses .= to append data to a scalar, or blindly keeps
appending data to the end of a list), then fix that problem.

Basically I'm trying to say that it shouldn't be the responsibility of
dirhash to "work around" other problems happening on the system that
diminish or exhaust available memory.  You end up with a kernel design
that has tons of one-offs in it and that does nothing but bite you in
the butt down the road.  (Linux has been through this many times over.)
 
-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Fri Oct 21 19:11:13 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2EC9F106566C;
	Fri, 21 Oct 2011 19:11:13 +0000 (UTC)
	(envelope-from 000.fbsd@quip.cz)
Received: from elsa.codelab.cz (elsa.codelab.cz [94.124.105.4])
	by mx1.freebsd.org (Postfix) with ESMTP id A14E68FC21;
	Fri, 21 Oct 2011 19:11:12 +0000 (UTC)
Received: from elsa.codelab.cz (localhost [127.0.0.1])
	by elsa.codelab.cz (Postfix) with ESMTP id 897D128424;
	Fri, 21 Oct 2011 21:11:10 +0200 (CEST)
Received: from [192.168.1.2] (ip-86-49-61-235.net.upcbroadband.cz
	[86.49.61.235])
	(using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
	(No client certificate requested)
	by elsa.codelab.cz (Postfix) with ESMTPSA id 9438428423;
	Fri, 21 Oct 2011 21:11:08 +0200 (CEST)
Message-ID: <4EA1C3CC.3090500@quip.cz>
Date: Fri, 21 Oct 2011 21:11:08 +0200
From: Miroslav Lachman <000.fbsd@quip.cz>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US;
	rv:1.9.1.19) Gecko/20110420 Lightning/1.0b1 SeaMonkey/2.0.14
MIME-Version: 1.0
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
References: <4E97FEDD.7060205@quip.cz> <j7938v$66s$1@dough.gmane.org>
	<4EA19203.5050503@quip.cz> <20111021162025.GA89885@icarus.home.lan>
In-Reply-To: <20111021162025.GA89885@icarus.home.lan>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org, Ivan Voras <ivoras@freebsd.org>
Subject: Re: dirhash and dynamic memory allocation
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 Oct 2011 19:11:13 -0000

Jeremy Chadwick wrote:
> On Fri, Oct 21, 2011 at 05:38:43PM +0200, Miroslav Lachman wrote:
>> Hi, I am back on this topic...
>>
>> Ivan Voras wrote:
>>> On 14/10/2011 11:20, Miroslav Lachman wrote:
>>>> Hi all,
>>>>
>>>> I tried some tuning of dirhash on our servers and after googlig a bit, I
>>>> found an old GSoC project wiki page about Dynamic Memory Allocation for
>>>> Dirhash: http://wiki.freebsd.org/DirhashDynamicMemory
>>>> Is there any reason not to use it / not commit it to HEAD?
>>>
>>> AFAIK it's sort-of already present. In 8-stable and recent kernels you
>>> can give huge amounts of memory to dirhash via vfs.ufs.dirhash_maxmem
>>> (but except in really large edge cases I don't think you *need* more
>>> than 32 MB), and the kernel will scale-down or free the memory if not
>>> needed.
>>>
>>> In effect, vfs.ufs.dirhash_maxmem is the upper limit - the kernel will
>>> use less and will free the allocated memory in low memory situations
>>> (which I've tried and it works).
>>
>> So the current behavior is that on 7.3+ and 8.x we have smaller
>> average dirhash buffer (by default) than it was initialy 10 years
>> ago. Because it starts as 2MB fixed size and now we have 2MB max,
>> which is lowered in low mem situations... and sometimes it is set to
>> 0MB!
>>
>> I caught this 2 days ago:
>>
>> root@rip ~/# sysctl vfs.ufs
>> vfs.ufs.dirhash_reclaimage: 5
>> vfs.ufs.dirhash_lowmemcount: 36953
>> vfs.ufs.dirhash_docheck: 0
>> vfs.ufs.dirhash_mem: 0
>> vfs.ufs.dirhash_maxmem: 8388608
>> vfs.ufs.dirhash_minsize: 2560
>>
>> I set maxmem to 8MB in sysctl.conf to increase performance and
>> dirhash_mem 0 is really bad surprise!
>
> Actually, the "bad surprise" is dirhash_lowmemcount of 36953.  You
> increasing dirhash_maxmem is fine -- what you're seeing is that your
> machine keeps running out of memory, or that your directories are filled
> with so many files that you're exhausting the dirhash repetitively.
>
> I'm going to be blunt and just ask it: why does that happen?  Or do you
> have a filesystem that has an absurdly high number of files in a single
> directory?  If the former, ignore the next paragraph

There are not absurdly high number of files in a single directory, 
because I know this potential problem and I am fighting against it with 
webapp developers. But I see similar lowmemcount on almost all UFS based 
servers. Most of them are for webhosting (running OpenSource or 
proprietary CMS, so the most content is in MySQL). Many of our servers 
have long uptime (about or more than year), so the lowmemcount numbers 
are higher on them. Webservers are hosting about 100-150 websites.

Examples from 4 of our servers:

vfs.ufs.dirhash_lowmemcount: 45295
up 39 days

vfs.ufs.dirhash_lowmemcount: 164782
up 419 days

vfs.ufs.dirhash_lowmemcount: 391452
up 102 days

vfs.ufs.dirhash_lowmemcount: 633202
up 417 days

Only few of our servers have lowmemcount lower than 1000 (but stil 
higher than 500)

One example is server with jails, where UFS is used only for host 
system, and jails are on ZFS.

This server has 4GB of RAM and 362MB used swap space:

vfs.ufs.dirhash_lowmemcount: 936
up 284 days

> I've harped on this before on the mailing list: one of the first things
> I learned as a system administrator was that you Do Not(tm) fill
> directories with tens of thousands of files.  Split them up into
> subdirs.  Even caching daemons (squid, etc.) support this kind of thing;
> filename "aj1j11hsfkqXaj21" should really be aj/1j/11hsfkqXaj21.  You
> get the idea.  DNS/BIND administrators of systems which have tens of
> thousands of domains are quite familiar with this scenario too.
>
>> I am worrying about bad performance in situation where dirhash is
>> emptied in situations, where server is already running at maximum
>> performance (there is some memory hungry process and system can
>> start swapping to disk + dirhash is efectively disabled)
>>
>> I found a PR kern/145246
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=145246
>>
>> Is it possible to add some dirhash_minmem limit to not clear all the
>> dirhash memory?
>> So I can set dirhash_minmem=2MB dirhash_maxmem=16MB and then
>> dirhash_mem will be allways between these two limits?
>
> dirhash shouldn't be "disabled", it's that memory pressure from other
> things has priority over the dirhash, but I understand what you mean.
> This is quite evident from dirhash_lowmemcount being so high.
>
> I understand what you want, and maybe there is a way to get what you
> want (with little effort), but I am strongly inclined to say you need to
> figure out on your system what is causing such memory pressure and solve
> that.  Honest: try to solve the real problem rather than dancing around
> it.  If you have a process that skyrockets in RSS/RES usage due to a
> memory leak or out-of-control design (such as a daemonised perl script
> which blindly uses .= to append data to a scalar, or blindly keeps
> appending data to the end of a list), then fix that problem.

As the servers are running 3rd party apps (customer's websites), it is 
out of my control to fix issues with PHP CMS etc. So low memory fix "is 
easy" - buy and add more RAM.

> Basically I'm trying to say that it shouldn't be the responsibility of
> dirhash to "work around" other problems happening on the system that
> diminish or exhaust available memory.  You end up with a kernel design
> that has tons of one-offs in it and that does nothing but bite you in
> the butt down the road.  (Linux has been through this many times over.)

You are partially right. But dirhash lowmemhook seems too sensitive to 
me. I see high lowmemcount numbers on systems with almost empty swap. 
(few kB in swap, not MBs) That's why I am looking for dirhash_minmem.

Miroslav Lachman


From owner-freebsd-fs@FreeBSD.ORG  Fri Oct 21 20:31:54 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 63BDF1065675
	for <freebsd-fs@freebsd.org>; Fri, 21 Oct 2011 20:31:54 +0000 (UTC)
	(envelope-from subbsd@gmail.com)
Received: from mail-vx0-f182.google.com (mail-vx0-f182.google.com
	[209.85.220.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 25EE58FC1E
	for <freebsd-fs@freebsd.org>; Fri, 21 Oct 2011 20:31:53 +0000 (UTC)
Received: by vcbfo13 with SMTP id fo13so5686429vcb.13
	for <freebsd-fs@freebsd.org>; Fri, 21 Oct 2011 13:31:53 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=mime-version:date:message-id:subject:from:to:cc:content-type;
	bh=Q4CXmt28iRppaeiBMeKXq1KIWNbC2C5CC0in4Cy3T9M=;
	b=iHnQqk2VjZcJbesFWFX3DCqoaxkWznOU92Ixa1jmfffDc+SqRlJUnhykuZlZ2E3hN5
	P+eKEe/i6Vs+kMDSnwK0QCIIHSknaKTh4Sr1o2n10i9ABk9Mn7ogAfm6z6aVPnZ0oQcP
	LE+nXroyOlekbp5ImNJlXABTJbVGZLc7u7KqE=
MIME-Version: 1.0
Received: by 10.220.106.206 with SMTP id y14mr1164604vco.109.1319227735002;
	Fri, 21 Oct 2011 13:08:55 -0700 (PDT)
Received: by 10.220.160.197 with HTTP; Fri, 21 Oct 2011 13:08:54 -0700 (PDT)
Date: Sat, 22 Oct 2011 00:08:54 +0400
Message-ID: <CAFt_eMqJVuzjzcAf_4Hdxhu2cLqPTY+ww==GuVH1AE7Obs2S6Q@mail.gmail.com>
From: Subbsd <subbsd@gmail.com>
To: freebsd-stable@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
Cc: freebsd-fs@freebsd.org
Subject: VFS problem with ?fcntl SETLK? and nullfs
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 Oct 2011 20:31:54 -0000

Hi

I found a bad issue in FreeBSD mounts nullfs file system, which may
appear in the random.
Initially, I get problems on FreeBSD-current on the host that have a
large number JAIL at the time when they start. Handbook scenario:

1) have readonly base   (for example /usr/jails/base)
2) have write area for jail personal data (for example:
/usr/jails/j1data/{home,var,local,...})
3) mount  RO base to new jail location, then mount RW part data above RO

In some cases, i watched the freeze of the system when working nullfs
mount, but could not find a reason.

On a test environment I have tried to simulate mount_nullfs with
different types of actions by the source directory:

- through dd(1) to make an huge oveload by read - does not affect
- through dd(1) to make an huge overload by write - does not affect
- through script to delete, create random-files in large numbers -
does not affect

but now I can easily with a 100% guarantee show the problem - it is
easily obtained by working with "svn cleanup" action.
For example on the directory /usr/src obtained from SVN. If start in /usr/src
svn cleanup
and at the same time try to mount_nullfs the problem appears.
As far as I can see, cleanup makes frequent lock files. It seems to
me, who some of the lock is simply not true and is inherited by a
deadlock.

I wrote sample scripts simulating the problem. I did a rotation
mount-ro + mount-rw specifically - is the repetition of the way
described in the handbook section of jail.
Since the problem can appear in random moment, I made an infinite
loop. But I am getting the problem is usually the first-pass. Here is
it:

-------/cut/-----
#!/bin/sh
SRCROOT="/usr/src"
DSTROOT="/usr/nullfstest"
ITER=`seq 100`
MOUNTO=`find ${SRCROOT} -type d -maxdepth 1 -exec basename {} \;`

[ -d "${DSTROOT}" ] || mkdir $DSTROOT

mount_subdir()
{
for mto in ${MOUNTO}; do
    if [ -d "${1}/$mto" ]; then
    mount -orw -t nullfs /bin ${1}/${mto}
    fi
done
}

cd ${SRCROOT}

while [ 1 ]; do
 echo "Mount phase"
 lockf -s -t0 /tmp/svn.lock svn cleanup &

 for iter in $ITER; do
   DST="${DSTROOT}/${iter}"
   [ -d "${DST}" ] || mkdir ${DST}
   mount -oro -t nullfs ${SRCROOT} ${DST}
   mount_subdir ${DST}
 done

echo "Unmount phase"
mount -t nullfs |awk {'printf "umount -f "$3"\n"'} |sh
done
-------/end of cut/-----

Last syscall I can see this svn cleanup is:
fcntl(3,F_SETLK,0x7fffffffc9b0)

where 3 - fd of some \.svn/file.

looks like in action this way - the system (kernel) works. but if the
process or your session will affect an action in the source directory
(in this example - /usr/src), for example:

cd /usr/src
fstat /usr/src/*
ls /usr/src/

- Get filesystem deadlock. In addition, the system in this state does
not reboot without help - system do not return from free buffer to
storage stage.

in FreeBSD 9.0 RC1 bug exists.
PS: An important detail - I could not get the problem on FreeBSD
running under a virtual machine (VirtualBox) - maybe due to the tick /
hz.kern issue?
PS2: what file system - does not matter. I get the problem on ZFS as
well as for UFS

Please check this informatio. it seems that this is serious

Thanks.

From owner-freebsd-fs@FreeBSD.ORG  Fri Oct 21 20:40:16 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 85076106566B
	for <freebsd-fs@hub.freebsd.org>; Fri, 21 Oct 2011 20:40:16 +0000 (UTC)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 5B0748FC12
	for <freebsd-fs@hub.freebsd.org>; Fri, 21 Oct 2011 20:40:16 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p9LKeGYd070338
	for <freebsd-fs@freefall.freebsd.org>; Fri, 21 Oct 2011 20:40:16 GMT
	(envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p9LKeGPG070337;
	Fri, 21 Oct 2011 20:40:16 GMT (envelope-from gnats)
Date: Fri, 21 Oct 2011 20:40:16 GMT
Message-Id: <201110212040.p9LKeGPG070337@freefall.freebsd.org>
To: freebsd-fs@FreeBSD.org
From: Robert Millan <rmh@freebsd.org>
Cc: 
Subject: Re: kern/150207: zpool(1): zpool import -d /dev tries to open weird
	devices
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: Robert Millan <rmh@freebsd.org>
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 Oct 2011 20:40:16 -0000

The following reply was made to PR kern/150207; it has been noted by GNATS.

From: Robert Millan <rmh@freebsd.org>
To: bug-followup@FreeBSD.org, aurelien@aurel32.net
Cc: pjd@freebsd.org
Subject: Re: kern/150207: zpool(1): zpool import -d /dev tries to open weird devices
Date: Fri, 21 Oct 2011 22:12:38 +0200

 This might have been fixed by pjd@ in r219089. A quick look in the
 code doesn't show traces of this problem.

From owner-freebsd-fs@FreeBSD.ORG  Sat Oct 22 02:16:28 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx2.freebsd.org (mx2.freebsd.org [IPv6:2001:4f8:fff6::35])
	by hub.freebsd.org (Postfix) with ESMTP id 39F8F106566C;
	Sat, 22 Oct 2011 02:16:28 +0000 (UTC)
	(envelope-from dougb@FreeBSD.org)
Received: from 172-17-198-245.globalsuite.net (hub.freebsd.org
	[IPv6:2001:4f8:fff6::36])
	by mx2.freebsd.org (Postfix) with ESMTP id B5CEA14E61A;
	Sat, 22 Oct 2011 02:16:27 +0000 (UTC)
Message-ID: <4EA2277B.5080306@FreeBSD.org>
Date: Fri, 21 Oct 2011 19:16:27 -0700
From: Doug Barton <dougb@FreeBSD.org>
Organization: http://SupersetSolutions.com/
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:7.0.1) Gecko/20111001 Thunderbird/7.0.1
MIME-Version: 1.0
To: Miroslav Lachman <000.fbsd@quip.cz>
References: <4E97FEDD.7060205@quip.cz> <j7938v$66s$1@dough.gmane.org>
	<4EA19203.5050503@quip.cz>
In-Reply-To: <4EA19203.5050503@quip.cz>
X-Enigmail-Version: undefined
OpenPGP: id=1A1ABC84
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org, Ivan Voras <ivoras@freebsd.org>
Subject: Re: dirhash and dynamic memory allocation
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 22 Oct 2011 02:16:28 -0000

On 10/21/2011 08:38, Miroslav Lachman wrote:
> Hi, I am back on this topic...
> 
> Ivan Voras wrote:
>> On 14/10/2011 11:20, Miroslav Lachman wrote:
>>> Hi all,
>>>
>>> I tried some tuning of dirhash on our servers and after googlig a bit, I
>>> found an old GSoC project wiki page about Dynamic Memory Allocation for
>>> Dirhash: http://wiki.freebsd.org/DirhashDynamicMemory
>>> Is there any reason not to use it / not commit it to HEAD?
>>
>> AFAIK it's sort-of already present. In 8-stable and recent kernels you
>> can give huge amounts of memory to dirhash via vfs.ufs.dirhash_maxmem
>> (but except in really large edge cases I don't think you *need* more
>> than 32 MB), and the kernel will scale-down or free the memory if not
>> needed.
>>
>> In effect, vfs.ufs.dirhash_maxmem is the upper limit - the kernel will
>> use less and will free the allocated memory in low memory situations
>> (which I've tried and it works).
> 
> So the current behavior is that on 7.3+ and 8.x we have smaller average
> dirhash buffer (by default) than it was initialy 10 years ago. Because
> it starts as 2MB fixed size and now we have 2MB max, which is lowered in
> low mem situations... and sometimes it is set to 0MB!
> 
> I caught this 2 days ago:
> 
> root@rip ~/# sysctl vfs.ufs
> vfs.ufs.dirhash_reclaimage: 5
> vfs.ufs.dirhash_lowmemcount: 36953
> vfs.ufs.dirhash_docheck: 0
> vfs.ufs.dirhash_mem: 0
> vfs.ufs.dirhash_maxmem: 8388608
> vfs.ufs.dirhash_minsize: 2560
> 
> I set maxmem to 8MB in sysctl.conf to increase performance and
> dirhash_mem 0 is really bad surprise!
> 
> I am worrying about bad performance in situation where dirhash is
> emptied in situations, where server is already running at maximum
> performance (there is some memory hungry process and system can start
> swapping to disk + dirhash is efectively disabled)
> 
> I found a PR kern/145246
> http://www.freebsd.org/cgi/query-pr.cgi?pr=145246
> 
> Is it possible to add some dirhash_minmem limit to not clear all the
> dirhash memory?
> So I can set dirhash_minmem=2MB dirhash_maxmem=16MB and then dirhash_mem
> will be allways between these two limits?

Isn't that what vfs.ufs.dirhash_minsize is for? I think given that there
is a lot more memory in modern systems setting that higher by default is
probably a good idea. Or maybe I'm misunderstanding what that knob does?

>>> And second question - is there any negative impact with higher
>>> vfs.ufs.dirhash_maxmem? It stil defaults to 2MB (on FreeBSD 8.2) after
>>
>> Not that I know of.
>>
>>> 10 years, but I think we all are using bigger FS in these days with lot
>>> of files and directories and 2MB is not enough.
>>
>> AFAIK I've changed it to autotune so it's configured to approximately 4
>> MB on a 4 GB machine (and scales up) in 9.
> 
> I didn't tried 9 yet. Does it mean dirhash_maxmem is initially set to
> approximately 1% of physical RAM and then it can be set higher by sysctl
> as in older versions?

I'm not sure that's what's happening, I have 6G of ram in this box and I
have this by default:

vfs.ufs.dirhash_maxmem: 9977856


-- 

	Nothin' ever doesn't change, but nothin' changes much.
			-- OK Go

	Breadth of IT experience, and depth of knowledge in the DNS.
	Yours for the right price.  :)  http://SupersetSolutions.com/


From owner-freebsd-fs@FreeBSD.ORG  Sat Oct 22 03:04:06 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 52F431065673
	for <freebsd-fs@freebsd.org>; Sat, 22 Oct 2011 03:04:06 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta08.westchester.pa.mail.comcast.net
	(qmta08.westchester.pa.mail.comcast.net [76.96.62.80])
	by mx1.freebsd.org (Postfix) with ESMTP id 139068FC17
	for <freebsd-fs@freebsd.org>; Sat, 22 Oct 2011 03:04:05 +0000 (UTC)
Received: from omta16.westchester.pa.mail.comcast.net ([76.96.62.88])
	by qmta08.westchester.pa.mail.comcast.net with comcast
	id nepN1h0011uE5Es58f462m; Sat, 22 Oct 2011 03:04:06 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta16.westchester.pa.mail.comcast.net with comcast
	id nf451h00J1t3BNj3cf45rQ; Sat, 22 Oct 2011 03:04:06 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id BFFAA102C1C; Fri, 21 Oct 2011 20:04:03 -0700 (PDT)
Date: Fri, 21 Oct 2011 20:04:03 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Doug Barton <dougb@FreeBSD.org>
Message-ID: <20111022030403.GA176@icarus.home.lan>
References: <4E97FEDD.7060205@quip.cz> <j7938v$66s$1@dough.gmane.org>
	<4EA19203.5050503@quip.cz> <4EA2277B.5080306@FreeBSD.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4EA2277B.5080306@FreeBSD.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org, Ivan Voras <ivoras@freebsd.org>
Subject: Re: dirhash and dynamic memory allocation
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 22 Oct 2011 03:04:06 -0000

On Fri, Oct 21, 2011 at 07:16:27PM -0700, Doug Barton wrote:
> On 10/21/2011 08:38, Miroslav Lachman wrote:
> > Hi, I am back on this topic...
> > 
> > Ivan Voras wrote:
> >> On 14/10/2011 11:20, Miroslav Lachman wrote:
> >>> Hi all,
> >>>
> >>> I tried some tuning of dirhash on our servers and after googlig a bit, I
> >>> found an old GSoC project wiki page about Dynamic Memory Allocation for
> >>> Dirhash: http://wiki.freebsd.org/DirhashDynamicMemory
> >>> Is there any reason not to use it / not commit it to HEAD?
> >>
> >> AFAIK it's sort-of already present. In 8-stable and recent kernels you
> >> can give huge amounts of memory to dirhash via vfs.ufs.dirhash_maxmem
> >> (but except in really large edge cases I don't think you *need* more
> >> than 32 MB), and the kernel will scale-down or free the memory if not
> >> needed.
> >>
> >> In effect, vfs.ufs.dirhash_maxmem is the upper limit - the kernel will
> >> use less and will free the allocated memory in low memory situations
> >> (which I've tried and it works).
> > 
> > So the current behavior is that on 7.3+ and 8.x we have smaller average
> > dirhash buffer (by default) than it was initialy 10 years ago. Because
> > it starts as 2MB fixed size and now we have 2MB max, which is lowered in
> > low mem situations... and sometimes it is set to 0MB!
> > 
> > I caught this 2 days ago:
> > 
> > root@rip ~/# sysctl vfs.ufs
> > vfs.ufs.dirhash_reclaimage: 5
> > vfs.ufs.dirhash_lowmemcount: 36953
> > vfs.ufs.dirhash_docheck: 0
> > vfs.ufs.dirhash_mem: 0
> > vfs.ufs.dirhash_maxmem: 8388608
> > vfs.ufs.dirhash_minsize: 2560
> > 
> > I set maxmem to 8MB in sysctl.conf to increase performance and
> > dirhash_mem 0 is really bad surprise!
> > 
> > I am worrying about bad performance in situation where dirhash is
> > emptied in situations, where server is already running at maximum
> > performance (there is some memory hungry process and system can start
> > swapping to disk + dirhash is efectively disabled)
> > 
> > I found a PR kern/145246
> > http://www.freebsd.org/cgi/query-pr.cgi?pr=145246
> > 
> > Is it possible to add some dirhash_minmem limit to not clear all the
> > dirhash memory?
> > So I can set dirhash_minmem=2MB dirhash_maxmem=16MB and then dirhash_mem
> > will be allways between these two limits?
> 
> Isn't that what vfs.ufs.dirhash_minsize is for? I think given that there
> is a lot more memory in modern systems setting that higher by default is
> probably a good idea. Or maybe I'm misunderstanding what that knob does?

I believe the function of that sysctl is different.  It's not the
"minimum amount of dirhash memory to retain", it's:

$ sysctl -d vfs.ufs.dirhash_minsize
vfs.ufs.dirhash_minsize: minimum directory size in bytes for which to use hashed lookup

The sysctl should really be named "dirhash_mindirsize".

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Sat Oct 22 03:13:22 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx2.freebsd.org (mx2.freebsd.org [IPv6:2001:4f8:fff6::35])
	by hub.freebsd.org (Postfix) with ESMTP id D7FA31065675;
	Sat, 22 Oct 2011 03:13:22 +0000 (UTC)
	(envelope-from dougb@FreeBSD.org)
Received: from 172-17-198-245.globalsuite.net (hub.freebsd.org
	[IPv6:2001:4f8:fff6::36])
	by mx2.freebsd.org (Postfix) with ESMTP id 492011507AF;
	Sat, 22 Oct 2011 03:13:22 +0000 (UTC)
Message-ID: <4EA234D1.7000805@FreeBSD.org>
Date: Fri, 21 Oct 2011 20:13:21 -0700
From: Doug Barton <dougb@FreeBSD.org>
Organization: http://SupersetSolutions.com/
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:7.0.1) Gecko/20111001 Thunderbird/7.0.1
MIME-Version: 1.0
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
References: <4E97FEDD.7060205@quip.cz> <j7938v$66s$1@dough.gmane.org>
	<4EA19203.5050503@quip.cz> <4EA2277B.5080306@FreeBSD.org>
	<20111022030403.GA176@icarus.home.lan>
In-Reply-To: <20111022030403.GA176@icarus.home.lan>
X-Enigmail-Version: undefined
OpenPGP: id=1A1ABC84
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org, Ivan Voras <ivoras@freebsd.org>
Subject: Re: dirhash and dynamic memory allocation
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 22 Oct 2011 03:13:23 -0000

On 10/21/2011 20:04, Jeremy Chadwick wrote:
> On Fri, Oct 21, 2011 at 07:16:27PM -0700, Doug Barton wrote:

>> Isn't that what vfs.ufs.dirhash_minsize is for? I think given that there
>> is a lot more memory in modern systems setting that higher by default is
>> probably a good idea. Or maybe I'm misunderstanding what that knob does?
> 
> I believe the function of that sysctl is different.  It's not the
> "minimum amount of dirhash memory to retain", it's:
> 
> $ sysctl -d vfs.ufs.dirhash_minsize
> vfs.ufs.dirhash_minsize: minimum directory size in bytes for which to use hashed lookup

Ah, silly me. I'm so used to 'sysctl -d' not working that I didn't even
try it this time. Thanks for setting me straight.

In that case I agree with the OP that a knob for minimum setting would
be desirable.


Doug

-- 

	Nothin' ever doesn't change, but nothin' changes much.
			-- OK Go

	Breadth of IT experience, and depth of knowledge in the DNS.
	Yours for the right price.  :)  http://SupersetSolutions.com/


From owner-freebsd-fs@FreeBSD.ORG  Sat Oct 22 05:06:03 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 86637106564A;
	Sat, 22 Oct 2011 05:06:03 +0000 (UTC)
	(envelope-from linimon@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 5E3978FC0A;
	Sat, 22 Oct 2011 05:06:03 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p9M563sH052058;
	Sat, 22 Oct 2011 05:06:03 GMT
	(envelope-from linimon@freefall.freebsd.org)
Received: (from linimon@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p9M563tA052054;
	Sat, 22 Oct 2011 05:06:03 GMT (envelope-from linimon)
Date: Sat, 22 Oct 2011 05:06:03 GMT
Message-Id: <201110220506.p9M563tA052054@freefall.freebsd.org>
To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org
From: linimon@FreeBSD.org
Cc: 
Subject: Re: kern/161864: [ufs] removing journaling from UFS partition fails
	on gpt-labelled disk
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 22 Oct 2011 05:06:03 -0000

Old Synopsis: removing journaling from UFS partition fails on gpt-labelled disk
New Synopsis: [ufs] removing journaling from UFS partition fails on gpt-labelled disk

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Sat Oct 22 05:05:27 UTC 2011
Responsible-Changed-Why: 
This kind of spans several areas, so try to guess at a label for it as
I assign it.

http://www.freebsd.org/cgi/query-pr.cgi?pr=161864

From owner-freebsd-fs@FreeBSD.ORG  Sat Oct 22 09:52:04 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A0F32106566B;
	Sat, 22 Oct 2011 09:52:04 +0000 (UTC)
	(envelope-from 000.fbsd@quip.cz)
Received: from elsa.codelab.cz (elsa.codelab.cz [94.124.105.4])
	by mx1.freebsd.org (Postfix) with ESMTP id 4B4198FC14;
	Sat, 22 Oct 2011 09:52:03 +0000 (UTC)
Received: from elsa.codelab.cz (localhost [127.0.0.1])
	by elsa.codelab.cz (Postfix) with ESMTP id 582F928424;
	Sat, 22 Oct 2011 11:52:02 +0200 (CEST)
Received: from [192.168.1.2] (ip-86-49-61-235.net.upcbroadband.cz
	[86.49.61.235])
	(using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
	(No client certificate requested)
	by elsa.codelab.cz (Postfix) with ESMTPSA id DC2BB28423;
	Sat, 22 Oct 2011 11:52:00 +0200 (CEST)
Message-ID: <4EA2923F.7060303@quip.cz>
Date: Sat, 22 Oct 2011 11:51:59 +0200
From: Miroslav Lachman <000.fbsd@quip.cz>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US;
	rv:1.9.1.19) Gecko/20110420 Lightning/1.0b1 SeaMonkey/2.0.14
MIME-Version: 1.0
To: Doug Barton <dougb@FreeBSD.org>
References: <4E97FEDD.7060205@quip.cz> <j7938v$66s$1@dough.gmane.org>
	<4EA19203.5050503@quip.cz> <4EA2277B.5080306@FreeBSD.org>
	<20111022030403.GA176@icarus.home.lan>
	<4EA234D1.7000805@FreeBSD.org>
In-Reply-To: <4EA234D1.7000805@FreeBSD.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org, Ivan Voras <ivoras@freebsd.org>
Subject: Re: dirhash and dynamic memory allocation
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 22 Oct 2011 09:52:04 -0000

Doug Barton wrote:
> On 10/21/2011 20:04, Jeremy Chadwick wrote:
>> On Fri, Oct 21, 2011 at 07:16:27PM -0700, Doug Barton wrote:
>
>>> Isn't that what vfs.ufs.dirhash_minsize is for? I think given that there
>>> is a lot more memory in modern systems setting that higher by default is
>>> probably a good idea. Or maybe I'm misunderstanding what that knob does?
>>
>> I believe the function of that sysctl is different.  It's not the
>> "minimum amount of dirhash memory to retain", it's:
>>
>> $ sysctl -d vfs.ufs.dirhash_minsize
>> vfs.ufs.dirhash_minsize: minimum directory size in bytes for which to use hashed lookup
>
> Ah, silly me. I'm so used to 'sysctl -d' not working that I didn't even
> try it this time. Thanks for setting me straight.

sysctls are becoming a mess as new are added, but only few of them are 
good named or documented. Even if some have 'sysctl -d' description, the 
description is not helpfull for non-developer persons like me.
And the second aspect is that sometimes there is two sysctl OID with 
slightly different naming scheme doing the same (or having same description)

for example low_mem vs. lowmemcount:

# sysctl -a | egrep 'low_?mem'
kern.geom.journal.stats.low_mem: 46443
vfs.ufs.dirhash_lowmemcount: 46443

# sysctl -d {kern.geom.journal.stats.low_mem,vfs.ufs.dirhash_lowmemcount}

kern.geom.journal.stats.low_mem: Number of times low_mem hook was called.
vfs.ufs.dirhash_lowmemcount: number of times low memory hook called

And the problem for non-developer person is, that this description 
brings more questions than it answers. "What condition is causing it?, 
What low mem hook is doing? What should I do with it?..." (I already 
know it, not from a documentation, but from discussions in mailinglists)

FreeBSD is known to its good documentation, so it is sad that this 
important part of the system is lacking really good documentation. 
FreeBSD comes with not optimal default settings nor autotunes, so almost 
everybody needs to set something in loader.conf or sysctl.conf to 
achieve better performance. But it is hard without good sysctl 
documentation.
I know there were some atempts in the past (GSoC project and 
http://wiki.freebsd.org/IdeasPage#Document_all_sysctls) but none of them 
was successful.

Maybe it's time for stronger policy in committing new code to the tree - 
if something is adding new sysctl IOD, it must have 'sysctl -d' 
description and some documentation of its behavior in handbook or some 
wiki page - I don't know where is the right place to have all sysctls 
documented. (something like man rc.conf) Maybe wiki page with some 
tuning tips will be the best place, as more persons can edit it.

...I am sorry for being off-topic, this is really not related to my 
original dirhash problem. :)

> In that case I agree with the OP that a knob for minimum setting would
> be desirable.

Miroslav Lachman

From owner-freebsd-fs@FreeBSD.ORG  Sat Oct 22 13:29:37 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3650A106566C
	for <freebsd-fs@freebsd.org>; Sat, 22 Oct 2011 13:29:37 +0000 (UTC)
	(envelope-from ivoras@gmail.com)
Received: from mail-yw0-f54.google.com (mail-yw0-f54.google.com
	[209.85.213.54])
	by mx1.freebsd.org (Postfix) with ESMTP id D6B2F8FC0A
	for <freebsd-fs@freebsd.org>; Sat, 22 Oct 2011 13:29:36 +0000 (UTC)
Received: by ywt32 with SMTP id 32so1024047ywt.13
	for <multiple recipients>; Sat, 22 Oct 2011 06:29:36 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=mime-version:sender:in-reply-to:references:from:date
	:x-google-sender-auth:message-id:subject:to:cc:content-type;
	bh=sTJcaxVVzBYx65wf8I8jOOnvoc6eHMx0wcpofdd8gfg=;
	b=lQm31adJUXMj/SAr5hHym89KhJOjSEgUt2yCMbkOxoVUppcPwgiLQJSa2yNhZPn9uq
	5atR7cCIeCpInSKmfO0SQahHn5jQAWQejc/rkk3QuwsORaIKvH8qpKzqIupV+gLhReRM
	ABZgrDJmVY+InJNXl5SSqUV7yEFOyvL7nt0us=
Received: by 10.100.56.32 with SMTP id e32mr4154911ana.66.1319290176242; Sat,
	22 Oct 2011 06:29:36 -0700 (PDT)
MIME-Version: 1.0
Sender: ivoras@gmail.com
Received: by 10.100.189.14 with HTTP; Sat, 22 Oct 2011 06:28:56 -0700 (PDT)
In-Reply-To: <4EA2277B.5080306@FreeBSD.org>
References: <4E97FEDD.7060205@quip.cz> <j7938v$66s$1@dough.gmane.org>
	<4EA19203.5050503@quip.cz> <4EA2277B.5080306@FreeBSD.org>
From: Ivan Voras <ivoras@freebsd.org>
Date: Sat, 22 Oct 2011 15:28:56 +0200
X-Google-Sender-Auth: 6-wGmCaBNb6Fy_YVHxD33mm2RKQ
Message-ID: <CAF-QHFUp6sX-JG+6N8WGUN2_taS+K4y1-F8Vs5CS6kOi=rx5Uw@mail.gmail.com>
To: Doug Barton <dougb@freebsd.org>
Content-Type: text/plain; charset=UTF-8
Cc: freebsd-fs@freebsd.org
Subject: Re: dirhash and dynamic memory allocation
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 22 Oct 2011 13:29:37 -0000

On 22 October 2011 04:16, Doug Barton <dougb@freebsd.org> wrote:
> On 10/21/2011 08:38, Miroslav Lachman wrote:

>> will be allways between these two limits?
>
> Isn't that what vfs.ufs.dirhash_minsize is for? I think given that there
> is a lot more memory in modern systems setting that higher by default is
> probably a good idea. Or maybe I'm misunderstanding what that knob does?

Directories are AFAIK cached "all or nothing" so if there are some
large directories in the dirhash and they are evicted, it's possible
to end up with "0" dirhash used without being able to fit a directory
in the dirhash_minsize for any reasonable amount of time.
>>> AFAIK I've changed it to autotune so it's configured to approximately 4
>>> MB on a 4 GB machine (and scales up) in 9.
>>
>> I didn't tried 9 yet. Does it mean dirhash_maxmem is initially set to
>> approximately 1% of physical RAM and then it can be set higher by sysctl
>> as in older versions?
>
> I'm not sure that's what's happening, I have 6G of ram in this box and I
> have this by default:
>
> vfs.ufs.dirhash_maxmem: 9977856

It's actually not a direct percentage of memory but it's tied to
hibufspace which is itself auto-tuned here:
http://fxr.watson.org/fxr/source/kern/vfs_bio.c?v=FREEBSD8#L606 . So
yes, it's nonlinear (and it probably doesn't matter).

From owner-freebsd-fs@FreeBSD.ORG  Sat Oct 22 16:05:48 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5F8D2106564A;
	Sat, 22 Oct 2011 16:05:48 +0000 (UTC)
	(envelope-from linimon@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 3757D8FC0A;
	Sat, 22 Oct 2011 16:05:48 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p9MG5mT7095808;
	Sat, 22 Oct 2011 16:05:48 GMT
	(envelope-from linimon@freefall.freebsd.org)
Received: (from linimon@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p9MG5lCV095801;
	Sat, 22 Oct 2011 16:05:48 GMT (envelope-from linimon)
Date: Sat, 22 Oct 2011 16:05:48 GMT
Message-Id: <201110221605.p9MG5lCV095801@freefall.freebsd.org>
To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org
From: linimon@FreeBSD.org
Cc: 
Subject: Re: kern/161897: [zfs] [patch] zfs partition probing causing long
	delay at BTX loader
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 22 Oct 2011 16:05:48 -0000

Old Synopsis: zfs parition probing causing long delay at BTX loader
New Synopsis: [zfs] [patch] zfs partition probing causing long delay at BTX loader

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Sat Oct 22 16:05:13 UTC 2011
Responsible-Changed-Why: 
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=161897