From owner-freebsd-fs@FreeBSD.ORG Sun Sep 16 21:59:13 2007
From: Miroslav Valenta <valenta@sutra.cz>
To: freebsd-fs@freebsd.org
Date: Sun, 16 Sep 2007 23:32:36 +0200
Subject: slow transfers on webshare service

Hi,

I have a problem with slow transfers on my web file-sharing server. I'm
running FreeBSD 6.2.

Files are served by lighttpd 1.4.18:

server.max-worker = 8
server.max-fds = 8192
server.network-backend = "writev"

-----

HW:
Xeon 3160
4GB of RAM
Areca ARC-1260 disk controller with 1GB cache + 8x 500GB SATA2 HDDs

File transfers slow down once I reach about 500 concurrent download
connections, but when I send the same file from the same storage over
the same line, it transfers fast. So I think it must be something about
lighttpd or sysctl tuning. Can you help me, please?

From owner-freebsd-fs@FreeBSD.ORG Mon Sep 17 10:54:14 2007
From: Ivan Voras
To: freebsd-fs@freebsd.org
Date: Mon, 17 Sep 2007 12:53:53 +0200
Subject: Re: slow transfers on webshare service
Miroslav Valenta wrote:
> File transfers slow down once I reach about 500 concurrent download
> connections, but when I send the same file from the same storage over
> the same line, it transfers fast.

Please characterize "slow" and "fast" - how is it slow, and how fast do
you think it should be?

Some general tips/ideas:

- Do you really need 8 workers? Lighttpd is an async server, so it's
  about as fast as it gets. If you only serve static files, a larger
  number of worker processes (and CPUs...) won't make a difference.
- Do you use the "kqueue" extension for Lighttpd?
- 500 parallel downloads probably mean lots of seeking on the disk drive
  array; have you verified that you can sustain the speed you need with
  the drives?
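For reference, both of those knobs are plain lighttpd.conf settings on
FreeBSD. A minimal sketch of the configuration being suggested here (the
option names are standard lighttpd 1.4.x and also appear verbatim later
in this thread; the worker count of 1 is illustrative, not a measured
recommendation):

    server.event-handler   = "freebsd-kqueue"    # poll connections via kqueue(2) instead of select/poll
    server.network-backend = "freebsd-sendfile"  # serve static files with in-kernel sendfile(2)
    server.max-worker      = 1                   # start low; add workers only if measurably CPU-bound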
From owner-freebsd-fs@FreeBSD.ORG Mon Sep 17 11:07:59 2007
From: FreeBSD bugmaster
To: freebsd-fs@FreeBSD.org
Date: Mon, 17 Sep 2007 11:07:58 GMT
Subject: Current problem reports assigned to you

Current FreeBSD problem reports

Critical problems

Serious problems

S Tracker      Resp. Description
--------------------------------------------------------------------------------
o kern/112658  fs    [smbfs] [patch] smbfs and caching problems (resolves b
o kern/114676  fs    [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o kern/114856  fs    [ntfs] [patch] Bug in NTFS allows bogus file modes.
o bin/115165   fs    [PATCH] amd(8): add functionality of mount_nfs' -L -a
o kern/116170  fs    Kernel panic when mounting /tmp

5 problems total.

Non-critical problems

S Tracker      Resp. Description
--------------------------------------------------------------------------------
o kern/114847  fs    [ntfs] [patch] dirmask support for NTFS ala MSDOSFS

1 problem total.

From owner-freebsd-fs@FreeBSD.ORG Tue Sep 18 12:28:41 2007
From: matteo@FreeBSD.org
To: Andre.Albsmeier@siemens.com, matteo@FreeBSD.org, freebsd-fs@FreeBSD.org
Date: Tue, 18 Sep 2007 12:28:40 GMT
Subject: Re: bin/115165: [PATCH] amd(8): add functionality of mount_nfs' -L -a -d options to amd

Synopsis: [PATCH] amd(8): add functionality of mount_nfs' -L -a -d options to amd

State-Changed-From-To: open->closed
State-Changed-By: matteo
State-Changed-When: Tue Sep 18 12:28:15 UTC 2007
State-Changed-Why: Bug was submitted to upstream maintainer, so no need
to keep it open here

http://www.freebsd.org/cgi/query-pr.cgi?pr=115165
From owner-freebsd-fs@FreeBSD.ORG Tue Sep 18 15:28:51 2007
From: Astrodog <astrodog@gmail.com>
To: Bruce Evans
Cc: freebsd-fs@freebsd.org, linimon@freebsd.org
Date: Tue, 18 Sep 2007 10:28:49 -0500
Subject: Re: amd64/74811: [nfs] df, nfs mount, negative Avail -> 32/64-bit confusion

On 9/18/07, Astrodog wrote:
> > I cannot see how to get the correct result non-accidentally without
> > using the hack of passing negative values as large unsigned ones.
> > [...]
> > Bruce
>
> From the above, it doesn't appear that NFS can support negative
> values in any reasonable way... [...]
>
> --- Harrison

The only thing I've found, thus far, is to hijack the "NULL" NFSv3
operation. From what I can tell, clients are expected to discard its
value. On supporting clients, the returned value can be what should be
subtracted from bfree to get bavail. bavail can be handled as it is now
on the server, so non-supporting clients wouldn't see any change in
behavior, beyond a NULL NFS operation taking a few cycles longer. Any
thoughts? I'm aware that this certainly isn't proper behavior... but I
also can't find anything that actually uses the NULL return.

--- Harrison
From owner-freebsd-fs@FreeBSD.ORG Tue Sep 18 15:39:42 2007
From: Astrodog <astrodog@gmail.com>
To: Bruce Evans
Cc: freebsd-fs@freebsd.org, linimon@freebsd.org
Date: Tue, 18 Sep 2007 10:15:10 -0500
Subject: Re: amd64/74811: [nfs] df, nfs mount, negative Avail -> 32/64-bit confusion

> I cannot see how to get the correct result non-accidentally without
> using the hack of passing negative values as large unsigned ones.
> Passing negative values as the difference of two unsigned values works
> with NetBSD's extension to statvfs (f_bresvd), but it doesn't work for
> nfs because it requires an extra value which the protocol doesn't
> support AFAIK (not far).
>
> Bruce

From the above, it doesn't appear that NFS can support negative values
in any reasonable way... and I suppose that saying "there are zero
blocks available for non-privileged users" is accurate when bavail <= 0.

I'm going to dig through the RFCs and see if there's an otherwise unused
or underused variable that could be used to store bresvd, for clients
that could support it.
Thanks for the detailed explanation,
--- Harrison

From owner-freebsd-fs@FreeBSD.ORG Tue Sep 18 18:09:36 2007
From: Miroslav Lachman <000.fbsd@quip.cz>
To: Ivan Voras
Cc: freebsd-fs@freebsd.org
Date: Tue, 18 Sep 2007 19:52:26 +0200
Subject: Re: slow transfers on webshare service

Ivan Voras wrote:
> Some general tips/ideas:
>
> - Do you really need 8 workers? Lighttpd is an async server, so it's
>   about as fast as it gets. If you only serve static files, a larger
>   number of worker processes (and CPUs...) won't make a difference.
> - Do you use the "kqueue" extension for Lighttpd?
> - 500 parallel downloads probably mean lots of seeking on the disk
>   drive array; have you verified that you can sustain the speed you
>   need with the drives?

I have been running Lighttpd on a download server for about 2 years.
With 1 worker, lighttpd seems limited to 110 Mbps. After a change to 4
workers, throughput increased to about 190 Mbps serving 250-400 clients.
(Daily traffic is ~750GB.)

I am using these settings:

server.event-handler = "freebsd-kqueue" # needed on OS X
#server.network-backend = "freebsd-sendfile" # better for small files
server.network-backend = "writev" # better for large files?
server.max-keep-alive-requests = 6
server.max-keep-alive-idle = 5
server.max-read-idle = 60
server.max-write-idle = 180

Miroslav Lachman

From owner-freebsd-fs@FreeBSD.ORG Tue Sep 18 20:36:27 2007
From: Ivan Voras
To: freebsd-fs@freebsd.org
Date: Tue, 18 Sep 2007 22:34:03 +0200
Subject: Re: slow transfers on webshare service

Miroslav Lachman wrote:
> I have been running Lighttpd on a download server for about 2 years.
> With 1 worker, lighttpd seems limited to 110 Mbps. After a change to 4
> workers, throughput increased to about 190 Mbps serving 250-400
> clients. (Daily traffic is ~750GB.)

OK. Can you run iostat during peak traffic and report its output, so we
can rule out the disk drives?
From owner-freebsd-fs@FreeBSD.ORG Tue Sep 18 21:33:01 2007
From: Bruce Evans <brde@optusnet.com.au>
To: Astrodog
Cc: freebsd-fs@freebsd.org, linimon@freebsd.org
Date: Wed, 19 Sep 2007 00:51:57 +1000 (EST)
Subject: Re: amd64/74811: [nfs] df, nfs mount, negative Avail -> 32/64-bit confusion

[Redirected a bit]

On Tue, 18 Sep 2007, Astrodog wrote:

> On 9/18/07, Astrodog wrote:
>> On 9/18/07, Bruce Evans wrote:
>>> -current still breaks negative avail counts on the server by
>>> clamping them to 0, so the bug is less obvious on buggy clients.
>>
>> It appears that RFC 1094 calls for blocks free to be unsigned (2.2.8).
>> I don't know how this could be handled, besides clamping, though.
>
> Rather, it calls for blocks free, and blocks available, to be unsigned.

D'oh. RFC 1094 only covers nfsv2. That is so crufty that its RFC even
specifies precisely unsigned for almost everything. IIRC, nfsv3 also
specifies an unsigned type for the avail count, but that type is
uint64_t, and nfsv4 is similar. This is clearly a bug in the spec, or
rather, the spec doesn't support BSD's primary file system or what BSD's
nfs has always done. nfs in FreeBSD-[1-4] ignores the spec and passes
negative values as large unsigned ones, mostly by blindly copying bits.
The server was broken on 2004/04/12 (between 5.2R and 5.3R).
FreeBSD clients still try to support negative values being passed as
large unsigned ones, but clients with a 32-bit statfs have a lot of sign
extension and overflow bugs that are most serious for such values.

The design bug also affects statvfs(3). POSIX standardized this but not
BSD's statfs(2). Most things in struct statvfs are typedefed almost to a
fault (fsblkcnt_t for block counts, and fsfilcnt_t for file counts), but
fsblkcnt_t is specified to be an unsigned type, so negative avail counts
cannot work without hacks.

I don't know how to work around the design bug for all clients. Clamping
on the server seems to be best if the client doesn't support negative
avail counts.

NetBSD has large changes in this area, but they seem to reduce to
clamping. In at least nfs_vfsops.c 1.144 (2005/01/02):

- On the server, there is no clamping, but I think negative values can't
  happen anyway because the avail counts are obtained from the statvfs
  interface and statvfs is broken (but see below about f_bresvd;
  f_bresvd is not used here, so something like clamping happens
  automatically). NetBSD has also fixed bogus truncation of file counts
  to 32 bits in the v3 case. Truncation is still blind, but only has to
  be to 32 bits for the v2 case.

- On the client, the avail count is converted into statvfs's avail count
  (f_bavail) plus a NetBSD (?) extension of statvfs (f_bresvd). I think
  f_bresvd is NetBSD's solution to the design bug for statvfs, and
  NetBSD needs this more than FreeBSD because NetBSD has converted many
  (?) utilities from statfs to statvfs. For nfs_statvfs(), f_bresvd is
  initialized to f_bfree - f_bavail (where the free and avail counts are
  whatever is passed by the server). Then, under a COMPAT_20 ifdef,
  avail counts which are so large that they can only be from an "old"
  server that is trying to pass a negative count cause f_bavail to be
  set to 0.

In applications like df, the final avail count is f_bfree - f_bresvd.
This can easily be negative, and should be negative when f_bfree is 0
(no space for non-root) and f_bresvd is nonzero (some space for root).
However, nfs can only initialize things correctly if the server is "old"
(= not broken to spec). If the server is not "old" then the
initializations are just:

    f_bfree  = server f_bfree
    f_bavail = server f_bavail
    f_bresvd = f_bfree - f_bavail    # XXX no way to know server f_bresvd

and these are used like the following in df:

    # f_bavail is not used in df!
    avail = f_bfree - f_bresvd
          = f_bfree - (f_bfree - f_bavail)
          = f_bavail
          = server f_bavail (cast to int64_t)

The resulting `int64_t avail' can only be negative if f_bavail is
"negative" on the server, but we are using a difference and never using
f_bavail in df to avoid abusing f_bavail for holding negative values;
and in the case where the server actually passes us a "negative"
f_bavail and COMPAT_20 is configured, we clobber f_bavail to 0 in
nfs_statvfs() and end up getting avail = f_bfree in df -- completely
wrong.

So the NetBSD code only seems to give the correct result accidentally,
if the correct result is to print a negative avail count in df. It takes
an "old" server and a client that thinks it doesn't support old servers
(no COMPAT_20 configured). I cannot see how to get the correct result
non-accidentally without using the hack of passing negative values as
large unsigned ones.
Passing negative values as the difference of two unsigned values works
with NetBSD's extension to statvfs (f_bresvd), but it doesn't work for
nfs because it requires an extra value which the protocol doesn't
support AFAIK (not far).

Bruce
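To make the sign games above concrete, here is a small self-contained C
sketch - toy numbers and simplified logic, not the actual nfs or statfs
code - of the two server policies for a negative avail count: blindly
copying the bits into the unsigned wire field, which a signed-aware
64-bit client can undo with a cast, versus clamping to 0, which conforms
to the spec but loses the reserved-space information:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* Toy numbers: 100 blocks free on the filesystem, 120 reserved
         * for root, so the avail-to-non-root count is -20 (UFS allows
         * this). */
        int64_t bfree  = 100;
        int64_t bresvd = 120;
        int64_t bavail = bfree - bresvd;             /* -20 */

        /* FreeBSD-[1-4] style: blindly copy the bits into the unsigned
         * 64-bit wire field. */
        uint64_t wire = (uint64_t)bavail;            /* 2^64 - 20 */

        /* A client with a signed 64-bit statfs field can undo the copy
         * (assuming the usual two's-complement representation). */
        printf("64-bit signed client sees %jd\n", (intmax_t)(int64_t)wire);

        /* A client with a 32-bit statfs truncates first; this tiny toy
         * value happens to survive, but large genuine unsigned counts
         * wrap -- the "32/64-bit confusion" of the PR title. */
        printf("32-bit signed client sees %d\n", (int)(int32_t)(uint32_t)wire);

        /* Clamping on the server: spec-conformant, but indistinguishable
         * from a genuinely full filesystem. */
        uint64_t clamped = bavail < 0 ? 0 : (uint64_t)bavail;
        printf("clamping server reports %ju\n", (uintmax_t)clamped);
        return 0;
    }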
From owner-freebsd-fs@FreeBSD.ORG Wed Sep 19 07:36:24 2007
From: Adam Jacob Muller <freebsd-fs@adam.gs>
To: freebsd-current@freebsd.org, freebsd-fs@freebsd.org
Date: Wed, 19 Sep 2007 03:24:25 -0400
Subject: ZFS pool not working on boot

Hello,

I have a server with two ZFS pools: one is an internal raid0 using 2
drives connected via ahc, the other is an external storage array with 11
drives, also on ahc, using raidz. (This is a Dell 1650 and PV220S.)

On reboot, the pools do not come online on their own. Both pools
consistently show as failed. The exact symptoms vary, but I have seen
many drives marked variously as "corrupt" or "unavailable", and most
zpool operations fail with "pool is unavailable" errors.

Here is the interesting part. Consistently, 100% of the time, a zpool
export followed by a zpool import restores the arrays to an ONLINE
status. Once the array is online, it's quite stable (I'm loving ZFS,
btw; thank you to everyone for the hard work on this, ZFS is fantastic)
and works great.

Anyone have any ideas why this might occur and what/if the solution is?
Any additional information can be provided on request; I am running
-current from approximately 1 week ago.

-Adam

From owner-freebsd-fs@FreeBSD.ORG Wed Sep 19 08:44:22 2007
From: "Wilkinson, Alex"
To: freebsd-current@freebsd.org, freebsd-fs@freebsd.org
Date: Wed, 19 Sep 2007 16:25:51 +0800
Subject: Re: ZFS pool not working on boot

On Wed, Sep 19, 2007 at 03:24:25AM -0400, Adam Jacob Muller wrote:

> I have a server with two ZFS pools [...] On reboot, the pools do not
> come online on their own. Both pools consistently show as failed.

Make sure your hostid doesn't change. If it does, ZFS will fail upon
bootstrap.

-aW
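The hostid ZFS compares against is easy to inspect; one low-tech way to
rule this out on a stock FreeBSD install is to record it before and
after a reboot and compare:

    # sysctl kern.hostid
    kern.hostid: 2054635911

(The number above is purely illustrative; a value of 0 would mean no
hostid was ever set on the machine.)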
From owner-freebsd-fs@FreeBSD.ORG Wed Sep 19 17:56:45 2007
From: Miroslav Lachman <000.fbsd@quip.cz>
To: Ivan Voras
Cc: freebsd-fs@freebsd.org
Date: Wed, 19 Sep 2007 19:57:59 +0200
Subject: Re: slow transfers on webshare service

Ivan Voras wrote:
> OK. Can you run iostat during peak traffic and report its output, so
> we can rule out the disk drives?

I have an external disk array (da1) which is saturated at this time:

# iostat -w 5
      tty             da0              da1             pass0            cpu
 tin  tout  KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
   0     1 12.91   3  0.04  59.30  95  5.47   0.00   0  0.00   4  0 24 21 51
   0    92  2.00   1  0.00  59.28 251 14.51   0.00   0  0.00   1  0  5  5 89
   0    31 10.67   1  0.01  59.03 226 13.03   0.00   0  0.00   0  0  5  5 90
   0    31 17.74  17  0.29  58.80 231 13.29   0.00   0  0.00   0  0  4  4 92
   0    31  0.00   0  0.00  58.39 222 12.67   0.00   0  0.00   1  0  5  4 90

from systat -vmstat:

Disks   da0   da1 pass0 pass1 pass2
KB/t   0.00 59.53  0.00  0.00  0.00
tps       0   226     0     0     0
MB/s   0.00 13.14  0.00  0.00  0.00
% busy    0    97     0     0     0

But without 4 workers I can't saturate the array, and the traffic graph
(by MRTG) stays capped at a fixed bandwidth. So setting Lighttpd to use
4 workers definitely helps in my case.
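As a quick sanity check on those numbers: the MB/s column is just KB/t
times tps, so da1 is moving

    59.3 KB/t x 226 tps ~= 13.1 MB/s ~= 105 Mbit/s

and at 97% busy it is the drive array, not the network stack, that is
the ceiling in this snapshot.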
(I don't know if it is the same for Miroslav Valenta.)

Miroslav Lachman

From owner-freebsd-fs@FreeBSD.ORG Wed Sep 19 18:02:04 2007
From: Nikolay Pavlov <qpadla@gmail.com>
To: freebsd-fs@freebsd.org
Cc: Ivan Voras
Date: Wed, 19 Sep 2007 21:01:47 +0300
Subject: Re: slow transfers on webshare service

On Wednesday 19 September 2007 20:57:59 Miroslav Lachman wrote:
> Ivan Voras wrote:
> > OK. Can you run iostat during peak traffic and report its output, so
> > we can rule out the disk drives?
>
> I have an external disk array (da1) which is saturated at this time:
> [iostat and systat output trimmed]
> But without 4 workers I can't saturate the array, and the traffic
> graph (by MRTG) stays capped at a fixed bandwidth. So setting Lighttpd
> to use 4 workers definitely helps in my case.

Is there any reason not to use sendfile? It is commented out in your
lighttpd config.

--
Best regards, Nikolay Pavlov.
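For context on the sendfile question: lighttpd's "freebsd-sendfile"
backend ultimately calls FreeBSD's sendfile(2), which pushes file pages
to the socket inside the kernel instead of copying them through a
userland buffer the way writev does. A minimal sketch of the call (the
helper function and its names are illustrative, not lighttpd's code):

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/uio.h>    /* sendfile(2) on FreeBSD */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Send all of 'path' over the connected TCP socket 'sock' without
     * copying the file data through userland. */
    static int send_whole_file(int sock, const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd == -1)
            return -1;

        off_t sent = 0;
        /* nbytes == 0 means "send until end of file" on FreeBSD; with a
         * non-blocking socket a real server would loop on EAGAIN, using
         * 'sent' to advance the offset. */
        int error = sendfile(fd, sock, 0, 0, NULL, &sent, 0);
        if (error == 0)
            printf("sent %jd bytes of %s\n", (intmax_t)sent, path);

        close(fd);
        return error;
    }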
From owner-freebsd-fs@FreeBSD.ORG Wed Sep 19 22:13:18 2007
From: Adam Jacob Muller <freebsd-fs@adam.gs>
To: "Wilkinson, Alex"
Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org
Date: Wed, 19 Sep 2007 18:13:15 -0400
Subject: Re: ZFS pool not working on boot

On Sep 19, 2007, at 4:25 AM, Wilkinson, Alex wrote:
> Make sure your hostid doesn't change. If it does, ZFS will fail upon
> bootstrap.

No, the hostid is not changing; I just rebooted and replicated the
problem. Also, from reading the ZFS docs, it seems the symptom of a
changed hostid would be that the pool simply needs to be imported again.

After another reboot, I see this:

# zpool status
  pool: tank
 state: UNAVAIL
status: One or more devices could not be opened. There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        UNAVAIL       0     0     0  insufficient replicas
          da1       ONLINE        0     0     0
          da2       UNAVAIL       0     0     0  cannot open

... more output showing the other array with 11 drives is fine

# zpool export tank
# zpool import tank
# zpool status
  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE        0     0     0
          da1       ONLINE        0     0     0
          da2       ONLINE        0     0     0

errors: No known data errors

(The 11-drive raidz is still fine, of course.)
From owner-freebsd-fs@FreeBSD.ORG Wed Sep 19 22:56:10 2007
From: Johan Ström <johan@stromnet.se>
To: freebsd-fs@freebsd.org
Date: Thu, 20 Sep 2007 00:31:56 +0200
Subject: ZFS (and quota)

Hello,

I just installed FreeBSD -current on a box (actually upgraded 6.2 to
-current) to experiment a bit.

I was playing around with ZFS and tried out the quota features. While
doing this I noticed that you don't seem to get a "disk full" notice the
same way as you do on a "normal" (UFS) filesystem. Instead of aborting
the operation with "No space left on device", it just continued:

[root@devbox ~]# zpool create tank /dev/ad2
[root@devbox ~]# zpool list
NAME   SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
tank  37.2G   111K  37.2G   0%  ONLINE  -
[root@devbox /tank]# zfs create -V 10M tank/set3vol
[root@devbox /tank]# newfs /dev/zvol/tank/set3vol
/dev/zvol/tank/set3vol: 10.0MB (20480 sectors) block size 16384,
        fragment size 2048
        using 4 cylinder groups of 2.52MB, 161 blks, 384 inodes.
super-block backups (for fsck -b #) at:
 160, 5312, 10464, 15616
[root@devbox /tank]# mount /dev/zvol/tank/set3vol set3vol/
[root@devbox /tank]# cd set3vol/
[root@devbox /tank/set3vol]# dd if=/dev/urandom of=test
/tank/set3vol: write failed, filesystem is full
dd: test: No space left on device
19169+0 records in
19168+0 records out
9814016 bytes transferred in 2.276896 secs (4310261 bytes/sec)

[root@devbox /tank]# zfs create tank/set2
[root@devbox /tank/set2]# zfs set quota=10M tank/set2
[root@devbox /tank/set2]# zfs get quota tank/set2
NAME       PROPERTY  VALUE  SOURCE
tank/set2  quota     10M    local
[root@devbox /tank/set2]# dd if=/dev/urandom of=test
^C
18563+0 records in
18562+0 records out
9503744 bytes transferred in 199.564353 secs (47622 bytes/sec)
[root@devbox /tank/set2]# zfs list tank/set2
NAME        USED  AVAIL  REFER  MOUNTPOINT
tank/set2  9.15M   870K  9.15M  /tank/set2

No hard stop there; it just tries to write more and more. Well, the
quota is enforced fine, but shouldn't there be some harder error? I'm
not sure how regular UFS quotas behave, since I never used them, but
this seems like strange behaviour.

Anyway, how "stable" are the ZFS support and -current / FreeBSD 7 in
general now? I'm about to get a new server, an 8-core Xeon machine with
lots of disk, so I would probably benefit very much from running
FreeBSD 7 (much better multi-core performance, if I've understood
correctly). Being able to use ZFS for some of my jails would rock too,
with individual quotas and all the other flexibility ZFS provides
(i.e. creating a new dataset for every jail and enforcing an individual
quota). Would anyone dare to do this on a production machine yet? Is
anyone doing it?

Well, it can't be said too many times: keep up the good work! Thanks to
all the FreeBSD developers (and everyone else too!) :)

--
Johan Ström
Stromnet
johan@stromnet.se
http://www.stromnet.se/
From owner-freebsd-fs@FreeBSD.ORG Wed Sep 19 23:01:43 2007
From: Axel <acd@acd.homelinux.org>
To: Adam Jacob Muller
Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org
Date: Wed, 19 Sep 2007 16:44:49 -0600
Subject: Re: ZFS pool not working on boot

Adam Jacob Muller writes:
> On reboot, the pools do not come online on their own. Both pools
> consistently show as failed.
> [...]
> Consistently, 100% of the time, a zpool export followed by a zpool
> import restores the arrays to an ONLINE status.

There is a file called /boot/zfs/zpool.cache that is kept in sync and
loaded at boot time. If that's not there, e.g. because your /boot
doesn't actually point to it, you're hosed.

--
Axel

From owner-freebsd-fs@FreeBSD.ORG Thu Sep 20 00:06:26 2007
From: Adam Jacob Muller <freebsd-fs@adam.gs>
To: Axel
Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org
Date: Wed, 19 Sep 2007 20:05:03 -0400
Subject: Re: ZFS pool not working on boot

On Sep 19, 2007, at 6:44 PM, Axel wrote:
> There is a file called /boot/zfs/zpool.cache that is kept in sync and
> loaded at boot time. If that's not there, e.g. because your /boot
> doesn't actually point to it, you're hosed.

The file is there. Of note is that some of the prior reboots were
"unintentional", so it is possible that the file was corrupted. However,
it does not seem correct for ZFS to come up in a state that shows drives
as corrupted and/or unavailable. I believe I have corrected the crashing
issue, but this still does not seem like the correct behavior.

- Adam
The file is there. Of note: some of the prior reboots had been
"unintentional" reboots, so it is possible that the file was corrupt.
However, it does not seem correct for ZFS to come up in a state that
shows drives as corrupted and/or unavailable. I believe I have
corrected the crashing issue, but it still does not seem that this is
the correct behavior.

- Adam

From owner-freebsd-fs@FreeBSD.ORG Thu Sep 20 02:39:13 2007
From: Axel
Date: Wed, 19 Sep 2007 20:39:11 -0600
To: Adam Jacob Muller
Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org
Subject: Re: ZFS pool not working on boot

Adam Jacob Muller writes:

> On Sep 19, 2007, at 6:44 PM, Axel wrote:
>
>> There is a file called /boot/zfs/zpool.cache that is kept in sync
>> and loaded at boot time.
>>
>> [...]
>
> The file is there. [...] I believe I have corrected the crashing
> issue, but it still does not seem that this is the correct behavior.

If you have a working root outside of ZFS, I'd do the following:

1) Rename the zpool.cache to something else, to be safe.
2) Reboot, make sure that /boot/zfs points to the right location, and
   reimport the pools.
3) You should be fine from there on.

I had sort of the same issue. The zpool.cache isn't documented too
well yet; I only stumbled over it by doing an "lsmod" at the loader
prompt. It's one reason root can be on ZFS before the hostid is set.
If you set up ZFS and don't have the future /boot/zfs set up right, it
won't work, because the information gets lost.
With / on ZFS it's crucial to have /boot point to the actual UFS boot
partition and not live somewhere in your ZFS /, because that gets
ignored until it's mounted. It's a good idea to keep the actual old
UFS / directory around, although only /boot in there gets used if you
mount / from ZFS.

http://wiki.freebsd.org/ZFS comes in handy.

And yes, I do love ZFS too :-)

--
Axel
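
[Axel's steps 1-3, spelled out as commands. A sketch only, assuming a
single pool named tank and a working UFS root; adjust names to taste:]

  mv /boot/zfs/zpool.cache /boot/zfs/zpool.cache.bad  # step 1: keep the old cache, just in case
  reboot                                              # step 2
  zpool import         # with no argument: scan devices, list importable pools
  zpool import tank    # reimport; this rewrites /boot/zfs/zpool.cache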
From owner-freebsd-fs@FreeBSD.ORG Thu Sep 20 21:15:33 2007
From: Dag-Erling Smørgrav
Date: Thu, 20 Sep 2007 11:26:15 +0200
To: Johan Ström
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS (and quota)

Johan Ström writes:

> I was playing around with ZFS a bit and tried out the quota
> features. While doing this I noticed that it doesn't seem like you
> get a "disk full" notice the same way as you do on a "normal" (UFS)
> filesystem. Instead of aborting the operation with "No space left on
> device" it just continued:
> [...]
> [root@devbox /tank/set2]# dd if=/dev/urandom of=test
> ^C
> 18563+0 records in
> 18562+0 records out
> 9503744 bytes transferred in 199.564353 secs (47622 bytes/sec)
> [root@devbox /tank/set2]# zfs list tank/set2
> NAME        USED  AVAIL  REFER  MOUNTPOINT
> tank/set2  9.15M   870K  9.15M  /tank/set2

See what it says under AVAIL? You killed it before it filled the disk.

des@ds4 ~% sudo zfs create raid/q
des@ds4 ~% sudo zfs set quota=1m raid/q
des@ds4 ~% sudo dd if=/dev/zero of=/raid/q/test bs=65536
dd: /raid/q/test: Disc quota exceeded
16+0 records in
15+0 records out
983040 bytes transferred in 2.533990 secs (387942 bytes/sec)
des@ds4 ~% zfs list raid/q
NAME     USED  AVAIL  REFER  MOUNTPOINT
raid/q  1.03M      0  1.03M  /raid/q

DES
--
Dag-Erling Smørgrav - des@des.no

From owner-freebsd-fs@FreeBSD.ORG Thu Sep 20 21:45:28 2007
From: Dmitry Morozovsky
Date: Fri, 21 Sep 2007 01:45:10 +0400 (MSD)
To: Dan Nelson
Cc: freebsd-fs@freebsd.org, Adam Jacob Muller, freebsd-current@freebsd.org, Axel
Subject: Re: ZFS pool not working on boot

On Thu, 20 Sep 2007, Dan Nelson wrote:

DN> > It's a good idea to keep the actual old UFS / directory around
DN> > although only /boot gets used in there if you mount / from zfs.
DN>
DN> What I do is populate my UFS /.boot filesystem with /etc, /lib,
DN> /libexec, /bin, and /sbin from my root filesystem, so if zfs fails
DN> to load it's easy to recover.

With a small patch to rescue (including zpool and zfs together with
the libraries involved), all that is required is copying /rescue and
symlinking /bin and /sbin to it. Well, you also have to mkdir dev and
possibly have ./etc/{,s}pwd.db to make tar happy...

Sincerely,
D.Marck                                 [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer:                             marck@FreeBSD.org ]
------------------------------------------------------------------------
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru ***
------------------------------------------------------------------------
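
[A sketch of the rescue layout Dmitry describes, assuming an alternate
UFS root mounted at /altroot and a /rescue already rebuilt to include
zpool/zfs; the /altroot path is hypothetical:]

  cp -Rp /rescue /altroot/           # statically linked toolset
  ln -s rescue /altroot/bin          # /bin and /sbin become symlinks to it
  ln -s rescue /altroot/sbin
  mkdir /altroot/dev                 # mount point for devfs
  mkdir -p /altroot/etc
  cp -p /etc/pwd.db /etc/spwd.db /altroot/etc/  # keeps tar & friends happy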
From owner-freebsd-fs@FreeBSD.ORG Thu Sep 20 21:46:01 2007
From: Dan Nelson
Date: Thu, 20 Sep 2007 10:08:40 -0500
To: Axel
Cc: freebsd-fs@freebsd.org, Adam Jacob Muller, freebsd-current@freebsd.org
Subject: Re: ZFS pool not working on boot

In the last episode (Sep 19), Axel said:
> Adam Jacob Muller writes:
> > [...]
>
> If you have a working root outside of ZFS, I'd do the following:
>
> 1) Rename the zpool.cache to something else, to be safe.
> 2) Reboot, make sure that /boot/zfs points to the right location,
>    and reimport the pools.
> 3) You should be fine from there on.
>
> [...]
>
> It's a good idea to keep the actual old UFS / directory around,
> although only /boot in there gets used if you mount / from ZFS.

What I do is populate my UFS /.boot filesystem with /etc, /lib,
/libexec, /bin, and /sbin from my root filesystem, so if zfs fails to
load it's easy to recover.

--
Dan Nelson
dnelson@allantgroup.com
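
[Dan's fallback as commands; a sketch assuming the small UFS
filesystem is mounted at /.boot, as in his setup:]

  cp -Rp /etc /lib /libexec /bin /sbin /.boot/

With those copies in place, the box can still reach a usable
single-user shell from the UFS partition if the ZFS root fails to
mount.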
From owner-freebsd-fs@FreeBSD.ORG Thu Sep 20 21:50:11 2007
From: Dag-Erling Smørgrav
Date: Thu, 20 Sep 2007 14:54:54 +0200
To: Johan Ström
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS (and quota)

Johan Ström writes:

> Dag-Erling Smørgrav writes:
> > [...]
> With the bs=65536 parameter it works as expected, I get Disk quota
> exceeded. Without it, it just keeps on running until I interrupt it.

It seems that with small block sizes, it becomes increasingly slow as
the partition fills up. You can easily see that by pressing ^T while
dd is running; you will see that it still makes progress, but very
slowly.
des@ds4 ~% sudo dd if=/dev/zero of=/raid/q/test
load: 0.18  cmd: dd 20250 [zfs:(&tx->tx_quiesce_done_cv)]  0.01u 0.48s 0% 1192k
17245+0 records in
17244+0 records out
8828928 bytes transferred in 18.743790 secs (471032 bytes/sec)
load: 0.17  cmd: dd 20250 [zfs:(&tx->tx_quiesce_done_cv)]  0.01u 0.49s 0% 1212k
17273+0 records in
17272+0 records out
8843264 bytes transferred in 23.642442 secs (374042 bytes/sec)
load: 0.24  cmd: dd 20250 [zfs:(&tx->tx_quiesce_done_cv)]  0.01u 0.51s 0% 1212k
17406+0 records in
17405+0 records out
8911360 bytes transferred in 45.053364 secs (197796 bytes/sec)
load: 0.15  cmd: dd 20250 [zfs:(&tx->tx_quiesce_done_cv)]  0.01u 0.55s 0% 1212k
17601+0 records in
17600+0 records out
9011200 bytes transferred in 76.173965 secs (118298 bytes/sec)
load: 0.06  cmd: dd 20250 [zfs:(&tx->tx_quiesce_done_cv)]  0.02u 0.60s 0% 1212k
17906+0 records in
17905+0 records out
9167360 bytes transferred in 126.020690 secs (72745 bytes/sec)
^C18259+0 records in
18258+0 records out
9348096 bytes transferred in 185.266755 secs (50457 bytes/sec)

DES
--
Dag-Erling Smørgrav - des@des.no

From owner-freebsd-fs@FreeBSD.ORG Thu Sep 20 22:44:50 2007
From: Johan Ström
Date: Thu, 20 Sep 2007 14:26:25 +0200
To: Dag-Erling Smørgrav
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS (and quota)

On Sep 20, 2007, at 11:26, Dag-Erling Smørgrav wrote:
> Johan Ström writes:
>> I was playing around with ZFS a bit and tried out the quota
>> features. [...]
>
> See what it says under AVAIL? You killed it before it filled the
> disk.

[root@devbox /home/johan]# zfs list tank/set2
NAME        USED  AVAIL  REFER  MOUNTPOINT
tank/set2  9.15M   870K  9.15M  /tank/set2

Yes I did, but after 200 seconds one would think that 10MB should have
been filled (it took 2.2s on UFS), right? :)

> des@ds4 ~% sudo zfs create raid/q
> des@ds4 ~% sudo zfs set quota=1m raid/q
> des@ds4 ~% sudo dd if=/dev/zero of=/raid/q/test bs=65536
> dd: /raid/q/test: Disc quota exceeded
> 16+0 records in
> 15+0 records out
> 983040 bytes transferred in 2.533990 secs (387942 bytes/sec)
> des@ds4 ~% zfs list raid/q
> NAME     USED  AVAIL  REFER  MOUNTPOINT
> raid/q  1.03M      0  1.03M  /raid/q

With the bs=65536 parameter it works as expected: I get "Disc quota
exceeded". Without it, it just keeps on running until I interrupt it.

> DES
> --
> Dag-Erling Smørgrav - des@des.no

From owner-freebsd-fs@FreeBSD.ORG Thu Sep 20 22:53:18 2007
From: Johan Ström
Date: Thu, 20 Sep 2007 14:28:45 +0200
To: Pawel Jakub Dawidek
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS (and quota)
On Sep 20, 2007, at 13:56, Pawel Jakub Dawidek wrote:

> On Thu, Sep 20, 2007 at 12:31:56AM +0200, Johan Ström wrote:
>> [...]
>
> Hmm, seems to work just fine here:
>
> beast:root:~# zfs create tank/foo
> beast:root:~# zfs set quota=10m tank/foo
>
> beast:root:~# dd if=/dev/random of=/tank/foo/test bs=1m
> dd: /tank/foo/test: Disc quota exceeded
> 11+0 records in
> 10+0 records out
> 10485760 bytes transferred in 6.109407 secs (1716330 bytes/sec)
>
> I think you just didn't wait long enough :) You didn't give a block
> size argument to dd(1), so it used 512 bytes. Please be more
> patient, retry and report back, thanks!

You were correct :)

[root@devbox /tank/set2]# dd if=/dev/urandom of=test2
dd: test2: Disc quota exceeded
1538+0 records in
1537+0 records out
786944 bytes transferred in 202.628064 secs (3884 bytes/sec)

But the other day I ran it for at least 300 seconds without it
stopping. When I did it on UFS it took 2 seconds to fill up
altogether; with ZFS it kept on going much longer?

Retested:

[root@devbox /tank/set3vol]# ls -al
total 6
drwxr-xr-x  3 root  wheel       512 Sep 20 14:16 .
drwxr-xr-x  5 root  wheel         5 Sep 20 00:22 ..
drwxrwxr-x  2 root  operator    512 Sep 20 00:21 .snap
[root@devbox /tank/set3vol]# dd if=/dev/urandom of=test
/tank/set3vol: write failed, filesystem is full
dd: test: No space left on device
19169+0 records in
19168+0 records out
9814016 bytes transferred in 2.176188 secs (4509728 bytes/sec)
[root@devbox /tank/set3vol]# cd ../set2/
[root@devbox /tank/set2]# ls -al
total 3
drwxr-xr-x  2 root  wheel  2 Sep 20 14:16 .
drwxr-xr-x  5 root  wheel  5 Sep 20 00:22 ..
[root@devbox /tank/set2]# dd if=/dev/urandom of=test
dd: test: Disc quota exceeded
20226+0 records in
20225+0 records out
10355200 bytes transferred in 456.448610 secs (22686 bytes/sec)
[root@devbox /tank/set2]# df -h
Filesystem                Size    Used   Avail Capacity  Mounted on
/dev/ad0s1a               496M    174M    282M    38%    /
devfs                     1.0K    1.0K      0B   100%    /dev
/dev/ad0s1e               496M     28K    456M     0%    /tmp
/dev/ad0s1f               5.0G    2.8G    1.8G    61%    /usr
/dev/ad0s1d               1.2G    105M    1.0G     9%    /var
tank                       37G      0B     37G     0%    /tank
tank/set1                  37G      0B     37G     0%    /tank/set1
/dev/zvol/tank/set3vol    9.4M    9.4M   -728K   108%    /tank/set3vol
tank/set2                  10M     10M      0B   100%    /tank/set2
[root@devbox /tank/set2]#

On UFS, 2.1 sec (although that was disk full, not quota full); on ZFS,
450 sec.

> --
> Pawel Jakub Dawidek                       http://www.wheel.pl
> pjd@FreeBSD.org                           http://www.FreeBSD.org
> FreeBSD committer                         Am I Evil? Yes, I Am!

From owner-freebsd-fs@FreeBSD.ORG Thu Sep 20 23:46:49 2007
From: Fluffles
Date: Thu, 20 Sep 2007 12:45:00 +0200
To: freebsd-fs@FreeBSD.org
Subject: Writing contiguously to UFS2?

Hello list,

I've set up a concat of 8 disks for my new NAS, using ataidle to spin
down the disks not needed. This allows me to save power and noise/heat
by running only the drives that are actually in use.

My problem is UFS. UFS2 seems to write to 4 disks, even though all the
data written so far can easily fit on just one disk. What's going on
here? I looked at newfs parameters, but in the past was unable to make
newfs write contiguously. It seems UFS2 always writes to a new
cylinder. Is there any way to force UFS to write contiguously? Or at
least limit the problem?

If I write 400GB to a 4TB volume consisting of 8x 500GB disks, I want
all data to be on the first disk. If the data spreads, then more disks
will be 'awakened' when I read my data, which defeats the purpose of my
power-saving NAS experiment.

Any feedback is welcome. Using FreeBSD 6.2-RELEASE i386, used
newfs -U -S 2048 .

- Veronica
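
[For reference, a sketch of this kind of setup, assuming eight disks
ad4 through ad18 and the sysutils/ataidle port; the device names and
the 10-minute timeout are made up, and the exact ataidle option syntax
varies by version, so check ataidle(8):]

  gconcat label -v data ad4 ad6 ad8 ad10 ad12 ad14 ad16 ad18
  newfs -U -S 2048 /dev/concat/data
  ataidle -S 10 /dev/ad6    # spin a member down after 10 idle minutes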
From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 06:57:18 2007
From: Pawel Jakub Dawidek
Date: Thu, 20 Sep 2007 13:56:21 +0200
To: Johan Ström
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS (and quota)

On Thu, Sep 20, 2007 at 12:31:56AM +0200, Johan Ström wrote:
> Hello
>
> I just installed FreeBSD-current on a box (actually upgraded 6.2 to
> -current) to experiment a bit.
> I was playing around with ZFS a bit and tried out the quota features.
> While doing this I noticed that it doesn't seem like you get a "disk
> full" notice the same way as you do on a "normal" (UFS) filesystem.
> Instead of aborting the operation with "No space left on device" it
> just continued:
[...]
> [root@devbox /tank/set2]# dd if=/dev/urandom of=test
> ^C
> 18563+0 records in
> 18562+0 records out
> 9503744 bytes transferred in 199.564353 secs (47622 bytes/sec)
> [root@devbox /tank/set2]# zfs list tank/set2
> NAME        USED  AVAIL  REFER  MOUNTPOINT
> tank/set2  9.15M   870K  9.15M  /tank/set2
>
> No hard stop there, it just tries to write more and more and more.
> Well, the quota is enforced fine, but shouldn't there be some harder
> error? I'm not sure how regular UFS quotas work, since I have never
> used them, but this seems like strange behaviour.
Hmm, seems to work just fine here:

beast:root:~# zfs create tank/foo
beast:root:~# zfs set quota=10m tank/foo

beast:root:~# dd if=/dev/random of=/tank/foo/test bs=1m
dd: /tank/foo/test: Disc quota exceeded
11+0 records in
10+0 records out
10485760 bytes transferred in 6.109407 secs (1716330 bytes/sec)

beast:root:~# df -h /tank/foo
Filesystem    Size    Used   Avail Capacity  Mounted on
tank/foo       10M     10M      0B   100%    /tank/foo

I think you just didn't wait long enough :) You didn't give a block
size argument to dd(1), so it used 512 bytes. Please be more patient,
retry and report back, thanks!

--
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!

From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 10:49:47 2007
From: Ivan Voras
Date: Fri, 21 Sep 2007 12:49:28 +0200
To: freebsd-fs@freebsd.org
Subject: Re: Writing contiguously to UFS2?

Fluffles wrote:
> Hello list,
>
> I've set up a concat of 8 disks for my new NAS, using ataidle to spin
> down the disks not needed. This allows me to save power and
> noise/heat by running only the drives that are actually in use.
>
> My problem is UFS. UFS2 seems to write to 4 disks, even though all
> the

These 4 drives are used in what RAID form? If it's RAID0/stripe, you
can't avoid data being spread across the drives (since this is the
point of having RAID0).

> data written so far can easily fit on just one disk.
> What's going on here? I looked at newfs parameters, but in the past
> was unable to make newfs write contiguously. It seems UFS2 always
> writes to a new cylinder. Is there any way to force UFS to write
> contiguously? Or at least limit the problem?

If the drives are simply concatenated, then there might be weird
behaviour in choosing what cylinder groups to allocate for files. UFS
forces big files to be spread across cylinder groups so that no large
file fills entire cgs.

From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 11:09:04 2007
From: Fluffles
Date: Fri, 21 Sep 2007 13:09:00 +0200
To: freebsd-fs@FreeBSD.org
Cc: Ivan Voras
Subject: Re: Writing contiguously to UFS2?

Ivan Voras wrote:
> These 4 drives are used in what RAID form? If it's RAID0/stripe, you
> can't avoid data being spread across the drives (since this is the
> point of having RAID0).

It's an array of 8 drives in gconcat, so they are using the JBOD /
spanning / concatenating scheme, which does not have a RAID designation
but rather is a bunch of disks glued to each other. Thus, there is no
striping involved. Offset 0 to 500GB will 'land' on disk0, and then
disk1 takes over, in this scheme:

offset 0 ---------------------------------------------------- offset 4TB
 disk0 -> disk1 -> disk2 -> disk3 -> disk4 -> disk5 -> disk6 -> disk7

(for everyone not familiar with concatenation)

> If the drives are simply concatenated, then there might be weird
> behaviour in choosing what cylinder groups to allocate for files.
> UFS forces big files to be spread across cylinder groups so that no
> large file fills entire cgs.

Exactly! And this is my problem.
I do not like this behavior for various reasons:
- it causes lower sequential transfer speed, because the disks have to
  seek regularly
- UFS causes 2 reads per second when writing sequentially, probably
  some meta-data thing, but I don't like it either
- files are not written contiguously, which causes fragmentation;
  essentially UFS forces big files to become fragmented this way.

Even worse: data is being stored at weird locations, so that my energy
efficient NAS project becomes crippled. Even with the first 400GB of
data, it's storing that on the first 4 disks in my concat
configuration, so that when opening folders I have to wait 10 seconds
before the disk is spun up. For regular operation, multiple disks have
to be spun up, which is not practical and unnecessary. Is there any
way to force UFS to write contiguously? Otherwise I think I should try
Linux with some Linux filesystem (XFS, Reiser, JFS) in the hope they
do not suffer from this problem.

In the past, when testing geom_raid5, I've tried to tune newfs
parameters so that it would write contiguously, but still there were
regular 2-phase writes, which means data was not written contiguously.
I really dislike this behavior.

- Veronica

From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 11:36:31 2007
From: Ivan Voras
Date: Fri, 21 Sep 2007 13:35:47 +0200
To: freebsd-fs@freebsd.org
Subject: Re: Writing contiguously to UFS2?

Fluffles wrote:
> Even worse: data is being stored at weird locations, so that my
> energy efficient NAS project becomes crippled.
> Even with the first 400GB of data, it's storing that on the first 4
> disks in my concat configuration,

> In the past, when testing geom_raid5, I've tried to tune newfs
> parameters so that it would write contiguously, but still there were
> regular 2-phase writes, which means data was not written
> contiguously. I really dislike this behavior.

I agree, this is my least favorite aspect of UFS (maybe together with
the non-implementation of extents), for various reasons. I feel it's
time to start heavy lobbying for finishing FreeBSD's implementations
of XFS and reiserfs :)

(ZFS is not the ultimate solution: 1) replacing a UFS monoculture with
a ZFS monoculture will sooner or later yield problems, and 2)
sometimes a "dumb" unix filesystem is preferred to the "smart" ZFS.)

From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 12:10:31 2007
From: Eric Anderson
Date: Fri, 21 Sep 2007 07:10:24 -0500
To: Fluffles
Cc: freebsd-fs@freebsd.org, Ivan Voras
Subject: Re: Writing contiguously to UFS2?

Fluffles wrote:
> It's an array of 8 drives in gconcat, so they are using the JBOD /
> spanning / concatenating scheme, which does not have a RAID
> designation but rather is a bunch of disks glued to each other.
> Thus, there is no striping involved.
> Offset 0 to 500GB will 'land' on disk0, and then disk1 takes over.
> [...]
> Exactly! And this is my problem. I do not like this behavior for
> various reasons:
> [...]
> Is there any way to force UFS to write contiguously? Or at least
> limit the problem?

This notion of breaking up large blocks of data into smaller chunks is
a fundamental of the UFS (well, FFS) filesystem, and has been around
for ages. I'm not saying it's the One True FS Format by any means, but
many, many other file systems use the same principles.

The largest file size per chunk in a cylinder group is calculated at
newfs time, which also determines how many cylinder groups there
should be. I think the largest size I've seen was something in the
460MB-ish range, meaning any contiguous write above that would span
more than one cylinder group.

The max cylinder group size also has another bad side effect: the more
cylinder groups you have, the longer it takes a snapshot to be
created.

I recommend trying msdos fs. On recent -CURRENT, it should perform
fairly well (akin to UFS2, I think), and if I recall correctly, it has
a more contiguous block layout.

In the end, extending UFS2 to support much larger cylinder group sizes
would be hugely beneficial. Instead of forcing XFS, reiserfs, JFS,
ext[23], etc., to be writable (most of those are GPL'ed), why not
start the (immensely huge) task of a UFS3, which has support for all
the things we need for the next 5-10 years? UFS2 has served well from
5.x to 7.x, but what about the future? Making a UFS3 takes time, and
dedication from developers.

Eric
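
[The per-group numbers Eric mentions are computed at newfs time and
can be read back with dumpfs(8); a sketch, with a hypothetical device
name:]

  dumpfs /dev/concat/data | grep -E 'ncg|bpg|maxbpg'

ncg is the cylinder-group count and maxbpg is the cap on how many
blocks a single file may allocate in one group before the allocator
moves it along to the next.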
From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 12:12:40 2007
From: Stefan Esser
Date: Fri, 21 Sep 2007 14:12:16 +0200
To: Ivan Voras
Cc: freebsd-fs@freebsd.org
Subject: Re: Writing contiguously to UFS2?

Ivan Voras wrote:
> Fluffles wrote:
>
>> Even worse: data is being stored at weird locations, so that my
>> energy efficient NAS project becomes crippled. [...]
>
> I agree, this is my least favorite aspect of UFS (maybe together
> with the non-implementation of extents), for various reasons. I feel
> it's time to start heavy lobbying for finishing FreeBSD's
> implementations of XFS and reiserfs :)
>
> (ZFS is not the ultimate solution: 1) replacing a UFS monoculture
> with a ZFS monoculture will sooner or later yield problems, and 2)
> sometimes a "dumb" unix filesystem is preferred to the "smart" ZFS.)

Both XFS and ReiserFS are quite complex compared to UFS, and
definitely not well described by the term "dumb" ;-)

The FFS paper by McKusick et al. describes the historical allocation
strategy, which was somewhat modified in FreeBSD a few years ago in
order to adapt to modern disk sizes (larger cylinder groups, meaning
it is not a good idea to create each new directory in a new cylinder
group). The code that implements the block layout strategy is easily
found in the sources and can be modified without too much risk to
your file system's consistency ...

Regards, STefan
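
[For anyone who wants to follow Stefan's pointer: the allocation
policy lives in the FreeBSD source tree, and ffs_blkpref() is, if
memory serves, the routine that picks the preferred next block:]

  less /usr/src/sys/ufs/ffs/ffs_alloc.c    # see ffs_blkpref()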
From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 12:51:53 2007
From: Ivan Voras
Date: Fri, 21 Sep 2007 14:45:35 +0200
To: freebsd-fs@freebsd.org
Subject: Re: Writing contiguously to UFS2?

Stefan Esser wrote:
> Ivan Voras wrote:
>> (ZFS is not the ultimate solution: 1) replacing a UFS monoculture
>> with a ZFS monoculture will sooner or later yield problems, and 2)
>> sometimes a "dumb" unix filesystem is preferred to the "smart"
>> ZFS.)
>
> Both XFS and ReiserFS are quite complex compared to UFS, and
> definitely not well described by the term "dumb" ;-)

Of course, I mean no disrespect to them; I've read enough papers on
them to realize their complexity :) By "dumb" I meant they behave like
"point them to a device and they will stick to it", i.e. they don't
come with a volume manager.

> The FFS paper by McKusick et al. describes the historical allocation
> strategy, which was somewhat modified in FreeBSD a few years ago in
> order to adapt to modern disk sizes (larger cylinder groups, meaning
> it is not a good idea to create each new directory in a new cylinder
> group).

[thinking out loud:]

From experience (not from reading code or the docs) I conclude that
cylinder groups cannot be larger than around 190 MB. I know this from
numerous runs of newfs and from development of gvirstor, which
interacts with cgs in an "interesting" way.
I know the reasons why cgs exist (mainly to lower latencies from
seeking), but with today's drives and memory configurations it would
sometimes be nice to make them larger, or, in the extreme, to make
just one cg that covers the entire drive. Though this extreme would,
in the case of concat configurations, put all of the block and inode
metadata on the first drive, which could have interesting effects on
performance. Of course, with seek-less drives (solid state) there's no
reason to have cgs at all.

From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 12:55:35 2007
From: Ivan Voras
Date: Fri, 21 Sep 2007 14:50:14 +0200
To: freebsd-fs@freebsd.org
Subject: Re: Writing contiguously to UFS2?

Eric Anderson wrote:
> The largest file size per chunk in a cylinder group is calculated at
> newfs time, which also determines how many cylinder groups there
> should be. I think the largest size I've seen was something in the
> 460MB-ish range, meaning any contiguous write above that would span
> more than one cylinder group.

Hmm, how did you manage to create a file system with such large
cylinder groups?
I've experimented with smallnum-TB file systems and still couldn't make
them larger than around 190 MB (though I wasn't actively trying, just
observed how they turned out).

From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 13:19:20 2007
From: Gary Palmer
Date: Fri, 21 Sep 2007 09:19:19 -0400
To: Ivan Voras
Cc: freebsd-fs@freebsd.org
Subject: Re: Writing contiguously to UFS2?

On Fri, Sep 21, 2007 at 02:50:14PM +0200, Ivan Voras wrote:
> Eric Anderson wrote:
>
> > The largest file size per chunk in a cylinder group is calculated at
> > newfs time, which also determines how many cylinder groups there should
> > be. I think the largest size I've seen was something in the 460MB-ish
> > range, meaning any contiguous write above that would span more than one
> > cylinder group.
>
> Hmm, how did you manage to create a file system with such large cylinder
> groups? I've experimented with smallnum-TB file systems and still
> couldn't make them larger than around 190 MB (though I wasn't actively
> trying, just observed how they turned out).

Presumably by using the -c parameter to newfs.

The original poster might get some traction out of a combination of -c
and -e parameters to newfs, although the fundamental behaviour will
remain unchanged.
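For concreteness, a hedged sketch of trying Gary's suggestion. The -N
and -e flags exist in FreeBSD's newfs, but exact semantics and defaults
vary between releases (-c is left out here because its units changed
over time), and the device name is only a placeholder:

    # Preview the layout newfs would compute, without writing anything
    # (-N); the output includes the cylinder-group count and size.
    newfs -N /dev/da0s1d

    # The same preview with a larger maxbpg (-e), the per-file per-cg
    # allocation limit discussed later in this thread.
    newfs -N -e 16384 /dev/da0s1d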
From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 13:25:19 2007
From: Ivan Voras
Date: Fri, 21 Sep 2007 15:23:20 +0200
To: freebsd-fs@freebsd.org
Subject: Re: Writing contiguously to UFS2?

Gary Palmer wrote:
> Presumably by using the -c parameter to newfs.

Hm, I'll try it again later, but I think I concluded that -c can be used
to lower the size of cgs, not to increase it.
From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 13:31:28 2007
From: Gary Palmer
Date: Fri, 21 Sep 2007 09:31:27 -0400
To: Ivan Voras
Cc: freebsd-fs@freebsd.org
Subject: Re: Writing contiguously to UFS2?

On Fri, Sep 21, 2007 at 03:23:20PM +0200, Ivan Voras wrote:
> Gary Palmer wrote:
>
> > Presumably by using the -c parameter to newfs.
>
> Hm, I'll try it again later but I think I concluded that -c can be used
> to lower the size of cgs, not to increase it.

A CG is basically an inode table with a block allocation bitmap to keep
track of what disk blocks are in use. You might have to use the -i
parameter to increase the expected average file size. That should allow
you to increase the CG size. It's been a LONG time since I looked at the
UFS code, but I suspect the # of inodes per CG is probably capped.
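A hedged illustration of Gary's -i point (placeholder device name;
dumpfs output details may differ between releases):

    # Raising bytes-per-inode shrinks each cylinder group's inode table,
    # leaving more of the group's single-block bitmap for data.
    newfs -i 262144 /dev/da0s1d

    # Inspect the result; the superblock summary reports the per-cg
    # figures (inodes per group, fragments per group).
    dumpfs /dev/da0s1d | head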
From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 14:27:15 2007
From: Pawel Jakub Dawidek
Date: Fri, 21 Sep 2007 16:25:40 +0200
To: Johan Ström
Cc: freebsd-fs@freebsd.org, zfs-discuss@opensolaris.org
Subject: Re: ZFS (and quota)

I'm CCing zfs-discuss@opensolaris.org, as this doesn't look like a
FreeBSD-specific problem.

It looks like there is a problem with block allocation(?) when we are
near the quota limit. The tank/foo dataset has its quota set to 10m:

Without quota:

	FreeBSD:
	# dd if=/dev/zero of=/tank/test bs=512 count=20480
	time: 0.7s

	Solaris:
	# dd if=/dev/zero of=/tank/test bs=512 count=20480
	time: 4.5s

With quota:

	FreeBSD:
	# dd if=/dev/zero of=/tank/foo/test bs=512 count=20480
	dd: /tank/foo/test: Disc quota exceeded
	time: 306.5s

	Solaris:
	# dd if=/dev/zero of=/tank/foo/test bs=512 count=20480
	write: Disc quota exceeded
	time: 602.7s

CPU is almost entirely idle, but disk activity seems to be high. Any
ideas?

--
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
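For reference, a minimal sketch of the setup being benchmarked above;
the pool layout and disk name are assumptions, only the dataset name
and quota come from the report:

    # A pool with a 10 MB-quota dataset, as in the report.
    zpool create tank da0
    zfs create tank/foo
    zfs set quota=10m tank/foo

    # The slow case: 20480 * 512 bytes = 10 MB, so the final writes push
    # against the quota and hit the reported slowdown.
    dd if=/dev/zero of=/tank/foo/test bs=512 count=20480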
From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 15:49:19 2007
From: Pawel Jakub Dawidek
Date: Fri, 21 Sep 2007 17:47:33 +0200
To: freebsd-fs@FreeBSD.org
Cc: zfs-discuss@opensolaris.org
Subject: The ZFS-Man.

Hi.

I gave a talk about ZFS during EuroBSDCon 2007, and because it won the
best talk award and some find it funny, here it is:

	http://youtube.com/watch?v=o3TGM0T1CvE

a bit better version is here:

	http://people.freebsd.org/~pjd/misc/zfs/zfs-man.swf

BTW. Inspired by the ZFS demos from the OpenSolaris page, I created a
few demos of ZFS on FreeBSD:

	http://youtube.com/results?search_query=freebsd+zfs&search=Search

And better versions:

	http://people.freebsd.org/~pjd/misc/zfs/

--
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 16:11:42 2007
From: Bruce Evans
Date: Sat, 22 Sep 2007 02:11:37 +1000 (EST)
To: Ivan Voras
Cc: freebsd-fs@FreeBSD.org
Subject: Re: Writing contiguously to UFS2?

On Fri, 21 Sep 2007, Ivan Voras wrote:

> Fluffles wrote:
>
>> Even worse: data is being stored at weird locations, so that my energy
>> efficient NAS project becomes crippled. Even with the first 400GB of data,
>> it's storing that on the first 4 disks in my concat configuration,
>
>> In the past when testing geom_raid5 I've tried to tune newfs parameters so
>> that it would write contiguously, but still there were regular 2-phase
>> writes, which meant data was not written contiguously. I really dislike
>> this behavior.
>
> I agree, this is my least favorite aspect of UFS (maybe together with
> nonimplementation of extents), for various reasons. I feel it's time to
> start heavy lobbying for finishing FreeBSD's implementations of XFS and
> ReiserFS :)

Why not improve the implementation of ffs?

Cylinder groups are fundamental to ffs, and I think having too many
too-small ones is fairly fundamental, but the allocation policy across
them can be anything, and large cylinder groups could be faked using
small ones.

The current (and very old?) allocation policy for extending files is
to consider allocating the block in a new cg when the current cg has
more than fs_maxbpg blocks allocated in it (newfs and tunefs parameter
-e maxbpg: default 1/4 of the number of blocks in a cg = bpg). Then
preference is given to the next cg with more than the average number
of free blocks. This seems to be buggy. From ffs_blkpref_ufs1():

% 	if (indx % fs->fs_maxbpg == 0 || bap[indx - 1] == 0) {
% 		if (lbn < NDADDR + NINDIR(fs)) {
% 			cg = ino_to_cg(fs, ip->i_number);
% 			return (cgbase(fs, cg) + fs->fs_frag);
% 		}

I think "indx" here is the index into an array of block pointers in
the inode or an indirect block. So for extending large files it is
always into an indirect block. It gets reset to 0 for each new indirect
block.
This makes its use in (indx % fs->fs_maxbpg == 0) dubious. The
condition is satisfied whenever:
- indx == 0, i.e., always at the start of a new indirect block. Not
  too bad, but not what we want if fs_maxbpg is much larger than the
  number of indexes in an indirect block.
- indx == a nonzero multiple of the number of indexes in an indirect
  block. This condition is never satisfied if fs_maxbpg is larger than
  the number of indexes in an indirect block. This is the usual case
  for ffs2 (only 2K indexes in 16K-blocks, and fairly large cg's). On
  an ffs1 fs that I have handy, maxbpg is 2K and the number of indexes
  is 4K, so this condition is satisfied once.

The (bap[indx - 1] == 0) condition causes a move to a new cg after
every hole. This may help by leaving space to fill in the hole, but
it is wrong if the hole will never be filled in or is small. This
seems to be just a vestige of code that implemented the old rotdelay
pessimization. Comments saying that we use fs_maxcontig near here
are obviously vestiges of the pessimization.

% 		/*
% 		 * Find a cylinder with greater than average number of
% 		 * unused data blocks.
% 		 */
% 		if (indx == 0 || bap[indx - 1] == 0)
% 			startcg =
% 			    ino_to_cg(fs, ip->i_number) + lbn / fs->fs_maxbpg;

At the start of an indirect block, and after a hole, we don't know where
the previous block was, so we use the cg of the inode advanced by the
estimate (lbn / fs->fs_maxbpg) of how far we have advanced from the cg
of the inode. I think this estimate is too primitive to work right even
a small fraction of the time. Adjustment factors related to the number
of maxbpg's per block of indexes and the fullness of the disk seem to
be required. Keeping track of the cg of the previous block would be
better.

% 		else
% 			startcg = dtog(fs, bap[indx - 1]) + 1;

Now there is no problem finding the cg of the previous block. Note that
we always add 1...

% 		startcg %= fs->fs_ncg;
% 		avgbfree = fs->fs_cstotal.cs_nbfree / fs->fs_ncg;
% 		for (cg = startcg; cg < fs->fs_ncg; cg++)

... so the search gives maximal non-preference to the cg of the previous
block. I think things would work much better if we considered the
current cg, if any, first (current cg = one containing previous block),
and we actually know that cg. This would be easy to try -- just don't
add 1. Also try not adding the bad estimate (lbn / fs->fs_maxbpg), so
that the search starts at the inode's cg in some cases -- then previous
cg's will be reconsidered, but hopefully the average limit will prevent
them being used.

Note that in the calculation of avgbfree, division by ncg gives a
granularity of ncg, so there is an inertia of ncg blocks against moving
to the next cg. A too-large ncg is a feature here.

BTW, I recently found the bug that broke the allocation policy in
FreeBSD's implementation of ext2fs. I thought that the bug was missing
code/a too-simple implementation (one without a search like the above),
but it turned out to be just a bug. The search wasn't set up right, so
the current cg was always preferred. Always preferring the current cg
tends to give contiguous allocation of data blocks, and this works very
well for small file systems, but for large file systems the data blocks
end up too far away from the inodes (since there is a limited number of
inodes per cg, and the per-cg inode and data block allocations fill up
at different rates).

Bruce
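Since the message above refers to the newfs/tunefs -e knob, a hedged
example of adjusting it on an existing (unmounted) file system, with a
placeholder device name:

    # Print the current tuneables, maxbpg among them, then raise maxbpg
    # so a single file may allocate more of each cylinder group before
    # the allocator prefers the next one.
    tunefs -p /dev/da0s1d
    tunefs -e 4096 /dev/da0s1d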
From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 16:21:50 2007
From: Bruce Evans
Date: Sat, 22 Sep 2007 02:21:41 +1000 (EST)
To: Eric Anderson
Cc: freebsd-fs@FreeBSD.org, Fluffles, Ivan Voras
Subject: Re: Writing contiguously to UFS2?

On Fri, 21 Sep 2007, Eric Anderson wrote:

> I recommend trying msdos fs. On recent -CURRENT, it should perform fairly
> well (akin to UFS2 I think), and if I recall correctly, has a more
> contiguous block layout.

It can give perfect contiguity for data blocks, but has serious slowness
for non-sequential access to large files, and anyway "large" for msdosfs
is only 4GB.

Bruce
From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 16:30:59 2007
From: Eric Anderson
Date: Fri, 21 Sep 2007 11:30:53 -0500
To: Bruce Evans
Cc: freebsd-fs@freebsd.org, Fluffles, Ivan Voras
Subject: Re: Writing contiguously to UFS2?

Bruce Evans wrote:
> On Fri, 21 Sep 2007, Eric Anderson wrote:
>
>> I recommend trying msdos fs. On recent -CURRENT, it should perform
>> fairly well (akin to UFS2 I think), and if I recall correctly, has a
>> more contiguous block layout.
>
> It can give perfect contiguity for data blocks, but has serious slowness
> for non-sequential access to large files, and anyway "large" for msdosfs
> is only 4GB.

Oops - forgot about the 4GB limit. I was also assuming that the random
read in a big file problem wasn't an issue due to the configuration noted
by the original poster.. but maybe that's a bad assumption.

Eric

From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 16:45:36 2007
From: Ivan Voras
Date: Fri, 21 Sep 2007 18:45:11 +0200
To: freebsd-fs@freebsd.org
Subject: Re: The ZFS-Man.

Pawel Jakub Dawidek wrote:
> Hi.
>
> I gave a talk about ZFS during EuroBSDCon 2007, and because it won the
> best talk award and some find it funny, here it is:
>
> http://youtube.com/watch?v=o3TGM0T1CvE

Just perfect!
Thank you, I've disseminated the links wherever I can :)

From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 18:06:45 2007
From: Oliver Fromme
Date: Fri, 21 Sep 2007 19:36:50 +0200 (CEST)
To: freebsd-fs@FreeBSD.ORG, etc@fluffles.net
Subject: Re: Writing contiguously to UFS2?

Fluffles wrote:
> I've set up a concat of 8 disks for my new NAS, using ataidle to spin
> down the disks not needed. This allows me to save power and noise/heat
> by running only the drives that are actually in use.
>
> My problem is UFS. UFS2 seems to write to 4 disks, even though all the
> data written so far can easily fit on just one disk. What's going on
> here? I looked at newfs parameters, but in the past was unable to make
> newfs write contiguously. It seems UFS2 always writes to a new cylinder.
> Is there any way to force UFS to write contiguously? Or at least limit
> the problem?
>
> If I write 400GB to a 4TB volume consisting of 8x 500GB disks, I want
> all data to be on the first disk.

You should be able to achieve that by putting a gvirstor onto your
drives, having the physical size of those eight drives. Then newfs
that gvirstor device.

I haven't used gvirstor myself, but if I understand it correctly, it
should start filling its providers from the start, and only begin using
the next one when the previous ones are all completely used. So it
should do exactly what you want.

http://wiki.freebsd.org/gvirstor

Best regards
   Oliver

--
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.

"One of the main causes of the fall of the Roman Empire was that,
lacking zero, they had no way to indicate successful termination
of their C programs."
        -- Robert Firth
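A sketch of what Oliver describes, with the gvirstor syntax recalled
from gvirstor(8) rather than checked against it, and placeholder disk
names:

    # Label a virstor device across the disks, then newfs the resulting
    # virtual volume as a single file system.
    gvirstor label -v storage /dev/ad4 /dev/ad6 /dev/ad8
    newfs /dev/virstor/storage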
From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 18:10:23 2007
From: Bruce Evans
Date: Sat, 22 Sep 2007 04:10:19 +1000 (EST)
To: Gary Palmer
Cc: freebsd-fs@FreeBSD.org, Ivan Voras
Subject: Re: Writing contiguously to UFS2?

On Fri, 21 Sep 2007, Gary Palmer wrote:

> On Fri, Sep 21, 2007 at 03:23:20PM +0200, Ivan Voras wrote:
>> Gary Palmer wrote:
>>
>>> Presumably by using the -c parameter to newfs.
>>
>> Hm, I'll try it again later but I think I concluded that -c can be used
>> to lower the size of cgs, not to increase it.

Yes, it used to default to a small value, but that became very pessimal
when disks became larger than a whole 1GB or so, so obrien changed it to
default to the maximum possible value. I think it hasn't been changed
back down.

> A CG is basically an inode table with a block allocation bitmap to keep
> track of what disk blocks are in use. You might have to use the -i
> parameter to increase the expected average file size. That should
> allow you to increase the CG size. It's been a LONG time since I looked
> at the UFS code, but I suspect the # of inodes per CG is probably capped.

The limit seems to be only that struct cg (mainly the struct hack stuff
at the end) fits in a single block. The non-struct parts of this struct
consist mainly of the inode, block and cluster bitmaps. The block bitmap
is normally the largest by far, since it actually maps fragments. With
16K-blocks and 2K-frags, at most 128K frags = 256MB of disk can be
mapped. I get 180MB in practice, with an inode bitmap size of only 3K,
so there is not much to be gained by tuning -i but more to be gained by
tuning -b and -f (several doublings are reasonable).
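Bruce's bound, restated as shell arithmetic (the 256MB figure
optimistically assumes the whole block is available for the fragment
bitmap; the rest of struct cg makes it somewhat smaller in practice,
hence the ~180MB observed):

    # A 16384-byte block holds 16384*8 = 128K bitmap bits; each bit maps
    # one 2048-byte fragment, so at most this many MB of disk per cg:
    echo $((16384 * 8 * 2048 / 1024 / 1024))    # -> 256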
However, I think small cg's are not a problem for huge files, except for
bugs.

Bruce

From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 18:33:24 2007
From: Ivan Voras
Date: Fri, 21 Sep 2007 20:32:59 +0200
To: freebsd-fs@freebsd.org
Subject: Re: Writing contiguously to UFS2?

Oliver Fromme wrote:
> I haven't used gvirstor myself, but if I understand it
> correctly, it should start filling its providers from the
> start, and only begin using the next one when the previous
> ones are all completely used. So it should do exactly
> what you want.

Yes, with the side-effect of putting all cgs and their metadata at the
beginning of the first drive. An obvious consequence is that writing to
any drive other than the first will also touch the first drive.
From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 20:08:32 2007
From: Torrey McMahon
Date: Fri, 21 Sep 2007 15:41:51 -0400
To: Jonathan Edwards
Cc: freebsd-fs@FreeBSD.org, zfs-discuss@opensolaris.org, Pawel Jakub Dawidek, eric kustarz
Subject: Re: [zfs-discuss] The ZFS-Man.

Jonathan Edwards wrote:
> On Sep 21, 2007, at 14:57, eric kustarz wrote:
>
>>> Hi.
>>>
>>> I gave a talk about ZFS during EuroBSDCon 2007, and because it won the
>>> best talk award and some find it funny, here it is:
>>>
>>> http://youtube.com/watch?v=o3TGM0T1CvE
>>>
>>> a bit better version is here:
>>>
>>> http://people.freebsd.org/~pjd/misc/zfs/zfs-man.swf
>>
>> Looks like Jeff has been working out :)
>
> my first thought too:
> http://blogs.sun.com/bonwick/resource/images/bonwick.portrait.jpg
>
> funny - i always pictured this as UFS-man though:
> http://www.benbakerphoto.com/business/47573_8C-after.jpg
>
> but what's going on with the sheep there?

Got me, but they do look kind of nervous.

(Happy Friday folks...)
From owner-freebsd-fs@FreeBSD.ORG Sat Sep 22 12:37:28 2007
From: Bruce Evans
Date: Sat, 22 Sep 2007 22:37:22 +1000 (EST)
To: Bruce Evans
Cc: freebsd-fs@FreeBSD.org, Ivan Voras
Subject: Re: Writing contiguously to UFS2?

On Sat, 22 Sep 2007, Bruce Evans wrote:

> The current (and very old?) allocation policy for extending files is
> to consider allocating the block in a new cg when the current cg has
> more than fs_maxbpg blocks allocated in it (newfs and tunefs parameter
> -e maxbpg: default 1/4 of the number of blocks in a cg = bpg). Then
> preference is given to the next cg with more than the average number
> of free blocks. This seems to be buggy. From ffs_blkpref_ufs1():

Actually, it is almost as good as possible. Note that ffs_blkpref_*()
only gives a preference, so it shouldn't try too hard. Also, it is only
used if block reallocation is enabled (the default: sysctl
vfs.ffs.doreallocblks=1). Then it gives delayed allocation. Block
reallocation generally does a good job of reallocating blocks
contiguously. I don't know exactly where/how the allocation is done if
block reallocation is not enabled, but certainly maxbpg is not used
then.

> % 	if (indx % fs->fs_maxbpg == 0 || bap[indx - 1] == 0) {
> % 		if (lbn < NDADDR + NINDIR(fs)) {
> % 			cg = ino_to_cg(fs, ip->i_number);
> % 			return (cgbase(fs, cg) + fs->fs_frag);
> % 		}
>
> I think "indx" here is the index into an array of block pointers in
> the inode or an indirect block.

This is correct.

> So for extending large files it is
> always into an indirect block. It gets reset to 0 for each new indirect
> block. This makes its use in (indx % fs->fs_maxbpg == 0) dubious.
> The condition is satisfied whenever:
> - indx == 0, i.e., always at the start of a new indirect block. Not
>   too bad, but not what we want if fs_maxbpg is much larger than the
>   number of indexes in an indirect block.

Actually, this case is handled quite well later.

> - indx == a nonzero multiple of the number of indexes in an indirect
>   block. This condition is never satisfied if fs_maxbpg is larger than
>   the number of indexes in an indirect block. This is the usual case
>   for ffs2 (only 2K indexes in 16K-blocks, and fairly large cg's).
> On an ffs1 fs that I have handy, maxbpg is 2K and the number of indexes
> is 4K, so this condition is satisfied once.

This case is not handled well. The bug for it is mainly in newfs.
From newfs.c:

% /*
%  * MAXBLKPG determines the maximum number of data blocks which are
%  * placed in a single cylinder group. The default is one indirect
%  * block worth of data blocks.
%  */
% #define MAXBLKPG(bsize)	((bsize) / sizeof(ufs2_daddr_t))

The comment is correct, but the code is wrong for ffs1: it divides by
the size of an ffs2 block pointer, so on ffs1 MAXBLKPG defaults to only
half an indirect block worth of data blocks. I just use the default, so
my ffs1 fs has a maxbpg of 2K instead of 4K.

> The (bap[indx - 1] == 0) condition causes a move to a new cg after
> every hole.

Actually, this case is handled well later.

> This may help by leaving space to fill in the hole, but
> it is wrong if the hole will never be filled in or is small. This
> seems to be just a vestige of code that implemented the old rotdelay
> pessimization.

Actually, it is still needed for using bap[indx - 1] at the end of the
function.

> Comments saying that we use fs_maxcontig near here
> are obviously vestiges of the pessimization.
>
> % 		/*
> % 		 * Find a cylinder with greater than average number of
> % 		 * unused data blocks.
> % 		 */
> % 		if (indx == 0 || bap[indx - 1] == 0)
> % 			startcg =
> % 			    ino_to_cg(fs, ip->i_number) + lbn / fs->fs_maxbpg;
>
> At the start of an indirect block, and after a hole, we don't know where
> the previous block was, so we use the cg of the inode advanced by the
> estimate (lbn / fs->fs_maxbpg) of how far we have advanced from the cg
> of the inode. I think this estimate is too primitive to work right even
> a small fraction of the time. Adjustment factors related to the number
> of maxbpg's per block of indexes and the fullness of the disk seem to
> be required. Keeping track of the cg of the previous block would be
> better.

Actually, this estimate works very well. We _want_ to change to a new
cg after every maxbpg blocks. The estimate gives the closest cg that is
possible if all the blocks are allocated as contiguously as we want. If
the disk is nearly full, we will probably have to go further. Starting
the search at the closest cg that we want gives a bias towards close
cg's that are not too close.

> % 		else
> % 			startcg = dtog(fs, bap[indx - 1]) + 1;
>
> Now there is no problem finding the cg of the previous block. Note that
> we always add 1...
>
> % 		startcg %= fs->fs_ncg;
> % 		avgbfree = fs->fs_cstotal.cs_nbfree / fs->fs_ncg;
> % 		for (cg = startcg; cg < fs->fs_ncg; cg++)
>
> ... so the search gives maximal non-preference to the cg of the previous
> block. I think things would work much better if we considered the
> current cg, if any, first (current cg = one containing previous block),
> and we actually know that cg. This would be easy to try -- just don't
> add 1. Also try not adding the bad estimate (lbn / fs->fs_maxbpg), so
> that the search starts at the inode's cg in some cases -- then previous
> cg's will be reconsidered, but hopefully the average limit will prevent
> them being used.

Actually, adding 1 is correct in most cases. Here we think we have just
allocated maxbpg blocks in the current cg, so we _want_ to advance to
the next cg. The problem is that we don't really know that we have
allocated that many blocks. We have lots of previous block numbers in
bap[] and could inspect many of them, but we only inspect the previous
one. The corresponding code in 4.4BSD is better -- it inspects the one
some distance before the previous one. The corresponding distance here
is maxbpg. We could inspect the blocks at 1 previous and maxbpg previous
to quickly estimate whether we have allocated all of the previous maxbpg
blocks in the same cylinder group.
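To make the ffs1 halving mentioned above concrete, shell arithmetic with
the 16K block size used throughout this thread:

    # An ffs1 indirect block holds 4-byte pointers, but the MAXBLKPG()
    # default divides by sizeof(ufs2_daddr_t) == 8.
    echo $((16384 / 4))    # indexes per ffs1 indirect block: 4096
    echo $((16384 / 8))    # resulting default maxbpg: 2048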
> Note that in the calculation of avgbfree, division by ncg gives a
> granularity of ncg, so there is an inertia of ncg blocks against moving
> to the next cg. A too-large ncg is a feature here.

This feature shouldn't make much difference, but we don't want it if we
are certain that we have just allocated maxbpg blocks in a cg.

Analysis of block layouts for a 200MB file shows no large problems in
this area, but some small ones. This is with some problems already
fixed. 200MB is a bit small, but gives data small enough to understand
easily. The analysis is limited to ffs1, since I only have a
layout-printing program for that. I don't use ffs2 and haven't fixed
the "some" problems for it. Perhaps they are the ones that matter here.
(For what they are, see below.)

ffs1, no soft updates (all tests on an almost-new fs):

% fs_bsize = 16384
% fs_fsize = 2048
% 4: lbn 0-11 blkno 1520-1615
% lbn [<1>indir]12-4107 blkno 1616-1623
% lbn 12-4107 blkno 1624-34391

Everything is perfectly contiguous until here. Without my fixes, the
first indirect block in the middle tends to be allocated
discontiguously. Here lbn's have size fs_bsize = 16K, and blkno's have
size fs_fsize = 2K; "4:" is just the inode number; "[<n>indir]" is an
nth indirect block.

% lbn [<2>indir]4108-16781323 blkno 189592-189599

Bug. cg's have size about 94000 in blkno units. We have skipped the
entire second cg.

% lbn [<1>indir]4108-8203 blkno 189600-189607
% lbn 4108-6155 blkno 189608-205991

All contiguous.

% lbn 6156-8203 blkno 283640-300023

This is from the newfs bug (default maxbpg = half an indirect block's
worth of blkno's). Here we advance to the next cg half way through the
indirect block. The advance is only about 90000 blkno's, so it
correctly doesn't skip a cg.

% lbn [<1>indir]8204-12299 blkno 377688-377695
% lbn 8204-10251 blkno 377696-394079
% lbn 10252-12299 blkno 471736-488119
% lbn [<1>indir]12300-16395 blkno 565784-565791
% lbn 12300-12799 blkno 565792-569791

The pattern continues with no problems except the default maxbpg being
too small. This does almost what the OP wants -- with a huge disk, even
huge files fit in a few cg's (lots of cg's, but few compared with the
total number). With tunefs -e, I think the layout would be perfectly
contiguous except for the skip after the first cg. My fix is only for
the first indirect block, so it doesn't make much difference for large
files. With the default maxbpg, later indirect blocks are always
allocated in a new cg anyway. Hopefully the "primitive" estimate
prevents this, so that all indirect blocks have a chance of being
allocated contiguously, and other code cooperates by not moving them.

ffs1, soft updates:

% fs_bsize = 16384
% fs_fsize = 2048
% 5: lbn 0-11 blkno 34392-34487

For some reason, the file is started later in the first cg.

% lbn [<1>indir]12-4107 blkno 34488-34495
% lbn 12-4107 blkno 34496-67263

Contiguous. Without my fix, soft updates seems to move the first
indirect block further away, and thus is noticeably slower.

% lbn [<2>indir]4108-16781323 blkno 285592-285599

Soft updates has skipped not just the second cg but the third one too.
% lbn [<1>indir]4108-8203 blkno 285600-285607
% lbn 4108-6155 blkno 285608-301991
% lbn 6156-8203 blkno 377688-394071
% lbn [<1>indir]8204-12299 blkno 471736-471743
% lbn 8204-10251 blkno 471744-488127
% lbn 10252-12299 blkno 565784-582167
% lbn [<1>indir]12300-16395 blkno 659832-659839
% lbn 12300-12799 blkno 659840-663839

The pattern continues (no more skips).

ffs1, no soft updates, maxbpg = 655360:

% fs_bsize = 16384
% fs_fsize = 2048
% 4: lbn 0-11 blkno 1520-1615
% lbn [<1>indir]12-4107 blkno 1616-1623
% lbn 12-4107 blkno 1624-34391
% lbn [<2>indir]4108-16781323 blkno 95544-95551
% lbn [<1>indir]4108-8203 blkno 95552-95559
% lbn 4108-8203 blkno 95560-128327
% lbn [<1>indir]8204-12299 blkno 189592-189599
% lbn 8204-12299 blkno 189600-222367
% lbn [<1>indir]12300-16395 blkno 283640-283647
% lbn 12300-12799 blkno 283648-287647

The "primitive" estimate isn't helping -- a new cg is started for every
indirect block.

Bruce