From owner-freebsd-fs@FreeBSD.ORG Sun Sep 16 21:59:13 2007
From: Miroslav Valenta <valenta@sutra.cz>
To: freebsd-fs@freebsd.org
Date: Sun, 16 Sep 2007 23:32:36 +0200
Subject: slow transfers on webshare service

Hi,

I have a problem with slow transfers on my web file-sharing server. I'm
running FreeBSD 6.2.

Files are served by lighttpd 1.4.18:

server.max-worker = 8
server.max-fds = 8192
server.network-backend = "writev"

-----

HW:
Xeon 3160
4GB of RAM
Areca ARC-1260 disk controller with 1GB cache + 8x 500GB SATA2 HDDs

File transfers slow down once I reach about 500 concurrent download
connections, but when I send the same file from the same storage over
the same line, it transfers fast. So I think it must be something about
lighttpd or sysctl tuning. Can you help me, please?

From owner-freebsd-fs@FreeBSD.ORG Mon Sep 17 10:54:14 2007
From: Ivan Voras
To: freebsd-fs@freebsd.org
Date: Mon, 17 Sep 2007 12:53:53 +0200
Subject: Re: slow transfers on webshare service
Miroslav Valenta wrote:
> File transfers slow down once I reach about 500 concurrent download
> connections, but when I send the same file from the same storage over
> the same line, it transfers fast.

Please characterize "slow" and "fast" - how is it slow, and how fast do
you think it should be?

Some general tips/ideas:

- Do you really need 8 workers? Lighttpd is an async server, so it's
  about as fast as it gets. If you only serve static files, a larger
  number of worker processes (and CPUs...) won't make a difference.
- Do you use the "kqueue" extension for Lighttpd?
- 500 parallel downloads probably mean lots of seeking on the disk drive
  array; have you verified that you can sustain the speed you need with
  the drives?
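For reference, both of those knobs are plain lighttpd.conf settings on
FreeBSD. A minimal sketch of the configuration being suggested here (the
option names are standard lighttpd 1.4.x and also appear verbatim later
in this thread; the worker count of 1 is illustrative, not a measured
recommendation):

    server.event-handler   = "freebsd-kqueue"    # poll connections via kqueue(2) instead of select/poll
    server.network-backend = "freebsd-sendfile"  # serve static files with in-kernel sendfile(2)
    server.max-worker      = 1                   # start low; add workers only if measurably CPU-bound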
From owner-freebsd-fs@FreeBSD.ORG Mon Sep 17 11:07:59 2007
From: FreeBSD bugmaster
To: freebsd-fs@FreeBSD.org
Date: Mon, 17 Sep 2007 11:07:58 GMT
Subject: Current problem reports assigned to you

Current FreeBSD problem reports

Critical problems

Serious problems

S Tracker      Resp. Description
--------------------------------------------------------------------------------
o kern/112658  fs    [smbfs] [patch] smbfs and caching problems (resolves b
o kern/114676  fs    [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o kern/114856  fs    [ntfs] [patch] Bug in NTFS allows bogus file modes.
o bin/115165   fs    [PATCH] amd(8): add functionality of mount_nfs' -L -a
o kern/116170  fs    Kernel panic when mounting /tmp

5 problems total.

Non-critical problems

S Tracker      Resp. Description
--------------------------------------------------------------------------------
o kern/114847  fs    [ntfs] [patch] dirmask support for NTFS ala MSDOSFS

1 problem total.

From owner-freebsd-fs@FreeBSD.ORG Tue Sep 18 12:28:41 2007
From: matteo@FreeBSD.org
To: Andre.Albsmeier@siemens.com, matteo@FreeBSD.org, freebsd-fs@FreeBSD.org
Date: Tue, 18 Sep 2007 12:28:40 GMT
Subject: Re: bin/115165: [PATCH] amd(8): add functionality of mount_nfs' -L -a -d options to amd

Synopsis: [PATCH] amd(8): add functionality of mount_nfs' -L -a -d options to amd

State-Changed-From-To: open->closed
State-Changed-By: matteo
State-Changed-When: Tue Sep 18 12:28:15 UTC 2007
State-Changed-Why: Bug was submitted to upstream maintainer, so no need
to keep it open here

http://www.freebsd.org/cgi/query-pr.cgi?pr=115165
From owner-freebsd-fs@FreeBSD.ORG Tue Sep 18 15:28:51 2007
From: Astrodog <astrodog@gmail.com>
To: Bruce Evans
Cc: freebsd-fs@freebsd.org, linimon@freebsd.org
Date: Tue, 18 Sep 2007 10:28:49 -0500
Subject: Re: amd64/74811: [nfs] df, nfs mount, negative Avail -> 32/64-bit confusion

On 9/18/07, Astrodog wrote:
> > I cannot see how to get the correct result non-accidentally without
> > using the hack of passing negative values as large unsigned ones.
> > [...]
> > Bruce
>
> From the above, it doesn't appear that NFS can support negative
> values in any reasonable way... [...]
>
> --- Harrison

The only thing I've found, thus far, is to hijack the "NULL" NFSv3
operation. From what I can tell, clients are expected to discard its
value. On supporting clients, the returned value can be what should be
subtracted from bfree to get bavail. bavail can be handled as it is now
on the server, so non-supporting clients wouldn't see any change in
behavior, beyond a NULL NFS operation taking a few cycles longer. Any
thoughts? I'm aware that this certainly isn't proper behavior... but I
also can't find anything that actually uses the NULL return.

--- Harrison
From owner-freebsd-fs@FreeBSD.ORG Tue Sep 18 15:39:42 2007
From: Astrodog <astrodog@gmail.com>
To: Bruce Evans
Cc: freebsd-fs@freebsd.org, linimon@freebsd.org
Date: Tue, 18 Sep 2007 10:15:10 -0500
Subject: Re: amd64/74811: [nfs] df, nfs mount, negative Avail -> 32/64-bit confusion

> I cannot see how to get the correct result non-accidentally without
> using the hack of passing negative values as large unsigned ones.
> Passing negative values as the difference of two unsigned values works
> with NetBSD's extension to statvfs (f_bresvd), but it doesn't work for
> nfs because it requires an extra value which the protocol doesn't
> support AFAIK (not far).
>
> Bruce

From the above, it doesn't appear that NFS can support negative values
in any reasonable way... and I suppose that saying "there are zero
blocks available for non-privileged users" is accurate when bavail <= 0.

I'm going to dig through the RFCs and see if there's an otherwise unused
or underused variable that could be used to store bresvd, for clients
that could support it.
Thanks for the detailed explanation,
--- Harrison

From owner-freebsd-fs@FreeBSD.ORG Tue Sep 18 18:09:36 2007
From: Miroslav Lachman <000.fbsd@quip.cz>
To: Ivan Voras
Cc: freebsd-fs@freebsd.org
Date: Tue, 18 Sep 2007 19:52:26 +0200
Subject: Re: slow transfers on webshare service

Ivan Voras wrote:
> Some general tips/ideas:
>
> - Do you really need 8 workers? Lighttpd is an async server, so it's
>   about as fast as it gets. If you only serve static files, a larger
>   number of worker processes (and CPUs...) won't make a difference.
> - Do you use the "kqueue" extension for Lighttpd?
> - 500 parallel downloads probably mean lots of seeking on the disk
>   drive array; have you verified that you can sustain the speed you
>   need with the drives?

I have been running Lighttpd on a download server for about 2 years.
With 1 worker, lighttpd seems limited to 110 Mbps. After a change to 4
workers, throughput increased to about 190 Mbps serving 250-400 clients.
(Daily traffic is ~750GB.)

I am using these settings:

server.event-handler = "freebsd-kqueue" # needed on OS X
#server.network-backend = "freebsd-sendfile" # better for small files
server.network-backend = "writev" # better for large files?
server.max-keep-alive-requests = 6
server.max-keep-alive-idle = 5
server.max-read-idle = 60
server.max-write-idle = 180

Miroslav Lachman

From owner-freebsd-fs@FreeBSD.ORG Tue Sep 18 20:36:27 2007
From: Ivan Voras
To: freebsd-fs@freebsd.org
Date: Tue, 18 Sep 2007 22:34:03 +0200
Subject: Re: slow transfers on webshare service

Miroslav Lachman wrote:
> I have been running Lighttpd on a download server for about 2 years.
> With 1 worker, lighttpd seems limited to 110 Mbps. After a change to 4
> workers, throughput increased to about 190 Mbps serving 250-400
> clients. (Daily traffic is ~750GB.)

OK. Can you run iostat during peak traffic and report its output, so we
can rule out the disk drives?
From owner-freebsd-fs@FreeBSD.ORG Tue Sep 18 21:33:01 2007
From: Bruce Evans <brde@optusnet.com.au>
To: Astrodog
Cc: freebsd-fs@freebsd.org, linimon@freebsd.org
Date: Wed, 19 Sep 2007 00:51:57 +1000 (EST)
Subject: Re: amd64/74811: [nfs] df, nfs mount, negative Avail -> 32/64-bit confusion

[Redirected a bit]

On Tue, 18 Sep 2007, Astrodog wrote:

> On 9/18/07, Astrodog wrote:
>> On 9/18/07, Bruce Evans wrote:
>>> -current still breaks negative avail counts on the server by
>>> clamping them to 0, so the bug is less obvious on buggy clients.
>>
>> It appears that RFC 1094 calls for blocks free to be unsigned (2.2.8).
>> I don't know how this could be handled, besides clamping, though.
>
> Rather, it calls for blocks free, and blocks available, to be unsigned.

D'oh. RFC 1094 only covers nfsv2. That is so crufty that its RFC even
specifies precisely unsigned for almost everything. IIRC, nfsv3 also
specifies an unsigned type for the avail count, but that type is
uint64_t, and nfsv4 is similar. This is clearly a bug in the spec, or
rather, the spec doesn't support BSD's primary file system or what BSD's
nfs has always done. nfs in FreeBSD-[1-4] ignores the spec and passes
negative values as large unsigned ones, mostly by blindly copying bits.
The server was broken on 2004/04/12 (between 5.2R and 5.3R).
FreeBSD clients still try to support negative values being passed as
large unsigned ones, but clients with a 32-bit statfs have a lot of sign
extension and overflow bugs that are most serious for such values.

The design bug also affects statvfs(3). POSIX standardized this but not
BSD's statfs(2). Most things in struct statvfs are typedefed almost to a
fault (fsblkcnt_t for block counts, and fsfilcnt_t for file counts), but
fsblkcnt_t is specified to be an unsigned type, so negative avail counts
cannot work without hacks.

I don't know how to work around the design bug for all clients. Clamping
on the server seems to be best if the client doesn't support negative
avail counts.

NetBSD has large changes in this area, but they seem to reduce to
clamping. In at least nfs_vfsops.c 1.144 (2005/01/02):

- On the server, there is no clamping, but I think negative values can't
  happen anyway because the avail counts are obtained from the statvfs
  interface and statvfs is broken (but see below about f_bresvd;
  f_bresvd is not used here, so something like clamping happens
  automatically). NetBSD has also fixed bogus truncation of file counts
  to 32 bits in the v3 case. Truncation is still blind, but only has to
  be to 32 bits for the v2 case.

- On the client, the avail count is converted into statvfs's avail count
  (f_bavail) plus a NetBSD (?) extension of statvfs (f_bresvd). I think
  f_bresvd is NetBSD's solution to the design bug for statvfs, and
  NetBSD needs this more than FreeBSD because NetBSD has converted many
  (?) utilities from statfs to statvfs. For nfs_statvfs(), f_bresvd is
  initialized to f_bfree - f_bavail (where the free and avail counts are
  whatever is passed by the server). Then, under a COMPAT_20 ifdef,
  avail counts which are so large that they can only be from an "old"
  server that is trying to pass a negative count cause f_bavail to be
  set to 0.

In applications like df, the final avail count is f_bfree - f_bresvd.
This can easily be negative, and should be negative when f_bfree is 0
(no space for non-root) and f_bresvd is nonzero (some space for root).
However, nfs can only initialize things correctly if the server is "old"
(= not broken to spec). If the server is not "old" then the
initializations are just:

    f_bfree  = server f_bfree
    f_bavail = server f_bavail
    f_bresvd = f_bfree - f_bavail    # XXX no way to know server f_bresvd

and these are used like the following in df:

    # f_bavail is not used in df!
    avail = f_bfree - f_bresvd
          = f_bfree - (f_bfree - f_bavail)
          = f_bavail
          = server f_bavail (cast to int64_t)

The resulting `int64_t avail' can only be negative if f_bavail is
"negative" on the server, but we are using a difference and never using
f_bavail in df to avoid abusing f_bavail for holding negative values;
and in the case where the server actually passes us a "negative"
f_bavail and COMPAT_20 is configured, we clobber f_bavail to 0 in
nfs_statvfs() and end up getting avail = f_bfree in df -- completely
wrong.

So the NetBSD code only seems to give the correct result accidentally,
if the correct result is to print a negative avail count in df. It takes
an "old" server and a client that thinks it doesn't support old servers
(no COMPAT_20 configured). I cannot see how to get the correct result
non-accidentally without using the hack of passing negative values as
large unsigned ones.
Passing negative values as the difference of two unsigned values works
with NetBSD's extension to statvfs (f_bresvd), but it doesn't work for
nfs because it requires an extra value which the protocol doesn't
support AFAIK (not far).

Bruce
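To make the sign games above concrete, here is a small self-contained C
sketch - toy numbers and simplified logic, not the actual nfs or statfs
code - of the two server policies for a negative avail count: blindly
copying the bits into the unsigned wire field, which a signed-aware
64-bit client can undo with a cast, versus clamping to 0, which conforms
to the spec but loses the reserved-space information:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* Toy numbers: 100 blocks free on the filesystem, 120 reserved
         * for root, so the avail-to-non-root count is -20 (UFS allows
         * this). */
        int64_t bfree  = 100;
        int64_t bresvd = 120;
        int64_t bavail = bfree - bresvd;             /* -20 */

        /* FreeBSD-[1-4] style: blindly copy the bits into the unsigned
         * 64-bit wire field. */
        uint64_t wire = (uint64_t)bavail;            /* 2^64 - 20 */

        /* A client with a signed 64-bit statfs field can undo the copy
         * (assuming the usual two's-complement representation). */
        printf("64-bit signed client sees %jd\n", (intmax_t)(int64_t)wire);

        /* A client with a 32-bit statfs truncates first; this tiny toy
         * value happens to survive, but large genuine unsigned counts
         * wrap -- the "32/64-bit confusion" of the PR title. */
        printf("32-bit signed client sees %d\n", (int)(int32_t)(uint32_t)wire);

        /* Clamping on the server: spec-conformant, but indistinguishable
         * from a genuinely full filesystem. */
        uint64_t clamped = bavail < 0 ? 0 : (uint64_t)bavail;
        printf("clamping server reports %ju\n", (uintmax_t)clamped);
        return 0;
    }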
From owner-freebsd-fs@FreeBSD.ORG Wed Sep 19 07:36:24 2007
From: Adam Jacob Muller <freebsd-fs@adam.gs>
To: freebsd-current@freebsd.org, freebsd-fs@freebsd.org
Date: Wed, 19 Sep 2007 03:24:25 -0400
Subject: ZFS pool not working on boot

Hello,

I have a server with two ZFS pools: one is an internal raid0 using 2
drives connected via ahc, the other is an external storage array with 11
drives, also on ahc, using raidz. (This is a Dell 1650 and PV220S.)

On reboot, the pools do not come online on their own. Both pools
consistently show as failed. The exact symptoms vary, but I have seen
many drives marked variously as "corrupt" or "unavailable", and most
zpool operations fail with "pool is unavailable" errors.

Here is the interesting part. Consistently, 100% of the time, a zpool
export followed by a zpool import restores the arrays to an ONLINE
status. Once the array is online, it's quite stable (I'm loving ZFS,
btw; thank you to everyone for the hard work on this, ZFS is fantastic)
and works great.

Anyone have any ideas why this might occur and what/if the solution is?
Any additional information can be provided on request; I am running
-current from approximately 1 week ago.

-Adam

From owner-freebsd-fs@FreeBSD.ORG Wed Sep 19 08:44:22 2007
From: "Wilkinson, Alex"
To: freebsd-current@freebsd.org, freebsd-fs@freebsd.org
Date: Wed, 19 Sep 2007 16:25:51 +0800
Subject: Re: ZFS pool not working on boot

On Wed, Sep 19, 2007 at 03:24:25AM -0400, Adam Jacob Muller wrote:

> I have a server with two ZFS pools [...] On reboot, the pools do not
> come online on their own. Both pools consistently show as failed.

Make sure your hostid doesn't change. If it does, ZFS will fail upon
bootstrap.

-aW
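The hostid ZFS compares against is easy to inspect; one low-tech way to
rule this out on a stock FreeBSD install is to record it before and
after a reboot and compare:

    # sysctl kern.hostid
    kern.hostid: 2054635911

(The number above is purely illustrative; a value of 0 would mean no
hostid was ever set on the machine.)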
From owner-freebsd-fs@FreeBSD.ORG Wed Sep 19 17:56:45 2007
From: Miroslav Lachman <000.fbsd@quip.cz>
To: Ivan Voras
Cc: freebsd-fs@freebsd.org
Date: Wed, 19 Sep 2007 19:57:59 +0200
Subject: Re: slow transfers on webshare service

Ivan Voras wrote:
> OK. Can you run iostat during peak traffic and report its output, so
> we can rule out the disk drives?

I have an external disk array (da1) which is saturated at this time:

# iostat -w 5
      tty             da0              da1             pass0            cpu
 tin  tout  KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
   0     1 12.91   3  0.04  59.30  95  5.47   0.00   0  0.00   4  0 24 21 51
   0    92  2.00   1  0.00  59.28 251 14.51   0.00   0  0.00   1  0  5  5 89
   0    31 10.67   1  0.01  59.03 226 13.03   0.00   0  0.00   0  0  5  5 90
   0    31 17.74  17  0.29  58.80 231 13.29   0.00   0  0.00   0  0  4  4 92
   0    31  0.00   0  0.00  58.39 222 12.67   0.00   0  0.00   1  0  5  4 90

from systat -vmstat:

Disks   da0   da1 pass0 pass1 pass2
KB/t   0.00 59.53  0.00  0.00  0.00
tps       0   226     0     0     0
MB/s   0.00 13.14  0.00  0.00  0.00
% busy    0    97     0     0     0

But without 4 workers I can't saturate the array, and the traffic graph
(by MRTG) stays capped at a fixed bandwidth. So setting Lighttpd to use
4 workers definitely helps in my case.
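As a quick sanity check on those numbers: the MB/s column is just KB/t
times tps, so da1 is moving

    59.3 KB/t x 226 tps ~= 13.1 MB/s ~= 105 Mbit/s

and at 97% busy it is the drive array, not the network stack, that is
the ceiling in this snapshot.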
(I don't know if it is the same for Miroslav Valenta.)

Miroslav Lachman

From owner-freebsd-fs@FreeBSD.ORG Wed Sep 19 18:02:04 2007
From: Nikolay Pavlov <qpadla@gmail.com>
To: freebsd-fs@freebsd.org
Cc: Ivan Voras
Date: Wed, 19 Sep 2007 21:01:47 +0300
Subject: Re: slow transfers on webshare service

On Wednesday 19 September 2007 20:57:59 Miroslav Lachman wrote:
> Ivan Voras wrote:
> > OK. Can you run iostat during peak traffic and report its output, so
> > we can rule out the disk drives?
>
> I have an external disk array (da1) which is saturated at this time:
> [iostat and systat output trimmed]
> But without 4 workers I can't saturate the array, and the traffic
> graph (by MRTG) stays capped at a fixed bandwidth. So setting Lighttpd
> to use 4 workers definitely helps in my case.

Is there any reason not to use sendfile? It is commented out in your
lighttpd config.

--
Best regards, Nikolay Pavlov.
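For context on the sendfile question: lighttpd's "freebsd-sendfile"
backend ultimately calls FreeBSD's sendfile(2), which pushes file pages
to the socket inside the kernel instead of copying them through a
userland buffer the way writev does. A minimal sketch of the call (the
helper function and its names are illustrative, not lighttpd's code):

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/uio.h>    /* sendfile(2) on FreeBSD */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Send all of 'path' over the connected TCP socket 'sock' without
     * copying the file data through userland. */
    static int send_whole_file(int sock, const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd == -1)
            return -1;

        off_t sent = 0;
        /* nbytes == 0 means "send until end of file" on FreeBSD; with a
         * non-blocking socket a real server would loop on EAGAIN, using
         * 'sent' to advance the offset. */
        int error = sendfile(fd, sock, 0, 0, NULL, &sent, 0);
        if (error == 0)
            printf("sent %jd bytes of %s\n", (intmax_t)sent, path);

        close(fd);
        return error;
    }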
From owner-freebsd-fs@FreeBSD.ORG Wed Sep 19 22:13:18 2007
From: Adam Jacob Muller <freebsd-fs@adam.gs>
To: "Wilkinson, Alex"
Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org
Date: Wed, 19 Sep 2007 18:13:15 -0400
Subject: Re: ZFS pool not working on boot

On Sep 19, 2007, at 4:25 AM, Wilkinson, Alex wrote:
> Make sure your hostid doesn't change. If it does, ZFS will fail upon
> bootstrap.

No, the hostid is not changing; I just rebooted and replicated the
problem. Also, from reading the ZFS docs, it seems the symptom of a
changed hostid would be that the pool simply needs to be imported again.

After another reboot, I see this:

# zpool status
  pool: tank
 state: UNAVAIL
status: One or more devices could not be opened. There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-D3
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        UNAVAIL       0     0     0  insufficient replicas
          da1       ONLINE        0     0     0
          da2       UNAVAIL       0     0     0  cannot open

... more output showing the other array with 11 drives is fine

# zpool export tank
# zpool import tank
# zpool status
  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE        0     0     0
          da1       ONLINE        0     0     0
          da2       ONLINE        0     0     0

errors: No known data errors

(The 11-drive raidz is still fine, of course.)
From owner-freebsd-fs@FreeBSD.ORG Wed Sep 19 22:56:10 2007
From: Johan Ström <johan@stromnet.se>
To: freebsd-fs@freebsd.org
Date: Thu, 20 Sep 2007 00:31:56 +0200
Subject: ZFS (and quota)

Hello,

I just installed FreeBSD -current on a box (actually upgraded 6.2 to
-current) to experiment a bit.

I was playing around with ZFS and tried out the quota features. While
doing this I noticed that you don't seem to get a "disk full" notice the
same way as you do on a "normal" (UFS) filesystem. Instead of aborting
the operation with "No space left on device", it just continued:

[root@devbox ~]# zpool create tank /dev/ad2
[root@devbox ~]# zpool list
NAME   SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
tank  37.2G   111K  37.2G   0%  ONLINE  -
[root@devbox /tank]# zfs create -V 10M tank/set3vol
[root@devbox /tank]# newfs /dev/zvol/tank/set3vol
/dev/zvol/tank/set3vol: 10.0MB (20480 sectors) block size 16384,
        fragment size 2048
        using 4 cylinder groups of 2.52MB, 161 blks, 384 inodes.
super-block backups (for fsck -b #) at:
 160, 5312, 10464, 15616
[root@devbox /tank]# mount /dev/zvol/tank/set3vol set3vol/
[root@devbox /tank]# cd set3vol/
[root@devbox /tank/set3vol]# dd if=/dev/urandom of=test
/tank/set3vol: write failed, filesystem is full
dd: test: No space left on device
19169+0 records in
19168+0 records out
9814016 bytes transferred in 2.276896 secs (4310261 bytes/sec)

[root@devbox /tank]# zfs create tank/set2
[root@devbox /tank/set2]# zfs set quota=10M tank/set2
[root@devbox /tank/set2]# zfs get quota tank/set2
NAME       PROPERTY  VALUE  SOURCE
tank/set2  quota     10M    local
[root@devbox /tank/set2]# dd if=/dev/urandom of=test
^C
18563+0 records in
18562+0 records out
9503744 bytes transferred in 199.564353 secs (47622 bytes/sec)
[root@devbox /tank/set2]# zfs list tank/set2
NAME        USED  AVAIL  REFER  MOUNTPOINT
tank/set2  9.15M   870K  9.15M  /tank/set2

No hard stop there; it just tries to write more and more. Well, the
quota is enforced fine, but shouldn't there be some harder error? I'm
not sure how regular UFS quotas behave, since I never used them, but
this seems like strange behaviour.

Anyway, how "stable" are the ZFS support and -current / FreeBSD 7 in
general now? I'm about to get a new server, an 8-core Xeon machine with
lots of disk, so I would probably benefit very much from running
FreeBSD 7 (much better multi-core performance, if I've understood
correctly). Being able to use ZFS for some of my jails would rock too,
with individual quotas and all the other flexibility ZFS provides
(i.e. creating a new dataset for every jail and enforcing an individual
quota). Would anyone dare to do this on a production machine yet? Is
anyone doing it?

Well, it can't be said too many times: keep up the good work! Thanks to
all the FreeBSD developers (and everyone else too!) :)

--
Johan Ström
Stromnet
johan@stromnet.se
http://www.stromnet.se/
From owner-freebsd-fs@FreeBSD.ORG Wed Sep 19 23:01:43 2007
From: Axel <acd@acd.homelinux.org>
To: Adam Jacob Muller
Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org
Date: Wed, 19 Sep 2007 16:44:49 -0600
Subject: Re: ZFS pool not working on boot

Adam Jacob Muller writes:
> On reboot, the pools do not come online on their own. Both pools
> consistently show as failed.
> [...]
> Consistently, 100% of the time, a zpool export followed by a zpool
> import restores the arrays to an ONLINE status.

There is a file called /boot/zfs/zpool.cache that is kept in sync and
loaded at boot time. If that's not there, e.g. because your /boot
doesn't actually point to it, you're hosed.

--
Axel

From owner-freebsd-fs@FreeBSD.ORG Thu Sep 20 00:06:26 2007
From: Adam Jacob Muller <freebsd-fs@adam.gs>
To: Axel
Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org
Date: Wed, 19 Sep 2007 20:05:03 -0400
Subject: Re: ZFS pool not working on boot

On Sep 19, 2007, at 6:44 PM, Axel wrote:
> There is a file called /boot/zfs/zpool.cache that is kept in sync and
> loaded at boot time. If that's not there, e.g. because your /boot
> doesn't actually point to it, you're hosed.

The file is there. Of note is that some of the prior reboots were
"unintentional", so it is possible that the file was corrupted. However,
it does not seem correct for ZFS to come up in a state that shows drives
as corrupted and/or unavailable. I believe I have corrected the crashing
issue, but this still does not seem like the correct behavior.

- Adam
The file is there. Of note: some of the prior reboots had been
"unintentional" reboots, so it is possible that the file was corrupt.
However, it does not seem correct for ZFS to come up in a state that
shows drives as corrupted and/or unavailable. I believe I have
corrected the crashing issue, but it still does not seem that this is
the correct behavior.

- Adam

From owner-freebsd-fs@FreeBSD.ORG Thu Sep 20 02:39:13 2007
From: Axel
Date: Wed, 19 Sep 2007 20:39:11 -0600
To: Adam Jacob Muller
Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org
Subject: Re: ZFS pool not working on boot

Adam Jacob Muller writes:

> On Sep 19, 2007, at 6:44 PM, Axel wrote:
>
>> There is a file called /boot/zfs/zpool.cache that is kept in sync
>> and loaded at boot time.
>>
>> [...]
>
> The file is there. [...] I believe I have corrected the crashing
> issue, but it still does not seem that this is the correct behavior.

If you have a working root outside of ZFS, I'd do the following:

1) Rename the zpool.cache to something else, to be safe.
2) Reboot, make sure that /boot/zfs points to the right location, and
   reimport the pools.
3) You should be fine from there on.

I had sort of the same issue. The zpool.cache isn't documented too
well yet; I only stumbled over it by doing an "lsmod" at the loader
prompt. It's one reason root can be on ZFS before the hostid is set.
If you set up ZFS and don't have the future /boot/zfs set up right, it
won't work, because the information gets lost.
With / on ZFS it's crucial to have /boot point to the actual UFS boot
partition and not live somewhere in your ZFS /, because that gets
ignored until it's mounted. It's a good idea to keep the actual old
UFS / directory around, although only /boot in there gets used if you
mount / from ZFS.

http://wiki.freebsd.org/ZFS comes in handy.

And yes, I do love ZFS too :-)

--
Axel
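
[Axel's steps 1-3, spelled out as commands. A sketch only, assuming a
single pool named tank and a working UFS root; adjust names to taste:]

  mv /boot/zfs/zpool.cache /boot/zfs/zpool.cache.bad  # step 1: keep the old cache, just in case
  reboot                                              # step 2
  zpool import         # with no argument: scan devices, list importable pools
  zpool import tank    # reimport; this rewrites /boot/zfs/zpool.cache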
From owner-freebsd-fs@FreeBSD.ORG Thu Sep 20 21:15:33 2007
From: Dag-Erling Smørgrav
Date: Thu, 20 Sep 2007 11:26:15 +0200
To: Johan Ström
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS (and quota)

Johan Ström writes:

> I was playing around with ZFS a bit and tried out the quota
> features. While doing this I noticed that it doesn't seem like you
> get a "disk full" notice the same way as you do on a "normal" (UFS)
> filesystem. Instead of aborting the operation with "No space left on
> device" it just continued:
> [...]
> [root@devbox /tank/set2]# dd if=/dev/urandom of=test
> ^C
> 18563+0 records in
> 18562+0 records out
> 9503744 bytes transferred in 199.564353 secs (47622 bytes/sec)
> [root@devbox /tank/set2]# zfs list tank/set2
> NAME        USED  AVAIL  REFER  MOUNTPOINT
> tank/set2  9.15M   870K  9.15M  /tank/set2

See what it says under AVAIL? You killed it before it filled the disk.

des@ds4 ~% sudo zfs create raid/q
des@ds4 ~% sudo zfs set quota=1m raid/q
des@ds4 ~% sudo dd if=/dev/zero of=/raid/q/test bs=65536
dd: /raid/q/test: Disc quota exceeded
16+0 records in
15+0 records out
983040 bytes transferred in 2.533990 secs (387942 bytes/sec)
des@ds4 ~% zfs list raid/q
NAME     USED  AVAIL  REFER  MOUNTPOINT
raid/q  1.03M      0  1.03M  /raid/q

DES
--
Dag-Erling Smørgrav - des@des.no

From owner-freebsd-fs@FreeBSD.ORG Thu Sep 20 21:45:28 2007
From: Dmitry Morozovsky
Date: Fri, 21 Sep 2007 01:45:10 +0400 (MSD)
To: Dan Nelson
Cc: freebsd-fs@freebsd.org, Adam Jacob Muller, freebsd-current@freebsd.org, Axel
Subject: Re: ZFS pool not working on boot

On Thu, 20 Sep 2007, Dan Nelson wrote:

DN> > It's a good idea to keep the actual old UFS / directory around
DN> > although only /boot gets used in there if you mount / from zfs.
DN>
DN> What I do is populate my UFS /.boot filesystem with /etc, /lib,
DN> /libexec, /bin, and /sbin from my root filesystem, so if zfs fails
DN> to load it's easy to recover.

With a small patch to rescue (including zpool and zfs together with
the libraries involved), all that is required is copying /rescue and
symlinking /bin and /sbin to it. Well, you also have to mkdir dev and
possibly have ./etc/{,s}pwd.db to make tar happy...

Sincerely,
D.Marck                                 [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer:                             marck@FreeBSD.org ]
------------------------------------------------------------------------
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru ***
------------------------------------------------------------------------
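
[A sketch of the rescue layout Dmitry describes, assuming an alternate
UFS root mounted at /altroot and a /rescue already rebuilt to include
zpool/zfs; the /altroot path is hypothetical:]

  cp -Rp /rescue /altroot/           # statically linked toolset
  ln -s rescue /altroot/bin          # /bin and /sbin become symlinks to it
  ln -s rescue /altroot/sbin
  mkdir /altroot/dev                 # mount point for devfs
  mkdir -p /altroot/etc
  cp -p /etc/pwd.db /etc/spwd.db /altroot/etc/  # keeps tar & friends happy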
From owner-freebsd-fs@FreeBSD.ORG Thu Sep 20 21:46:01 2007
From: Dan Nelson
Date: Thu, 20 Sep 2007 10:08:40 -0500
To: Axel
Cc: freebsd-fs@freebsd.org, Adam Jacob Muller, freebsd-current@freebsd.org
Subject: Re: ZFS pool not working on boot

In the last episode (Sep 19), Axel said:
> Adam Jacob Muller writes:
> > [...]
>
> If you have a working root outside of ZFS, I'd do the following:
>
> 1) Rename the zpool.cache to something else, to be safe.
> 2) Reboot, make sure that /boot/zfs points to the right location,
>    and reimport the pools.
> 3) You should be fine from there on.
>
> [...]
>
> It's a good idea to keep the actual old UFS / directory around,
> although only /boot in there gets used if you mount / from ZFS.

What I do is populate my UFS /.boot filesystem with /etc, /lib,
/libexec, /bin, and /sbin from my root filesystem, so if zfs fails to
load it's easy to recover.

--
Dan Nelson
dnelson@allantgroup.com
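
[Dan's fallback as commands; a sketch assuming the small UFS
filesystem is mounted at /.boot, as in his setup:]

  cp -Rp /etc /lib /libexec /bin /sbin /.boot/

With those copies in place, the box can still reach a usable
single-user shell from the UFS partition if the ZFS root fails to
mount.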
From owner-freebsd-fs@FreeBSD.ORG Thu Sep 20 21:50:11 2007
From: Dag-Erling Smørgrav
Date: Thu, 20 Sep 2007 14:54:54 +0200
To: Johan Ström
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS (and quota)

Johan Ström writes:

> Dag-Erling Smørgrav writes:
> > [...]
> With the bs=65536 parameter it works as expected, I get Disk quota
> exceeded. Without it, it just keeps on running until I interrupt it.

It seems that with small block sizes, it becomes increasingly slow as
the partition fills up. You can easily see that by pressing ^T while
dd is running; you will see that it still makes progress, but very
slowly.
des@ds4 ~% sudo dd if=/dev/zero of=/raid/q/test
load: 0.18  cmd: dd 20250 [zfs:(&tx->tx_quiesce_done_cv)]  0.01u 0.48s 0% 1192k
17245+0 records in
17244+0 records out
8828928 bytes transferred in 18.743790 secs (471032 bytes/sec)
load: 0.17  cmd: dd 20250 [zfs:(&tx->tx_quiesce_done_cv)]  0.01u 0.49s 0% 1212k
17273+0 records in
17272+0 records out
8843264 bytes transferred in 23.642442 secs (374042 bytes/sec)
load: 0.24  cmd: dd 20250 [zfs:(&tx->tx_quiesce_done_cv)]  0.01u 0.51s 0% 1212k
17406+0 records in
17405+0 records out
8911360 bytes transferred in 45.053364 secs (197796 bytes/sec)
load: 0.15  cmd: dd 20250 [zfs:(&tx->tx_quiesce_done_cv)]  0.01u 0.55s 0% 1212k
17601+0 records in
17600+0 records out
9011200 bytes transferred in 76.173965 secs (118298 bytes/sec)
load: 0.06  cmd: dd 20250 [zfs:(&tx->tx_quiesce_done_cv)]  0.02u 0.60s 0% 1212k
17906+0 records in
17905+0 records out
9167360 bytes transferred in 126.020690 secs (72745 bytes/sec)
^C18259+0 records in
18258+0 records out
9348096 bytes transferred in 185.266755 secs (50457 bytes/sec)

DES
--
Dag-Erling Smørgrav - des@des.no

From owner-freebsd-fs@FreeBSD.ORG Thu Sep 20 22:44:50 2007
From: Johan Ström
Date: Thu, 20 Sep 2007 14:26:25 +0200
To: Dag-Erling Smørgrav
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS (and quota)

On Sep 20, 2007, at 11:26, Dag-Erling Smørgrav wrote:
> Johan Ström writes:
>> I was playing around with ZFS a bit and tried out the quota
>> features. [...]
>
> See what it says under AVAIL? You killed it before it filled the
> disk.

[root@devbox /home/johan]# zfs list tank/set2
NAME        USED  AVAIL  REFER  MOUNTPOINT
tank/set2  9.15M   870K  9.15M  /tank/set2

Yes I did, but after 200 seconds one would think that 10MB should have
been filled (it took 2.2s on UFS), right? :)

> des@ds4 ~% sudo zfs create raid/q
> des@ds4 ~% sudo zfs set quota=1m raid/q
> des@ds4 ~% sudo dd if=/dev/zero of=/raid/q/test bs=65536
> dd: /raid/q/test: Disc quota exceeded
> 16+0 records in
> 15+0 records out
> 983040 bytes transferred in 2.533990 secs (387942 bytes/sec)
> des@ds4 ~% zfs list raid/q
> NAME     USED  AVAIL  REFER  MOUNTPOINT
> raid/q  1.03M      0  1.03M  /raid/q

With the bs=65536 parameter it works as expected: I get "Disc quota
exceeded". Without it, it just keeps on running until I interrupt it.

> DES
> --
> Dag-Erling Smørgrav - des@des.no

From owner-freebsd-fs@FreeBSD.ORG Thu Sep 20 22:53:18 2007
From: Johan Ström
Date: Thu, 20 Sep 2007 14:28:45 +0200
To: Pawel Jakub Dawidek
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS (and quota)
On Sep 20, 2007, at 13:56, Pawel Jakub Dawidek wrote:

> On Thu, Sep 20, 2007 at 12:31:56AM +0200, Johan Ström wrote:
>> [...]
>
> Hmm, seems to work just fine here:
>
> beast:root:~# zfs create tank/foo
> beast:root:~# zfs set quota=10m tank/foo
>
> beast:root:~# dd if=/dev/random of=/tank/foo/test bs=1m
> dd: /tank/foo/test: Disc quota exceeded
> 11+0 records in
> 10+0 records out
> 10485760 bytes transferred in 6.109407 secs (1716330 bytes/sec)
>
> I think you just didn't wait long enough :) You didn't give a block
> size argument to dd(1), so it used 512 bytes. Please be more
> patient, retry and report back, thanks!

You were correct :)

[root@devbox /tank/set2]# dd if=/dev/urandom of=test2
dd: test2: Disc quota exceeded
1538+0 records in
1537+0 records out
786944 bytes transferred in 202.628064 secs (3884 bytes/sec)

But the other day I ran it for at least 300 seconds without it
stopping. When I did it on UFS it took 2 seconds to fill up
altogether; with ZFS it kept on going much longer?

Retested:

[root@devbox /tank/set3vol]# ls -al
total 6
drwxr-xr-x  3 root  wheel       512 Sep 20 14:16 .
drwxr-xr-x  5 root  wheel         5 Sep 20 00:22 ..
drwxrwxr-x  2 root  operator    512 Sep 20 00:21 .snap
[root@devbox /tank/set3vol]# dd if=/dev/urandom of=test
/tank/set3vol: write failed, filesystem is full
dd: test: No space left on device
19169+0 records in
19168+0 records out
9814016 bytes transferred in 2.176188 secs (4509728 bytes/sec)
[root@devbox /tank/set3vol]# cd ../set2/
[root@devbox /tank/set2]# ls -al
total 3
drwxr-xr-x  2 root  wheel  2 Sep 20 14:16 .
drwxr-xr-x  5 root  wheel  5 Sep 20 00:22 ..
[root@devbox /tank/set2]# dd if=/dev/urandom of=test
dd: test: Disc quota exceeded
20226+0 records in
20225+0 records out
10355200 bytes transferred in 456.448610 secs (22686 bytes/sec)
[root@devbox /tank/set2]# df -h
Filesystem                Size    Used   Avail Capacity  Mounted on
/dev/ad0s1a               496M    174M    282M    38%    /
devfs                     1.0K    1.0K      0B   100%    /dev
/dev/ad0s1e               496M     28K    456M     0%    /tmp
/dev/ad0s1f               5.0G    2.8G    1.8G    61%    /usr
/dev/ad0s1d               1.2G    105M    1.0G     9%    /var
tank                       37G      0B     37G     0%    /tank
tank/set1                  37G      0B     37G     0%    /tank/set1
/dev/zvol/tank/set3vol    9.4M    9.4M   -728K   108%    /tank/set3vol
tank/set2                  10M     10M      0B   100%    /tank/set2
[root@devbox /tank/set2]#

On UFS, 2.1 sec (although that was disk full, not quota full); on ZFS,
450 sec.

> --
> Pawel Jakub Dawidek                       http://www.wheel.pl
> pjd@FreeBSD.org                           http://www.FreeBSD.org
> FreeBSD committer                         Am I Evil? Yes, I Am!

From owner-freebsd-fs@FreeBSD.ORG Thu Sep 20 23:46:49 2007
From: Fluffles
Date: Thu, 20 Sep 2007 12:45:00 +0200
To: freebsd-fs@FreeBSD.org
Subject: Writing contiguously to UFS2?

Hello list,

I've set up a concat of 8 disks for my new NAS, using ataidle to spin
down the disks not needed. This allows me to save power and noise/heat
by running only the drives that are actually in use.

My problem is UFS. UFS2 seems to write to 4 disks, even though all the
data written so far can easily fit on just one disk. What's going on
here? I looked at newfs parameters, but in the past was unable to make
newfs write contiguously. It seems UFS2 always writes to a new
cylinder. Is there any way to force UFS to write contiguously? Or at
least limit the problem?

If I write 400GB to a 4TB volume consisting of 8x 500GB disks, I want
all data to be on the first disk. If the data spreads, then more disks
will be 'awakened' when I read my data, which defeats the purpose of my
power-saving NAS experiment.

Any feedback is welcome. Using FreeBSD 6.2-RELEASE i386, used
newfs -U -S 2048 .

- Veronica
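
[For reference, a sketch of this kind of setup, assuming eight disks
ad4 through ad18 and the sysutils/ataidle port; the device names and
the 10-minute timeout are made up, and the exact ataidle option syntax
varies by version, so check ataidle(8):]

  gconcat label -v data ad4 ad6 ad8 ad10 ad12 ad14 ad16 ad18
  newfs -U -S 2048 /dev/concat/data
  ataidle -S 10 /dev/ad6    # spin a member down after 10 idle minutes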
From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 06:57:18 2007
From: Pawel Jakub Dawidek
Date: Thu, 20 Sep 2007 13:56:21 +0200
To: Johan Ström
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS (and quota)

On Thu, Sep 20, 2007 at 12:31:56AM +0200, Johan Ström wrote:
> Hello
>
> I just installed FreeBSD-current on a box (actually upgraded 6.2 to
> -current) to experiment a bit.
> I was playing around with ZFS a bit and tried out the quota features.
> While doing this I noticed that it doesn't seem like you get a "disk
> full" notice the same way as you do on a "normal" (UFS) filesystem.
> Instead of aborting the operation with "No space left on device" it
> just continued:
[...]
> [root@devbox /tank/set2]# dd if=/dev/urandom of=test
> ^C
> 18563+0 records in
> 18562+0 records out
> 9503744 bytes transferred in 199.564353 secs (47622 bytes/sec)
> [root@devbox /tank/set2]# zfs list tank/set2
> NAME        USED  AVAIL  REFER  MOUNTPOINT
> tank/set2  9.15M   870K  9.15M  /tank/set2
>
> No hard stop there, it just tries to write more and more and more.
> Well, the quota is enforced fine, but shouldn't there be some harder
> error? I'm not sure how regular UFS quotas work, since I have never
> used them, but this seems like strange behaviour.
Hmm, seems to work just fine here:

beast:root:~# zfs create tank/foo
beast:root:~# zfs set quota=10m tank/foo

beast:root:~# dd if=/dev/random of=/tank/foo/test bs=1m
dd: /tank/foo/test: Disc quota exceeded
11+0 records in
10+0 records out
10485760 bytes transferred in 6.109407 secs (1716330 bytes/sec)

beast:root:~# df -h /tank/foo
Filesystem    Size    Used   Avail Capacity  Mounted on
tank/foo       10M     10M      0B   100%    /tank/foo

I think you just didn't wait long enough :) You didn't give a block
size argument to dd(1), so it used 512 bytes. Please be more patient,
retry and report back, thanks!

--
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!

From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 10:49:47 2007
From: Ivan Voras
Date: Fri, 21 Sep 2007 12:49:28 +0200
To: freebsd-fs@freebsd.org
Subject: Re: Writing contiguously to UFS2?

Fluffles wrote:
> Hello list,
>
> I've set up a concat of 8 disks for my new NAS, using ataidle to spin
> down the disks not needed. This allows me to save power and
> noise/heat by running only the drives that are actually in use.
>
> My problem is UFS. UFS2 seems to write to 4 disks, even though all
> the

These 4 drives are used in what RAID form? If it's RAID0/stripe, you
can't avoid data being spread across the drives (since this is the
point of having RAID0).

> data written so far can easily fit on just one disk.
> What's going on here? I looked at newfs parameters, but in the past
> was unable to make newfs write contiguously. It seems UFS2 always
> writes to a new cylinder. Is there any way to force UFS to write
> contiguously? Or at least limit the problem?

If the drives are simply concatenated, then there might be weird
behaviour in choosing what cylinder groups to allocate for files. UFS
forces big files to be spread across cylinder groups so that no large
file fills entire cgs.

From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 11:09:04 2007
From: Fluffles
Date: Fri, 21 Sep 2007 13:09:00 +0200
To: freebsd-fs@FreeBSD.org
Cc: Ivan Voras
Subject: Re: Writing contiguously to UFS2?

Ivan Voras wrote:
> These 4 drives are used in what RAID form? If it's RAID0/stripe, you
> can't avoid data being spread across the drives (since this is the
> point of having RAID0).

It's an array of 8 drives in gconcat, so they are using the JBOD /
spanning / concatenating scheme, which does not have a RAID designation
but rather is a bunch of disks glued to each other. Thus, there is no
striping involved. Offset 0 to 500GB will 'land' on disk0, and then
disk1 takes over, in this scheme:

offset 0 ---------------------------------------------------- offset 4TB
 disk0 -> disk1 -> disk2 -> disk3 -> disk4 -> disk5 -> disk6 -> disk7

(for everyone not familiar with concatenation)

> If the drives are simply concatenated, then there might be weird
> behaviour in choosing what cylinder groups to allocate for files.
> UFS forces big files to be spread across cylinder groups so that no
> large file fills entire cgs.

Exactly! And this is my problem.
I do not like this behavior for various reasons:
- it causes lower sequential transfer speed, because the disks have to
  seek regularly
- UFS causes 2 reads per second when writing sequentially, probably
  some meta-data thing, but I don't like it either
- files are not written contiguously, which causes fragmentation;
  essentially UFS forces big files to become fragmented this way.

Even worse: data is being stored at weird locations, so that my energy
efficient NAS project becomes crippled. Even with the first 400GB of
data, it's storing that on the first 4 disks in my concat
configuration, so that when opening folders I have to wait 10 seconds
before the disk is spun up. For regular operation, multiple disks have
to be spun up, which is not practical and unnecessary. Is there any
way to force UFS to write contiguously? Otherwise I think I should try
Linux with some Linux filesystem (XFS, Reiser, JFS) in the hope they
do not suffer from this problem.

In the past, when testing geom_raid5, I've tried to tune newfs
parameters so that it would write contiguously, but still there were
regular 2-phase writes, which means data was not written contiguously.
I really dislike this behavior.

- Veronica

From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 11:36:31 2007
From: Ivan Voras
Date: Fri, 21 Sep 2007 13:35:47 +0200
To: freebsd-fs@freebsd.org
Subject: Re: Writing contiguously to UFS2?

Fluffles wrote:
> Even worse: data is being stored at weird locations, so that my
> energy efficient NAS project becomes crippled.
> Even with the first 400GB of data, it's storing that on the first 4
> disks in my concat configuration,

> In the past, when testing geom_raid5, I've tried to tune newfs
> parameters so that it would write contiguously, but still there were
> regular 2-phase writes, which means data was not written
> contiguously. I really dislike this behavior.

I agree, this is my least favorite aspect of UFS (maybe together with
the non-implementation of extents), for various reasons. I feel it's
time to start heavy lobbying for finishing FreeBSD's implementations
of XFS and reiserfs :)

(ZFS is not the ultimate solution: 1) replacing a UFS monoculture with
a ZFS monoculture will sooner or later yield problems, and 2)
sometimes a "dumb" unix filesystem is preferred to the "smart" ZFS.)

From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 12:10:31 2007
From: Eric Anderson
Date: Fri, 21 Sep 2007 07:10:24 -0500
To: Fluffles
Cc: freebsd-fs@freebsd.org, Ivan Voras
Subject: Re: Writing contiguously to UFS2?

Fluffles wrote:
> It's an array of 8 drives in gconcat, so they are using the JBOD /
> spanning / concatenating scheme, which does not have a RAID
> designation but rather is a bunch of disks glued to each other.
> Thus, there is no striping involved.
> Offset 0 to 500GB will 'land' on disk0, and then disk1 takes over.
> [...]
> Exactly! And this is my problem. I do not like this behavior for
> various reasons:
> [...]
> Is there any way to force UFS to write contiguously? Or at least
> limit the problem?

This notion of breaking up large blocks of data into smaller chunks is
a fundamental of the UFS (well, FFS) filesystem, and has been around
for ages. I'm not saying it's the One True FS Format by any means, but
many, many other file systems use the same principles.

The largest file size per chunk in a cylinder group is calculated at
newfs time, which also determines how many cylinder groups there
should be. I think the largest size I've seen was something in the
460MB-ish range, meaning any contiguous write above that would span
more than one cylinder group.

The max cylinder group size also has another bad side effect: the more
cylinder groups you have, the longer it takes a snapshot to be
created.

I recommend trying msdos fs. On recent -CURRENT, it should perform
fairly well (akin to UFS2, I think), and if I recall correctly, it has
a more contiguous block layout.

In the end, extending UFS2 to support much larger cylinder group sizes
would be hugely beneficial. Instead of forcing XFS, reiserfs, JFS,
ext[23], etc., to be writable (most of those are GPL'ed), why not
start the (immensely huge) task of a UFS3, which has support for all
the things we need for the next 5-10 years? UFS2 has served well from
5.x to 7.x, but what about the future? Making a UFS3 takes time, and
dedication from developers.

Eric
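
[The per-group numbers Eric mentions are computed at newfs time and
can be read back with dumpfs(8); a sketch, with a hypothetical device
name:]

  dumpfs /dev/concat/data | grep -E 'ncg|bpg|maxbpg'

ncg is the cylinder-group count and maxbpg is the cap on how many
blocks a single file may allocate in one group before the allocator
moves it along to the next.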
From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 12:12:40 2007
From: Stefan Esser
Date: Fri, 21 Sep 2007 14:12:16 +0200
To: Ivan Voras
Cc: freebsd-fs@freebsd.org
Subject: Re: Writing contiguously to UFS2?

Ivan Voras wrote:
> Fluffles wrote:
>
>> Even worse: data is being stored at weird locations, so that my
>> energy efficient NAS project becomes crippled. [...]
>
> I agree, this is my least favorite aspect of UFS (maybe together
> with the non-implementation of extents), for various reasons. I feel
> it's time to start heavy lobbying for finishing FreeBSD's
> implementations of XFS and reiserfs :)
>
> (ZFS is not the ultimate solution: 1) replacing a UFS monoculture
> with a ZFS monoculture will sooner or later yield problems, and 2)
> sometimes a "dumb" unix filesystem is preferred to the "smart" ZFS.)

Both XFS and ReiserFS are quite complex compared to UFS, and
definitely not well described by the term "dumb" ;-)

The FFS paper by McKusick et al. describes the historical allocation
strategy, which was somewhat modified in FreeBSD a few years ago in
order to adapt to modern disk sizes (larger cylinder groups, meaning
it is not a good idea to create each new directory in a new cylinder
group). The code that implements the block layout strategy is easily
found in the sources and can be modified without too much risk to
your file system's consistency ...

Regards, STefan
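
[For anyone who wants to follow Stefan's pointer: the allocation
policy lives in the FreeBSD source tree, and ffs_blkpref() is, if
memory serves, the routine that picks the preferred next block:]

  less /usr/src/sys/ufs/ffs/ffs_alloc.c    # see ffs_blkpref()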
From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 12:51:53 2007
From: Ivan Voras
Date: Fri, 21 Sep 2007 14:45:35 +0200
To: freebsd-fs@freebsd.org
Subject: Re: Writing contiguously to UFS2?

Stefan Esser wrote:
> Ivan Voras wrote:
>> (ZFS is not the ultimate solution: 1) replacing a UFS monoculture
>> with a ZFS monoculture will sooner or later yield problems, and 2)
>> sometimes a "dumb" unix filesystem is preferred to the "smart"
>> ZFS.)
>
> Both XFS and ReiserFS are quite complex compared to UFS, and
> definitely not well described by the term "dumb" ;-)

Of course, I mean no disrespect to them; I've read enough papers on
them to realize their complexity :) By "dumb" I meant they behave like
"point them to a device and they will stick to it", i.e. they don't
come with a volume manager.

> The FFS paper by McKusick et al. describes the historical allocation
> strategy, which was somewhat modified in FreeBSD a few years ago in
> order to adapt to modern disk sizes (larger cylinder groups, meaning
> it is not a good idea to create each new directory in a new cylinder
> group).

[thinking out loud:]

From experience (not from reading code or the docs) I conclude that
cylinder groups cannot be larger than around 190 MB. I know this from
numerous runs of newfs and from development of gvirstor, which
interacts with cgs in an "interesting" way.
I know the reasons why cgs exist (mainly to lower latencies from
seeking), but with today's drives and memory configurations it would
sometimes be nice to make them larger, or, in the extreme, to make
just one cg that covers the entire drive. Though this extreme would,
in the case of concat configurations, put all of the block and inode
metadata on the first drive, which could have interesting effects on
performance. Of course, with seek-less drives (solid state) there's no
reason to have cgs at all.

From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 12:55:35 2007
From: Ivan Voras
Date: Fri, 21 Sep 2007 14:50:14 +0200
To: freebsd-fs@freebsd.org
Subject: Re: Writing contiguously to UFS2?

Eric Anderson wrote:
> The largest file size per chunk in a cylinder group is calculated at
> newfs time, which also determines how many cylinder groups there
> should be. I think the largest size I've seen was something in the
> 460MB-ish range, meaning any contiguous write above that would span
> more than one cylinder group.

Hmm, how did you manage to create a file system with such large
cylinder groups?
I've experimented with smallnum-TB file systems and still couldn't make
them larger than around 190 MB (though I wasn't actively trying, just
observed how they turned out).

From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 13:19:20 2007
From: Gary Palmer
Date: Fri, 21 Sep 2007 09:19:19 -0400
To: Ivan Voras
Cc: freebsd-fs@freebsd.org
Subject: Re: Writing contiguously to UFS2?

On Fri, Sep 21, 2007 at 02:50:14PM +0200, Ivan Voras wrote:
> Eric Anderson wrote:
>
> > The largest file size per chunk in a cylinder group is calculated at
> > newfs time, which also determines how many cylinder groups there should
> > be. I think the largest size I've seen was something in the 460MB-ish
> > range, meaning any contiguous write above that would span more than one
> > cylinder group.
>
> Hmm, how did you manage to create a file system with such large cylinder
> groups? I've experimented with smallnum-TB file systems and still
> couldn't make them larger than around 190 MB (though I wasn't actively
> trying, just observed how they turned out).

Presumably by using the -c parameter to newfs.

The original poster might get some traction out of a combination of -c
and -e parameters to newfs, although the fundamental behaviour will
remain unchanged.
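For concreteness, a hedged sketch of trying Gary's suggestion. The -N
and -e flags exist in FreeBSD's newfs, but exact semantics and defaults
vary between releases (-c is left out here because its units changed
over time), and the device name is only a placeholder:

    # Preview the layout newfs would compute, without writing anything
    # (-N); the output includes the cylinder-group count and size.
    newfs -N /dev/da0s1d

    # The same preview with a larger maxbpg (-e), the per-file per-cg
    # allocation limit discussed later in this thread.
    newfs -N -e 16384 /dev/da0s1d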
From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 13:25:19 2007
From: Ivan Voras
Date: Fri, 21 Sep 2007 15:23:20 +0200
To: freebsd-fs@freebsd.org
Subject: Re: Writing contiguously to UFS2?

Gary Palmer wrote:
> Presumably by using the -c parameter to newfs.

Hm, I'll try it again later, but I think I concluded that -c can be used
to lower the size of cgs, not to increase it.
From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 13:31:28 2007
From: Gary Palmer
Date: Fri, 21 Sep 2007 09:31:27 -0400
To: Ivan Voras
Cc: freebsd-fs@freebsd.org
Subject: Re: Writing contiguously to UFS2?

On Fri, Sep 21, 2007 at 03:23:20PM +0200, Ivan Voras wrote:
> Gary Palmer wrote:
>
> > Presumably by using the -c parameter to newfs.
>
> Hm, I'll try it again later but I think I concluded that -c can be used
> to lower the size of cgs, not to increase it.

A CG is basically an inode table with a block allocation bitmap to keep
track of what disk blocks are in use. You might have to use the -i
parameter to increase the expected average file size. That should allow
you to increase the CG size. It's been a LONG time since I looked at the
UFS code, but I suspect the # of inodes per CG is probably capped.
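A hedged illustration of Gary's -i point (placeholder device name;
dumpfs output details may differ between releases):

    # Raising bytes-per-inode shrinks each cylinder group's inode table,
    # leaving more of the group's single-block bitmap for data.
    newfs -i 262144 /dev/da0s1d

    # Inspect the result; the superblock summary reports the per-cg
    # figures (inodes per group, fragments per group).
    dumpfs /dev/da0s1d | head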
From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 14:27:15 2007
From: Pawel Jakub Dawidek
Date: Fri, 21 Sep 2007 16:25:40 +0200
To: Johan Ström
Cc: freebsd-fs@freebsd.org, zfs-discuss@opensolaris.org
Subject: Re: ZFS (and quota)

I'm CCing zfs-discuss@opensolaris.org, as this doesn't look like a
FreeBSD-specific problem.

It looks like there is a problem with block allocation(?) when we are
near the quota limit. The tank/foo dataset has its quota set to 10m:

Without quota:

	FreeBSD:
	# dd if=/dev/zero of=/tank/test bs=512 count=20480
	time: 0.7s

	Solaris:
	# dd if=/dev/zero of=/tank/test bs=512 count=20480
	time: 4.5s

With quota:

	FreeBSD:
	# dd if=/dev/zero of=/tank/foo/test bs=512 count=20480
	dd: /tank/foo/test: Disc quota exceeded
	time: 306.5s

	Solaris:
	# dd if=/dev/zero of=/tank/foo/test bs=512 count=20480
	write: Disc quota exceeded
	time: 602.7s

CPU is almost entirely idle, but disk activity seems to be high. Any
ideas?

--
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
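For reference, a minimal sketch of the setup being benchmarked above;
the pool layout and disk name are assumptions, only the dataset name
and quota come from the report:

    # A pool with a 10 MB-quota dataset, as in the report.
    zpool create tank da0
    zfs create tank/foo
    zfs set quota=10m tank/foo

    # The slow case: 20480 * 512 bytes = 10 MB, so the final writes push
    # against the quota and hit the reported slowdown.
    dd if=/dev/zero of=/tank/foo/test bs=512 count=20480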
From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 15:49:19 2007
From: Pawel Jakub Dawidek
Date: Fri, 21 Sep 2007 17:47:33 +0200
To: freebsd-fs@FreeBSD.org
Cc: zfs-discuss@opensolaris.org
Subject: The ZFS-Man.

Hi.

I gave a talk about ZFS during EuroBSDCon 2007, and because it won the
best talk award and some find it funny, here it is:

	http://youtube.com/watch?v=o3TGM0T1CvE

a bit better version is here:

	http://people.freebsd.org/~pjd/misc/zfs/zfs-man.swf

BTW. Inspired by the ZFS demos from the OpenSolaris page, I created a
few demos of ZFS on FreeBSD:

	http://youtube.com/results?search_query=freebsd+zfs&search=Search

And better versions:

	http://people.freebsd.org/~pjd/misc/zfs/

--
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 16:11:42 2007
From: Bruce Evans
Date: Sat, 22 Sep 2007 02:11:37 +1000 (EST)
To: Ivan Voras
Cc: freebsd-fs@FreeBSD.org
Subject: Re: Writing contiguously to UFS2?

On Fri, 21 Sep 2007, Ivan Voras wrote:

> Fluffles wrote:
>
>> Even worse: data is being stored at weird locations, so that my energy
>> efficient NAS project becomes crippled. Even with the first 400GB of data,
>> it's storing that on the first 4 disks in my concat configuration,
>
>> In the past when testing geom_raid5 I've tried to tune newfs parameters so
>> that it would write contiguously, but still there were regular 2-phase
>> writes, which meant data was not written contiguously. I really dislike
>> this behavior.
>
> I agree, this is my least favorite aspect of UFS (maybe together with
> nonimplementation of extents), for various reasons. I feel it's time to
> start heavy lobbying for finishing FreeBSD's implementations of XFS and
> ReiserFS :)

Why not improve the implementation of ffs?

Cylinder groups are fundamental to ffs, and I think having too many
too-small ones is fairly fundamental, but the allocation policy across
them can be anything, and large cylinder groups could be faked using
small ones.

The current (and very old?) allocation policy for extending files is
to consider allocating the block in a new cg when the current cg has
more than fs_maxbpg blocks allocated in it (newfs and tunefs parameter
-e maxbpg: default 1/4 of the number of blocks in a cg = bpg). Then
preference is given to the next cg with more than the average number
of free blocks. This seems to be buggy. From ffs_blkpref_ufs1():

% 	if (indx % fs->fs_maxbpg == 0 || bap[indx - 1] == 0) {
% 		if (lbn < NDADDR + NINDIR(fs)) {
% 			cg = ino_to_cg(fs, ip->i_number);
% 			return (cgbase(fs, cg) + fs->fs_frag);
% 		}

I think "indx" here is the index into an array of block pointers in
the inode or an indirect block. So for extending large files it is
always into an indirect block. It gets reset to 0 for each new indirect
block.
This makes its use in (indx % fs->fs_maxbpg == 0) dubious. The
condition is satisfied whenever:
- indx == 0, i.e., always at the start of a new indirect block. Not
  too bad, but not what we want if fs_maxbpg is much larger than the
  number of indexes in an indirect block.
- indx == a nonzero multiple of the number of indexes in an indirect
  block. This condition is never satisfied if fs_maxbpg is larger than
  the number of indexes in an indirect block. This is the usual case
  for ffs2 (only 2K indexes in 16K-blocks, and fairly large cg's). On
  an ffs1 fs that I have handy, maxbpg is 2K and the number of indexes
  is 4K, so this condition is satisfied once.

The (bap[indx - 1] == 0) condition causes a move to a new cg after
every hole. This may help by leaving space to fill in the hole, but
it is wrong if the hole will never be filled in or is small. This
seems to be just a vestige of code that implemented the old rotdelay
pessimization. Comments saying that we use fs_maxcontig near here
are obviously vestiges of the pessimization.

% 		/*
% 		 * Find a cylinder with greater than average number of
% 		 * unused data blocks.
% 		 */
% 		if (indx == 0 || bap[indx - 1] == 0)
% 			startcg =
% 			    ino_to_cg(fs, ip->i_number) + lbn / fs->fs_maxbpg;

At the start of an indirect block, and after a hole, we don't know where
the previous block was, so we use the cg of the inode advanced by the
estimate (lbn / fs->fs_maxbpg) of how far we have advanced from the cg
of the inode. I think this estimate is too primitive to work right even
a small fraction of the time. Adjustment factors related to the number
of maxbpg's per block of indexes and the fullness of the disk seem to
be required. Keeping track of the cg of the previous block would be
better.

% 		else
% 			startcg = dtog(fs, bap[indx - 1]) + 1;

Now there is no problem finding the cg of the previous block. Note that
we always add 1...

% 		startcg %= fs->fs_ncg;
% 		avgbfree = fs->fs_cstotal.cs_nbfree / fs->fs_ncg;
% 		for (cg = startcg; cg < fs->fs_ncg; cg++)

... so the search gives maximal non-preference to the cg of the previous
block. I think things would work much better if we considered the
current cg, if any, first (current cg = one containing previous block),
and we actually know that cg. This would be easy to try -- just don't
add 1. Also try not adding the bad estimate (lbn / fs->fs_maxbpg), so
that the search starts at the inode's cg in some cases -- then previous
cg's will be reconsidered, but hopefully the average limit will prevent
them being used.

Note that in the calculation of avgbfree, division by ncg gives a
granularity of ncg, so there is an inertia of ncg blocks against moving
to the next cg. A too-large ncg is a feature here.

BTW, I recently found the bug that broke the allocation policy in
FreeBSD's implementation of ext2fs. I thought that the bug was missing
code/a too-simple implementation (one without a search like the above),
but it turned out to be just a bug. The search wasn't set up right, so
the current cg was always preferred. Always preferring the current cg
tends to give contiguous allocation of data blocks, and this works very
well for small file systems, but for large file systems the data blocks
end up too far away from the inodes (since there is a limited number of
inodes per cg, and the per-cg inode and data block allocations fill up
at different rates).

Bruce
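Since the message above refers to the newfs/tunefs -e knob, a hedged
example of adjusting it on an existing (unmounted) file system, with a
placeholder device name:

    # Print the current tuneables, maxbpg among them, then raise maxbpg
    # so a single file may allocate more of each cylinder group before
    # the allocator prefers the next one.
    tunefs -p /dev/da0s1d
    tunefs -e 4096 /dev/da0s1d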
From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 16:21:50 2007
From: Bruce Evans
Date: Sat, 22 Sep 2007 02:21:41 +1000 (EST)
To: Eric Anderson
Cc: freebsd-fs@FreeBSD.org, Fluffles, Ivan Voras
Subject: Re: Writing contiguously to UFS2?

On Fri, 21 Sep 2007, Eric Anderson wrote:

> I recommend trying msdos fs. On recent -CURRENT, it should perform fairly
> well (akin to UFS2 I think), and if I recall correctly, has a more
> contiguous block layout.

It can give perfect contiguity for data blocks, but has serious slowness
for non-sequential access to large files, and anyway "large" for msdosfs
is only 4GB.

Bruce
From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 16:30:59 2007
From: Eric Anderson
Date: Fri, 21 Sep 2007 11:30:53 -0500
To: Bruce Evans
Cc: freebsd-fs@freebsd.org, Fluffles, Ivan Voras
Subject: Re: Writing contiguously to UFS2?

Bruce Evans wrote:
> On Fri, 21 Sep 2007, Eric Anderson wrote:
>
>> I recommend trying msdos fs. On recent -CURRENT, it should perform
>> fairly well (akin to UFS2 I think), and if I recall correctly, has a
>> more contiguous block layout.
>
> It can give perfect contiguity for data blocks, but has serious slowness
> for non-sequential access to large files, and anyway "large" for msdosfs
> is only 4GB.

Oops - forgot about the 4GB limit. I was also assuming that the random
read in a big file problem wasn't an issue due to the configuration noted
by the original poster.. but maybe that's a bad assumption.

Eric

From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 16:45:36 2007
From: Ivan Voras
Date: Fri, 21 Sep 2007 18:45:11 +0200
To: freebsd-fs@freebsd.org
Subject: Re: The ZFS-Man.

Pawel Jakub Dawidek wrote:
> Hi.
>
> I gave a talk about ZFS during EuroBSDCon 2007, and because it won the
> best talk award and some find it funny, here it is:
>
> http://youtube.com/watch?v=o3TGM0T1CvE

Just perfect!
Thank you, I've disseminated the links wherever I can :)

From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 18:06:45 2007
From: Oliver Fromme
Date: Fri, 21 Sep 2007 19:36:50 +0200 (CEST)
To: freebsd-fs@FreeBSD.ORG, etc@fluffles.net
Subject: Re: Writing contiguously to UFS2?

Fluffles wrote:
> I've set up a concat of 8 disks for my new NAS, using ataidle to spin
> down the disks not needed. This allows me to save power and noise/heat
> by running only the drives that are actually in use.
>
> My problem is UFS. UFS2 seems to write to 4 disks, even though all the
> data written so far can easily fit on just one disk. What's going on
> here? I looked at newfs parameters, but in the past was unable to make
> newfs write contiguously. It seems UFS2 always writes to a new cylinder.
> Is there any way to force UFS to write contiguously? Or at least limit
> the problem?
>
> If I write 400GB to a 4TB volume consisting of 8x 500GB disks, I want
> all data to be on the first disk.

You should be able to achieve that by putting a gvirstor onto your
drives, having the physical size of those eight drives. Then newfs
that gvirstor device.

I haven't used gvirstor myself, but if I understand it correctly, it
should start filling its providers from the start, and only begin using
the next one when the previous ones are all completely used. So it
should do exactly what you want.

http://wiki.freebsd.org/gvirstor

Best regards
   Oliver

--
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.

"One of the main causes of the fall of the Roman Empire was that,
lacking zero, they had no way to indicate successful termination
of their C programs."
        -- Robert Firth
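A sketch of what Oliver describes, with the gvirstor syntax recalled
from gvirstor(8) rather than checked against it, and placeholder disk
names:

    # Label a virstor device across the disks, then newfs the resulting
    # virtual volume as a single file system.
    gvirstor label -v storage /dev/ad4 /dev/ad6 /dev/ad8
    newfs /dev/virstor/storage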
From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 18:10:23 2007
From: Bruce Evans
Date: Sat, 22 Sep 2007 04:10:19 +1000 (EST)
To: Gary Palmer
Cc: freebsd-fs@FreeBSD.org, Ivan Voras
Subject: Re: Writing contiguously to UFS2?

On Fri, 21 Sep 2007, Gary Palmer wrote:

> On Fri, Sep 21, 2007 at 03:23:20PM +0200, Ivan Voras wrote:
>> Gary Palmer wrote:
>>
>>> Presumably by using the -c parameter to newfs.
>>
>> Hm, I'll try it again later but I think I concluded that -c can be used
>> to lower the size of cgs, not to increase it.

Yes, it used to default to a small value, but that became very pessimal
when disks became larger than a whole 1GB or so, so obrien changed it to
default to the maximum possible value. I think it hasn't been changed
back down.

> A CG is basically an inode table with a block allocation bitmap to keep
> track of what disk blocks are in use. You might have to use the -i
> parameter to increase the expected average file size. That should
> allow you to increase the CG size. It's been a LONG time since I looked
> at the UFS code, but I suspect the # of inodes per CG is probably capped.

The limit seems to be only that struct cg (mainly the struct hack stuff
at the end) fits in a single block. The non-struct parts of this struct
consist mainly of the inode, block and cluster bitmaps. The block bitmap
is normally the largest by far, since it actually maps fragments. With
16K-blocks and 2K-frags, at most 128K frags = 256MB of disk can be
mapped. I get 180MB in practice, with an inode bitmap size of only 3K,
so there is not much to be gained by tuning -i but more to be gained by
tuning -b and -f (several doublings are reasonable).
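Bruce's bound, restated as shell arithmetic (the 256MB figure
optimistically assumes the whole block is available for the fragment
bitmap; the rest of struct cg makes it somewhat smaller in practice,
hence the ~180MB observed):

    # A 16384-byte block holds 16384*8 = 128K bitmap bits; each bit maps
    # one 2048-byte fragment, so at most this many MB of disk per cg:
    echo $((16384 * 8 * 2048 / 1024 / 1024))    # -> 256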
However, I think small cg's are not a problem for huge files, except for
bugs.

Bruce

From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 18:33:24 2007
From: Ivan Voras
Date: Fri, 21 Sep 2007 20:32:59 +0200
To: freebsd-fs@freebsd.org
Subject: Re: Writing contiguously to UFS2?

Oliver Fromme wrote:
> I haven't used gvirstor myself, but if I understand it
> correctly, it should start filling its providers from the
> start, and only begin using the next one when the previous
> ones are all completely used. So it should do exactly
> what you want.

Yes, with the side-effect of putting all cgs and their metadata at the
beginning of the first drive. An obvious consequence is that writing to
any drive other than the first will also touch the first drive.
From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 20:08:32 2007
From: Torrey McMahon
Date: Fri, 21 Sep 2007 15:41:51 -0400
To: Jonathan Edwards
Cc: freebsd-fs@FreeBSD.org, zfs-discuss@opensolaris.org, Pawel Jakub Dawidek, eric kustarz
Subject: Re: [zfs-discuss] The ZFS-Man.

Jonathan Edwards wrote:
> On Sep 21, 2007, at 14:57, eric kustarz wrote:
>
>>> Hi.
>>>
>>> I gave a talk about ZFS during EuroBSDCon 2007, and because it won the
>>> best talk award and some find it funny, here it is:
>>>
>>> http://youtube.com/watch?v=o3TGM0T1CvE
>>>
>>> a bit better version is here:
>>>
>>> http://people.freebsd.org/~pjd/misc/zfs/zfs-man.swf
>>
>> Looks like Jeff has been working out :)
>
> my first thought too:
> http://blogs.sun.com/bonwick/resource/images/bonwick.portrait.jpg
>
> funny - i always pictured this as UFS-man though:
> http://www.benbakerphoto.com/business/47573_8C-after.jpg
>
> but what's going on with the sheep there?

Got me, but they do look kind of nervous.

(Happy Friday folks...)
From owner-freebsd-fs@FreeBSD.ORG Sat Sep 22 12:37:28 2007
From: Bruce Evans
Date: Sat, 22 Sep 2007 22:37:22 +1000 (EST)
To: Bruce Evans
Cc: freebsd-fs@FreeBSD.org, Ivan Voras
Subject: Re: Writing contiguously to UFS2?

On Sat, 22 Sep 2007, Bruce Evans wrote:

> The current (and very old?) allocation policy for extending files is
> to consider allocating the block in a new cg when the current cg has
> more than fs_maxbpg blocks allocated in it (newfs and tunefs parameter
> -e maxbpg: default 1/4 of the number of blocks in a cg = bpg). Then
> preference is given to the next cg with more than the average number
> of free blocks. This seems to be buggy. From ffs_blkpref_ufs1():

Actually, it is almost as good as possible. Note that ffs_blkpref_*()
only gives a preference, so it shouldn't try too hard. Also, it is only
used if block reallocation is enabled (the default: sysctl
vfs.ffs.doreallocblks=1). Then it gives delayed allocation. Block
reallocation generally does a good job of reallocating blocks
contiguously. I don't know exactly where/how the allocation is done if
block reallocation is not enabled, but certainly maxbpg is not used
then.

> % 	if (indx % fs->fs_maxbpg == 0 || bap[indx - 1] == 0) {
> % 		if (lbn < NDADDR + NINDIR(fs)) {
> % 			cg = ino_to_cg(fs, ip->i_number);
> % 			return (cgbase(fs, cg) + fs->fs_frag);
> % 		}
>
> I think "indx" here is the index into an array of block pointers in
> the inode or an indirect block.

This is correct.

> So for extending large files it is
> always into an indirect block. It gets reset to 0 for each new indirect
> block. This makes its use in (indx % fs->fs_maxbpg == 0) dubious.
> The condition is satisfied whenever:
> - indx == 0, i.e., always at the start of a new indirect block. Not
>   too bad, but not what we want if fs_maxbpg is much larger than the
>   number of indexes in an indirect block.

Actually, this case is handled quite well later.

> - indx == a nonzero multiple of the number of indexes in an indirect
>   block. This condition is never satisfied if fs_maxbpg is larger than
>   the number of indexes in an indirect block. This is the usual case
>   for ffs2 (only 2K indexes in 16K-blocks, and fairly large cg's).
> On an ffs1 fs that I have handy, maxbpg is 2K and the number of indexes
> is 4K, so this condition is satisfied once.

This case is not handled well. The bug for it is mainly in newfs.
From newfs.c:

% /*
%  * MAXBLKPG determines the maximum number of data blocks which are
%  * placed in a single cylinder group. The default is one indirect
%  * block worth of data blocks.
%  */
% #define MAXBLKPG(bsize)	((bsize) / sizeof(ufs2_daddr_t))

The comment is correct, but the code is wrong for ffs1: it divides by
the size of an ffs2 block pointer, so on ffs1 MAXBLKPG defaults to only
half an indirect block worth of data blocks. I just use the default, so
my ffs1 fs has a maxbpg of 2K instead of 4K.

> The (bap[indx - 1] == 0) condition causes a move to a new cg after
> every hole.

Actually, this case is handled well later.

> This may help by leaving space to fill in the hole, but
> it is wrong if the hole will never be filled in or is small. This
> seems to be just a vestige of code that implemented the old rotdelay
> pessimization.

Actually, it is still needed for using bap[indx - 1] at the end of the
function.

> Comments saying that we use fs_maxcontig near here
> are obviously vestiges of the pessimization.
>
> % 		/*
> % 		 * Find a cylinder with greater than average number of
> % 		 * unused data blocks.
> % 		 */
> % 		if (indx == 0 || bap[indx - 1] == 0)
> % 			startcg =
> % 			    ino_to_cg(fs, ip->i_number) + lbn / fs->fs_maxbpg;
>
> At the start of an indirect block, and after a hole, we don't know where
> the previous block was, so we use the cg of the inode advanced by the
> estimate (lbn / fs->fs_maxbpg) of how far we have advanced from the cg
> of the inode. I think this estimate is too primitive to work right even
> a small fraction of the time. Adjustment factors related to the number
> of maxbpg's per block of indexes and the fullness of the disk seem to
> be required. Keeping track of the cg of the previous block would be
> better.

Actually, this estimate works very well. We _want_ to change to a new
cg after every maxbpg blocks. The estimate gives the closest cg that is
possible if all the blocks are allocated as contiguously as we want. If
the disk is nearly full, we will probably have to go further. Starting
the search at the closest cg that we want gives a bias towards close
cg's that are not too close.

> % 		else
> % 			startcg = dtog(fs, bap[indx - 1]) + 1;
>
> Now there is no problem finding the cg of the previous block. Note that
> we always add 1...
>
> % 		startcg %= fs->fs_ncg;
> % 		avgbfree = fs->fs_cstotal.cs_nbfree / fs->fs_ncg;
> % 		for (cg = startcg; cg < fs->fs_ncg; cg++)
>
> ... so the search gives maximal non-preference to the cg of the previous
> block. I think things would work much better if we considered the
> current cg, if any, first (current cg = one containing previous block),
> and we actually know that cg. This would be easy to try -- just don't
> add 1. Also try not adding the bad estimate (lbn / fs->fs_maxbpg), so
> that the search starts at the inode's cg in some cases -- then previous
> cg's will be reconsidered, but hopefully the average limit will prevent
> them being used.

Actually, adding 1 is correct in most cases. Here we think we have just
allocated maxbpg blocks in the current cg, so we _want_ to advance to
the next cg. The problem is that we don't really know that we have
allocated that many blocks. We have lots of previous block numbers in
bap[] and could inspect many of them, but we only inspect the previous
one. The corresponding code in 4.4BSD is better -- it inspects the one
some distance before the previous one. The corresponding distance here
is maxbpg. We could inspect the blocks at 1 previous and maxbpg previous
to quickly estimate whether we have allocated all of the previous maxbpg
blocks in the same cylinder group.
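To make the ffs1 halving mentioned above concrete, shell arithmetic with
the 16K block size used throughout this thread:

    # An ffs1 indirect block holds 4-byte pointers, but the MAXBLKPG()
    # default divides by sizeof(ufs2_daddr_t) == 8.
    echo $((16384 / 4))    # indexes per ffs1 indirect block: 4096
    echo $((16384 / 8))    # resulting default maxbpg: 2048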
> Note that in the calculation of avgbfree, division by ncg gives a
> granularity of ncg, so there is an inertia of ncg blocks against moving
> to the next cg. A too-large ncg is a feature here.

This feature shouldn't make much difference, but we don't want it if we
are certain that we have just allocated maxbpg blocks in a cg.

Analysis of block layouts for a 200MB file shows no large problems in
this area, but some small ones. This is with some problems already
fixed. 200MB is a bit small, but gives data small enough to understand
easily. The analysis is limited to ffs1, since I only have a
layout-printing program for that. I don't use ffs2 and haven't fixed
the "some" problems for it. Perhaps they are the ones that matter here.
(For what they are, see below.)

ffs1, no soft updates (all tests on an almost-new fs):

% fs_bsize = 16384
% fs_fsize = 2048
% 4: lbn 0-11 blkno 1520-1615
% lbn [<1>indir]12-4107 blkno 1616-1623
% lbn 12-4107 blkno 1624-34391

Everything is perfectly contiguous until here. Without my fixes, the
first indirect block in the middle tends to be allocated
discontiguously. Here lbn's have size fs_bsize = 16K, and blkno's have
size fs_fsize = 2K; "4:" is just the inode number; "[<n>indir]" is an
nth indirect block.

% lbn [<2>indir]4108-16781323 blkno 189592-189599

Bug. cg's have size about 94000 in blkno units. We have skipped the
entire second cg.

% lbn [<1>indir]4108-8203 blkno 189600-189607
% lbn 4108-6155 blkno 189608-205991

All contiguous.

% lbn 6156-8203 blkno 283640-300023

This is from the newfs bug (default maxbpg = half an indirect block's
worth of blkno's). Here we advance to the next cg half way through the
indirect block. The advance is only about 90000 blkno's, so it
correctly doesn't skip a cg.

% lbn [<1>indir]8204-12299 blkno 377688-377695
% lbn 8204-10251 blkno 377696-394079
% lbn 10252-12299 blkno 471736-488119
% lbn [<1>indir]12300-16395 blkno 565784-565791
% lbn 12300-12799 blkno 565792-569791

The pattern continues with no problems except the default maxbpg being
too small. This does almost what the OP wants -- with a huge disk, even
huge files fit in a few cg's (lots of cg's, but few compared with the
total number). With tunefs -e, I think the layout would be perfectly
contiguous except for the skip after the first cg. My fix is only for
the first indirect block, so it doesn't make much difference for large
files. With the default maxbpg, later indirect blocks are always
allocated in a new cg anyway. Hopefully the "primitive" estimate
prevents this, so that all indirect blocks have a chance of being
allocated contiguously, and other code cooperates by not moving them.

ffs1, soft updates:

% fs_bsize = 16384
% fs_fsize = 2048
% 5: lbn 0-11 blkno 34392-34487

For some reason, the file is started later in the first cg.

% lbn [<1>indir]12-4107 blkno 34488-34495
% lbn 12-4107 blkno 34496-67263

Contiguous. Without my fix, soft updates seems to move the first
indirect block further away, and thus is noticeably slower.

% lbn [<2>indir]4108-16781323 blkno 285592-285599

Soft updates has skipped not just the second cg but the third one too.
% lbn [<1>indir]4108-8203 blkno 285600-285607
% lbn 4108-6155 blkno 285608-301991
% lbn 6156-8203 blkno 377688-394071
% lbn [<1>indir]8204-12299 blkno 471736-471743
% lbn 8204-10251 blkno 471744-488127
% lbn 10252-12299 blkno 565784-582167
% lbn [<1>indir]12300-16395 blkno 659832-659839
% lbn 12300-12799 blkno 659840-663839

The pattern continues (no more skips).

ffs1, no soft updates, maxbpg = 655360:

% fs_bsize = 16384
% fs_fsize = 2048
% 4: lbn 0-11 blkno 1520-1615
% lbn [<1>indir]12-4107 blkno 1616-1623
% lbn 12-4107 blkno 1624-34391
% lbn [<2>indir]4108-16781323 blkno 95544-95551
% lbn [<1>indir]4108-8203 blkno 95552-95559
% lbn 4108-8203 blkno 95560-128327
% lbn [<1>indir]8204-12299 blkno 189592-189599
% lbn 8204-12299 blkno 189600-222367
% lbn [<1>indir]12300-16395 blkno 283640-283647
% lbn 12300-12799 blkno 283648-287647

The "primitive" estimate isn't helping -- a new cg is started for every
indirect block.

Bruce