From owner-freebsd-fs@FreeBSD.ORG  Sun May  1 00:09:22 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BE0FD1065700
	for <freebsd-fs@freebsd.org>; Sun,  1 May 2011 00:09:22 +0000 (UTC)
	(envelope-from mm@FreeBSD.org)
Received: from mail.vx.sk (mail.vx.sk [IPv6:2a01:4f8:100:1043::3])
	by mx1.freebsd.org (Postfix) with ESMTP id 1999F8FC0C
	for <freebsd-fs@freebsd.org>; Sun,  1 May 2011 00:09:22 +0000 (UTC)
Received: from core.vx.sk (localhost [127.0.0.1])
	by mail.vx.sk (Postfix) with ESMTP id 4C19314D14A;
	Sun,  1 May 2011 02:09:21 +0200 (CEST)
X-Virus-Scanned: amavisd-new at mail.vx.sk
Received: from mail.vx.sk ([127.0.0.1])
	by core.vx.sk (mail.vx.sk [127.0.0.1]) (amavisd-new, port 10024)
	with LMTP id S9qLTD8AEB-U; Sun,  1 May 2011 02:09:19 +0200 (CEST)
Received: from [10.9.8.1] (chello085216231078.chello.sk [85.216.231.78])
	by mail.vx.sk (Postfix) with ESMTPSA id 8D7C614D133;
	Sun,  1 May 2011 02:09:18 +0200 (CEST)
Message-ID: <4DBCA4AE.3090506@FreeBSD.org>
Date: Sun, 01 May 2011 02:09:18 +0200
From: Martin Matuska <mm@FreeBSD.org>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; sk;
	rv:1.8.1.23) Gecko/20090812 Lightning/0.9 Thunderbird/2.0.0.23
	Mnenhy/0.7.5.0
MIME-Version: 1.0
To: Pierre Lamy <pierre@userid.org>
References: <4DB8EF02.8060406@bk.ru>
	<ipf6i6$54v$1@dough.gmane.org>	<20110430001524.GA58845@icarus.home.lan>
	<4DBC2E46.9060404@userid.org>
In-Reply-To: <4DBC2E46.9060404@userid.org>
X-Enigmail-Version: 1.1.2
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Cc: freebsd-fs@freebsd.org, Volodymyr Kostyrko <c.kworr@gmail.com>
Subject: Re: ZFS v28 for 8.2-STABLE
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 01 May 2011 00:09:22 -0000

We plan to MFC v28.

But this change is quite intrusive for users: there is no way back once
you upgrade your pool (and if you do not upgrade the bootcode as well, the
system will not boot and you will need mfsBSD to rescue it). The MFC will
happen when we think v28 is stable enough to be in STABLE.

As for me, I am not using it in serious production yet (I am very happy
with v15 + latest patches), but my development servers with v28 seem
pretty stable.

I have updated the patch to reflect the latest changes (grab the latest one):
http://people.freebsd.org/~mm/patches/zfs/v28/

As to your setup, have you tried using a partition as a log device?

File-based devices are generally considered experimental in all ZFS
implementations (including Solaris).

On 30.04.2011 17:44, Pierre Lamy wrote:
> On 4/29/2011 8:15 PM, Jeremy Chadwick wrote:
>> On Fri, Apr 29, 2011 at 11:20:21PM +0300, Volodymyr Kostyrko wrote:
>>> 28.04.2011 07:37, Ruslan Yakovlev wrote:
>>>> Does a patch actually exist for 8.2-STABLE?
>>>> I tried
>>>> http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20110317.patch.xz
>>>>
>>>>
>>>> Building failed with:
>>>> can't cd to /usr/src/cddl/usr.bin/zstreamdump
>>>> Also sys/cddl/compat/opensolaris/sys/sysmacros.h failed to patch.
>>>>
>>>> Current FreeBSD 8.2-STABLE #35 Mon Apr 18 03:40:38 EEST 2011 i386
>>>> periodically freezes under high load, such as backup by rsync or find -sx ...
>>>> (from default cron tasks).
>>> Well ZFSv28 should be very close to STABLE for now?
>>>
>>> http://lists.freebsd.org/pipermail/freebsd-current/2011-February/023152.html
>>>
>> It's now a matter of opinion.  The whole idea of ZFSv28 being committed
>> to HEAD was to be tested.  I haven't seen any indication of a progress
>> report provided for anything on HEAD that pertains to ZFSv28, have you?
>>
>> Furthermore, the FreeBSD Quarterly Status Report just came out on 04/27
>> for the months of January-March (almost a 2 month delay, sigh):
>>
>> 1737     04/27 10:58  Daniel Gerzo        ( 41K) FreeBSD Status Report
>> January-March, 2011
>>
>> http://www.freebsd.org/news/status/report-2011-01-2011-03.html
>>
>> Which states that ZFSv28 is "now available in CURRENT", which we've
>> known for months:
>>
>> http://www.freebsd.org/news/status/report-2011-01-2011-03.html#ZFSv28-available-in-FreeBSD-9-CURRENT
>>
>>
>> But again, no progress report, so nobody except those who follow
>> HEAD/CURRENT know what the progress is.  And that progress has not been
>> relayed to any of the non-HEAD/CURRENT lists.
>>
>> I'm a total hard-ass about this stuff, and have been for years, because
>> it all boils down to communication (or lack thereof).  It seems very
>> hasty to say "Yeah! MFC this!" when we (folks who only follow STABLE)
>> have absolutely no idea if what's in CURRENT is actually broken in some
>> way or if there are outstanding problems -- and if there are, what those
>> are so users can be aware of them in advance.
>>
> 
> Hello,
> 
> Here's a summary of my recent end-user work with ZFS on -current. I
> recently was lucky enough to purchase 2 NAS systems, which consist of 2
> cheap new PCs loaded with 6 hard drives: one 1 TB drive as a simple GPT
> boot device and five 2 TB data drives. The mobo has 6 SATA connectors,
> but I needed to purchase an additional PCI-E SATA adapter since the DVD
> drive also uses a SATA port. The system has 4 GB of memory and a new
> inexpensive quad-core AMD CPU.
> 
> I've been running it (recent -current) for a couple of weeks with heavy
> single-user use; 2.5 TB of 7.1 TB is used.
> 
> The only problem I found was that deleting a file-backed log device
> from a degraded pool would immediately panic the system. I'm not running
> stock -current, so I didn't report it.
> 
> Resilvering seems absurdly slow, but since I won't be doing it much I
> also didn't care. My NAS is side-by-side redundant, so if resilvering
> takes more than 2 days I would just replicate off of my other NAS.
> 
> Throughput without a log device was in the range of 30 Mbit/sec (3% of
> my 1 Gbit interface). Adding a file-backed log device on the UFS
> partition that is used for boot resulted in a 10x jump, saturating the
> SATA bus that I was sending data from over the network. Throughput went
> up to 30% of the interface (the max bus speed for the disk) and did not
> vary much. This resolved the very spiky data transfers that I saw and
> that a lot of other people have posted about on the internet. I first
> used a USB device with 40mb/sec throughput as the log device, which made
> data transfer dramatically smoother, but there were still ~15 second
> periods where no data would transfer while it was flushed from USB to
> disk. After researching, I discovered that I could use a file-backed
> log device, and this fixed all the problems with spiky data transfers.
> 
> Before that I had tuned the sysctl's as the poor out of the box settings
> were giving me very slow speeds (in the range of 1% network throughput,
> before log device). I played around with the vfs.zfs tunables but found
> that I did not need to after I added the log device, and the out of the
> box settings for that sysctl tree were just fine.
> 
> I had first set this up before CAM was added to -current as default, and
> did not use labels. Due to troubleshooting some unrelated disk issues, I
> ended up switching to CAM without problems, and subsequently labeled the
> disks (recreated the zpool after the labeling). I am now using CAM and
> AHCI without any issues.
> 
> Here are some personal notes about the tunables I set, I am sure they
> are not all helpful. I didn't add them one by one, I simply mass changed
> them and saw a positive result. Also noted are the commands I used and
> current system status.
> 
> sysctl -w net.inet.tcp.sendspace=373760
> sysctl -w net.inet.tcp.recvspace=373760
> sysctl -w net.local.stream.sendspace=82320
> sysctl -w net.local.stream.recvspace=82320
> sysctl -w vfs.zfs.prefetch_disable=1
> sysctl -w net.local.stream.recvspace=373760
> sysctl -w net.local.stream.sendspace=373760
> sysctl -w net.local.inflight=1
> sysctl -w net.inet.tcp.ecn.enable=1
> sysctl -w net.inet.flowtable.enable=0
> sysctl -w net.raw.recvspace=373760
> sysctl -w net.raw.sendspace=373760
> sysctl -w net.inet.tcp.local_slowstart_flightsize=10
> sysctl -w net.inet.tcp.delayed_ack=0
> sysctl -w kern.maxvnodes=600000
> sysctl -w net.local.dgram.recvspace=8192
> sysctl -w net.local.dgram.maxdgram=8192
> sysctl -w net.inet.tcp.slowstart_flightsize=10
> sysctl -w net.inet.tcp.path_mtu_discovery=0
> 
> <root.wheel@zfs-slave> [/var/preserve/root] # glabel label g_ada0 /dev/ada0
> <root.wheel@zfs-slave> [/var/preserve/root] # glabel label g_ada1 /dev/ada1
> <root.wheel@zfs-slave> [/var/preserve/root] # glabel label g_ada3 /dev/ada3
> <root.wheel@zfs-slave> [/var/preserve/root] # glabel label g_ada4 /dev/ada4
> <root.wheel@zfs-slave> [/var/preserve/root] # glabel label g_ada5 /dev/ada5
> 
> Labels so that later I will be able to more easily identify disks. My
> mobo has a single ata bus slave port for SATA. That disk would
> "disappear" from the box. Moving the drive to a master sata port
> resolved the issue (? very odd).
> 
> gnop create -S 4096 /dev/label/g_ada0
> mkdir /var/preserve/zfs
> dd if=/dev/zero of=/var/preserve/zfs/log_device bs=1m count=5000
>  zpool create -f tank raidz /dev/label/g_ada0.nop /dev/label/g_ada1
> /dev/label/g_ada3 /dev/label/g_ada4 /dev/label/g_ada5 log
> /var/preserve/zfs/log_device
> 
> The 4 above lines are to set the alignment to 4kb, to create a file
> backed log device, and create the pool.
> 
> zfs set atime=off tank
> 
> I decided not to use dedup, because my files don't have a lot of dup.
> They're mostly large media files, ISOs etc.
> 
> <root.wheel@zfs-slave> [/var/preserve/root] # zpool status
>   pool: tank
>  state: ONLINE
>  scan: none requested
> config:
> 
>         NAME                            STATE     READ WRITE CKSUM
>         tank                            ONLINE       0     0     0
>           raidz1-0                      ONLINE       0     0     0
>             label/g_ada0                ONLINE       0     0     0
>             label/g_ada1                ONLINE       0     0     0
>             label/g_ada3                ONLINE       0     0     0
>             label/g_ada4                ONLINE       0     0     0
>             label/g_ada5                ONLINE       0     0     0
>         logs
>           /var/preserve/zfs/log_device  ONLINE       0     0     0
> 
> errors: No known data errors
> <root.wheel@zfs-slave> [/var/preserve/root] #
> 
> <root.wheel@zfs-slave> [/var/preserve/root] # df
> Filesystem          Size    Used   Avail Capacity  Mounted on
> /dev/gpt/pyros-a    9.7G    3.3G    5.6G    37%    /
> /dev/gpt/pyros-c    884G    6.1G    808G     1%    /var
> tank                7.1T    2.5T    4.6T    35%    /tank
> <root.wheel@zfs-slave> [/var/preserve/root] #
> 
> 
> ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
> ada0: <ST2000DL003-9VT166 CC32> ATA-8 SATA 3.x device
> ada0: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
> ada0: Command Queueing enabled
> ada0: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
> ada1 at ahcich2 bus 0 scbus3 target 0 lun 0
> ada1: <ST2000DL003-9VT166 CC32> ATA-8 SATA 3.x device
> ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> ada1: Command Queueing enabled
> ada1: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
> ada2 at ahcich3 bus 0 scbus4 target 0 lun 0
> ada2: <ST31000520AS CC32> ATA-8 SATA 2.x device
> ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> ada2: Command Queueing enabled
> ada2: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
> ada3 at ahcich4 bus 0 scbus5 target 0 lun 0
> ada3: <ST2000DL003-9VT166 CC32> ATA-8 SATA 3.x device
> ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> ada3: Command Queueing enabled
> ada3: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
> ada4 at ahcich5 bus 0 scbus6 target 0 lun 0
> ada4: <ST2000DL003-9VT166 CC32> ATA-8 SATA 3.x device
> ada4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> ada4: Command Queueing enabled
> ada4: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
> ada5 at ata1 bus 0 scbus8 target 0 lun 0
> ada5: <ST2000DL003-9VT166 CC32> ATA-8 SATA 3.x device
> ada5: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes)
> ada5: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
> 
> CPU: AMD Phenom(tm) II X4 920 Processor (2800.19-MHz K8-class CPU)
> ...
> real memory  = 4294967296 (4096 MB)
> avail memory = 3840598016 (3662 MB)
> 
> ZFS filesystem version 5
> ZFS storage pool version 28
> 
> 
> Best practices:
> 
> Tune the sysctls related to buffer sizes / queue depth.
> Label your disks before you build the zpool.
> Use gnop to 4kb align the disks. Only one disk in the pool needs this
> before you create it.
> Use CAM.
> *** USE A LOG DEVICE! ***
> 
> -Pierre

From owner-freebsd-fs@FreeBSD.ORG  Sun May  1 01:13:10 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BA0F9106564A
	for <fs@freebsd.org>; Sun,  1 May 2011 01:13:10 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca
	[131.104.91.44])
	by mx1.freebsd.org (Postfix) with ESMTP id 6E8C18FC0A
	for <fs@freebsd.org>; Sun,  1 May 2011 01:13:10 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApwEAGmwvE2DaFvO/2dsb2JhbACEUaI+iHGrLo9qhH+BAQSOeY4+
X-IronPort-AV: E=Sophos;i="4.64,295,1301889600"; d="scan'208";a="120040066"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 30 Apr 2011 21:01:08 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 81376B3F6F;
	Sat, 30 Apr 2011 21:01:08 -0400 (EDT)
Date: Sat, 30 Apr 2011 21:01:08 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Kostik Belousov <kostikbel@gmail.com>
Message-ID: <149943048.820546.1304211668413.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <20110430223412.GS48734@deviant.kiev.zoral.com.ua>
MIME-Version: 1.0
Content-Type: multipart/mixed; 
	boundary="----=_Part_820545_1204398810.1304211668411"
X-Originating-IP: [172.17.91.203]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692)
Cc: rmacklem@freebsd.org, fs@freebsd.org
Subject: Re: newnfs client and statfs
X-List-Received-Date: Sun, 01 May 2011 01:13:10 -0000

------=_Part_820545_1204398810.1304211668411
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

> I just netbooted fresh GENERIC (with irrelevant local patch) over the
> pxe, and got the following:
> 
> # df -h
> Filesystem Size Used Avail Capacity Mounted on
> 192.168.102.110:/usr/home/kostik/build/bsd/DEV/netboot/x -267G 130G
> -539G -32% /
> 
> On the server side, it is up-to-date stable/8 with oldnfs server,
> export is
> /dev/ada1p2 1.8T 129G 1.5T 8% /usr/home
> 
> Do we have some long-typed var lurking in the new nfs client code,
> instead of off_t? I am almost sure this is an nfs problem, since I
> booted i386 in the same setup a month ago, and did not have complaints
> from sendmail about low space on the spool (which is why I noticed
> this issue now).
> 
> amd64 kernel (with nfscl loaded as module) correctly reports
> 192.168.102.110:/usr/home/kostik/build/bsd/DEV/netboot/x 1.8T 129G
> 1.5T 8% /
Oops, I never noticed that the "struct statfs" fields had been bumped
to 64bits. I've attached a patch for the client. Could you please test
it? (I'll look in case the server has a similar problem.)

Thanks for reporting it, rick
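
The truncation described above can be sketched in a few lines of C. This is
only an illustration under stated assumptions, not the attached patch: the
function names are hypothetical, the block size of 512 is assumed to mirror
NFS_FABLKSIZE, and the 32-bit wraparound is spelled out by hand to mimic what
a (long) cast does on i386.

```c
#include <assert.h>
#include <stdint.h>

/* Block count as a 64-bit statfs field would hold it. */
static int64_t
blocks_64bit(uint64_t total_bytes, uint32_t block_size)
{
	assert(block_size > 0);
	return (int64_t)(total_bytes / block_size);
}

/*
 * Block count after being squeezed through a 32-bit signed long.
 * The wrap is computed explicitly so the result does not depend on
 * implementation-defined integer conversions.
 */
static int64_t
blocks_via_long32(uint64_t total_bytes, uint32_t block_size)
{
	uint64_t blocks;
	uint32_t low;

	assert(block_size > 0);
	blocks = total_bytes / block_size;
	low = (uint32_t)(blocks & 0xffffffffu);	/* keep the low 32 bits */
	if (low >= UINT32_C(0x80000000))	/* sign bit set: negative */
		return (int64_t)low - INT64_C(4294967296);
	return (int64_t)low;
}
```

For a filesystem of 2000000000000 bytes and 512-byte blocks, blocks_64bit()
gives 3906250000 while blocks_via_long32() gives -388717296, which is the
kind of negative size df printed on the i386 client.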

------=_Part_820545_1204398810.1304211668411
Content-Type: text/x-patch; name=statfs.patch
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename=statfs.patch

LS0tIGZzL25mc2NsaWVudC9uZnNfY2xwb3J0LmMuc2F2CTIwMTEtMDQtMzAgMjA6MTY6MzkuMDAw
MDAwMDAwIC0wNDAwCisrKyBmcy9uZnNjbGllbnQvbmZzX2NscG9ydC5jCTIwMTEtMDQtMzAgMjA6
NDU6MTYuMDAwMDAwMDAwIC0wNDAwCkBAIC0zOSw2ICszOSw3IEBAIF9fRkJTRElEKCIkRnJlZUJT
RDogaGVhZC9zeXMvZnMvbmZzY2xpZW4KICAqIGJlIHRoZSBlYXNpZXN0IHdheSB0byBoYW5kbGUg
dGhlIHBvcnQuCiAgKi8KICNpbmNsdWRlIDxzeXMvaGFzaC5oPgorI2luY2x1ZGUgPHN5cy9saW1p
dHMuaD4KICNpbmNsdWRlIDxmcy9uZnMvbmZzcG9ydC5oPgogI2luY2x1ZGUgPG5ldGluZXQvaWZf
ZXRoZXIuaD4KICNpbmNsdWRlIDxuZXQvaWZfdHlwZXMuaD4KQEAgLTgzOCwyMCArODM5LDE0IEBA
IHZvaWQKIG5mc2NsX2xvYWRzYmluZm8oc3RydWN0IG5mc21vdW50ICpubXAsIHN0cnVjdCBuZnNz
dGF0ZnMgKnNmcCwgdm9pZCAqc3RhdGZzKQogewogCXN0cnVjdCBzdGF0ZnMgKnNicCA9IChzdHJ1
Y3Qgc3RhdGZzICopc3RhdGZzOwotCW5mc3F1YWRfdCB0cXVhZDsKIAogCWlmIChubXAtPm5tX2Zs
YWcgJiAoTkZTTU5UX05GU1YzIHwgTkZTTU5UX05GU1Y0KSkgewogCQlzYnAtPmZfYnNpemUgPSBO
RlNfRkFCTEtTSVpFOwotCQl0cXVhZC5xdmFsID0gc2ZwLT5zZl90Ynl0ZXM7Ci0JCXNicC0+Zl9i
bG9ja3MgPSAobG9uZykodHF1YWQucXZhbCAvICgodV9xdWFkX3QpTkZTX0ZBQkxLU0laRSkpOwot
CQl0cXVhZC5xdmFsID0gc2ZwLT5zZl9mYnl0ZXM7Ci0JCXNicC0+Zl9iZnJlZSA9IChsb25nKSh0
cXVhZC5xdmFsIC8gKCh1X3F1YWRfdClORlNfRkFCTEtTSVpFKSk7Ci0JCXRxdWFkLnF2YWwgPSBz
ZnAtPnNmX2FieXRlczsKLQkJc2JwLT5mX2JhdmFpbCA9IChsb25nKSh0cXVhZC5xdmFsIC8gKCh1
X3F1YWRfdClORlNfRkFCTEtTSVpFKSk7Ci0JCXRxdWFkLnF2YWwgPSBzZnAtPnNmX3RmaWxlczsK
LQkJc2JwLT5mX2ZpbGVzID0gKHRxdWFkLmx2YWxbMF0gJiAweDdmZmZmZmZmKTsKLQkJdHF1YWQu
cXZhbCA9IHNmcC0+c2ZfZmZpbGVzOwotCQlzYnAtPmZfZmZyZWUgPSAodHF1YWQubHZhbFswXSAm
IDB4N2ZmZmZmZmYpOworCQlzYnAtPmZfYmxvY2tzID0gc2ZwLT5zZl90Ynl0ZXMgLyBORlNfRkFC
TEtTSVpFOworCQlzYnAtPmZfYmZyZWUgPSBzZnAtPnNmX2ZieXRlcyAvIE5GU19GQUJMS1NJWkU7
CisJCXNicC0+Zl9iYXZhaWwgPSBzZnAtPnNmX2FieXRlcyAvIE5GU19GQUJMS1NJWkU7CisJCXNi
cC0+Zl9maWxlcyA9IHNmcC0+c2ZfdGZpbGVzOworCQlzYnAtPmZfZmZyZWUgPSAoc2ZwLT5zZl9m
ZmlsZXMgJiBPRkZfTUFYKTsKIAl9IGVsc2UgaWYgKChubXAtPm5tX2ZsYWcgJiBORlNNTlRfTkZT
VjQpID09IDApIHsKIAkJc2JwLT5mX2JzaXplID0gKGludDMyX3Qpc2ZwLT5zZl9ic2l6ZTsKIAkJ
c2JwLT5mX2Jsb2NrcyA9IChpbnQzMl90KXNmcC0+c2ZfYmxvY2tzOwo=
------=_Part_820545_1204398810.1304211668411--

From owner-freebsd-fs@FreeBSD.ORG  Sun May  1 02:05:08 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 551EE106566B;
	Sun,  1 May 2011 02:05:08 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
	by mx1.freebsd.org (Postfix) with ESMTP id A151B8FC12;
	Sun,  1 May 2011 02:05:07 +0000 (UTC)
Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua
	[10.1.1.148])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id p4124KCG074095
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sun, 1 May 2011 05:04:20 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id
	p4124HdV023240; Sun, 1 May 2011 05:04:17 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id p4124HvV023239; 
	Sun, 1 May 2011 05:04:17 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Sun, 1 May 2011 05:04:17 +0300
From: Kostik Belousov <kostikbel@gmail.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Message-ID: <20110501020417.GW48734@deviant.kiev.zoral.com.ua>
References: <20110430223412.GS48734@deviant.kiev.zoral.com.ua>
	<149943048.820546.1304211668413.JavaMail.root@erie.cs.uoguelph.ca>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="drH2ShMMYOwzF+Tg"
Content-Disposition: inline
In-Reply-To: <149943048.820546.1304211668413.JavaMail.root@erie.cs.uoguelph.ca>
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-3.4 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00,
	DNS_FROM_OPENWHOIS autolearn=no version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	skuns.kiev.zoral.com.ua
Cc: rmacklem@freebsd.org, fs@freebsd.org
Subject: Re: newnfs client and statfs
X-List-Received-Date: Sun, 01 May 2011 02:05:08 -0000


--drH2ShMMYOwzF+Tg
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sat, Apr 30, 2011 at 09:01:08PM -0400, Rick Macklem wrote:
> > I just netbooted fresh GENERIC (with irrelevant local patch) over the
> > pxe, and got the following:
> >
> > # df -h
> > Filesystem Size Used Avail Capacity Mounted on
> > 192.168.102.110:/usr/home/kostik/build/bsd/DEV/netboot/x -267G 130G
> > -539G -32% /
> >
> > On the server side, it is up-to-date stable/8 with oldnfs server,
> > export is
> > /dev/ada1p2 1.8T 129G 1.5T 8% /usr/home
> >
> > Do we have some long-typed var lurking in the new nfs client code,
> > instead of off_t? I am almost sure this is an nfs problem, since I
> > booted i386 in the same setup a month ago, and did not have complaints
> > from sendmail about low space on the spool (which is why I noticed
> > this issue now).
> >
> > amd64 kernel (with nfscl loaded as module) correctly reports
> > 192.168.102.110:/usr/home/kostik/build/bsd/DEV/netboot/x 1.8T 129G
> > 1.5T 8% /
> Oops, I never noticed that the "struct statfs" fields had been bumped
> to 64bits. I've attached a patch for the client. Could you please test
> it? (I'll look in case the server has a similar problem.)
>
> Thanks for reporting it, rick

Thank you for the quick fix. The patch is fine.

192.168.102.110:/usr/home/kostik/build/bsd/DEV/netboot/x    1.7T    130G    1.5T     8%    /

--drH2ShMMYOwzF+Tg
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iEYEARECAAYFAk28v6EACgkQC3+MBN1Mb4irHACg86T0P5gwk/djC8ZcZK4NsF0p
lFsAoKLODsjse8CT/Lh2pCJwtMqGekbL
=6PO1
-----END PGP SIGNATURE-----

--drH2ShMMYOwzF+Tg--

From owner-freebsd-fs@FreeBSD.ORG  Sun May  1 03:00:43 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9A7D9106566C
	for <freebsd-fs@freebsd.org>; Sun,  1 May 2011 03:00:43 +0000 (UTC)
	(envelope-from des@des.no)
Received: from smtp.des.no (smtp.des.no [194.63.250.102])
	by mx1.freebsd.org (Postfix) with ESMTP id 5757B8FC08
	for <freebsd-fs@freebsd.org>; Sun,  1 May 2011 03:00:43 +0000 (UTC)
Received: from ds4.des.no (des.no [84.49.246.2])
	by smtp.des.no (Postfix) with ESMTP id 91F7C1FFC35;
	Sun,  1 May 2011 03:00:41 +0000 (UTC)
Received: by ds4.des.no (Postfix, from userid 1001)
	id A08DC84495; Sun,  1 May 2011 05:00:38 +0200 (CEST)
From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= <des@des.no>
To: Rick Macklem <rmacklem@uoguelph.ca>
References: <1591833752.819973.1304207152724.JavaMail.root@erie.cs.uoguelph.ca>
Date: Sun, 01 May 2011 05:00:38 +0200
In-Reply-To: <1591833752.819973.1304207152724.JavaMail.root@erie.cs.uoguelph.ca>
	(Rick Macklem's message of "Sat, 30 Apr 2011 19:45:52 -0400 (EDT)")
Message-ID: <8662pvf7l5.fsf@ds4.des.no>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-fs@freebsd.org
Subject: Re: RFC: make the experimental NFS subsystem the default one
X-List-Received-Date: Sun, 01 May 2011 03:00:43 -0000

Rick Macklem <rmacklem@uoguelph.ca> writes:
> "Dag-Erling Smørgrav" <des@des.no> writes:
>> case "`mount -d -a -t nfs 2> /dev/null`" in
>> *mount_nfs*)
>> # Handle absent nfs client support
>> load_kld -m nfs nfsclient || return 1
>> ;;
>> esac
> Yep, I spotted that, but haven't had a chance to reproduce it and test
> a fix yet. My first attempt at fixing it will be to change the line to:

The simplest fix is to add a mount_oldnfs case to the switch so the
script knows the old NFS stack is already loaded.

DES
-- 
Dag-Erling Smørgrav - des@des.no

From owner-freebsd-fs@FreeBSD.ORG  Sun May  1 11:36:48 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B648D1065670;
	Sun,  1 May 2011 11:36:48 +0000 (UTC)
	(envelope-from alexander@leidinger.net)
Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de
	[217.11.53.44])
	by mx1.freebsd.org (Postfix) with ESMTP id 714298FC18;
	Sun,  1 May 2011 11:36:48 +0000 (UTC)
Received: from outgoing.leidinger.net (p5B1559A4.dip.t-dialin.net
	[91.21.89.164])
	by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id BED3D84400D;
	Sun,  1 May 2011 13:36:33 +0200 (CEST)
Received: from unknown (IO.Leidinger.net [192.168.2.110])
	by outgoing.leidinger.net (Postfix) with ESMTP id 5A5F5119D;
	Sun,  1 May 2011 13:36:30 +0200 (CEST)
Date: Sun, 1 May 2011 13:36:27 +0200
From: Alexander Leidinger <Alexander@Leidinger.net>
To: "Emil Smolenski" <am@raisa.eu.org>
Message-ID: <20110501133627.00006616@unknown>
In-Reply-To: <op.vn2iid1qk84lxj@arrow>
References: <op.vn2iid1qk84lxj@arrow>
X-Mailer: Claws Mail 3.7.8cvs47 (GTK+ 2.16.6; i586-pc-mingw32msvc)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-EBL-MailScanner-Information: Please contact the ISP for more information
X-EBL-MailScanner-ID: BED3D84400D.A25D0
X-EBL-MailScanner: Found to be clean
X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN,
	SpamAssassin (not cached, score=-0.923, required 6,
	autolearn=disabled, ALL_TRUSTED -1.00, TW_ZD 0.08)
X-EBL-MailScanner-From: alexander@leidinger.net
X-EBL-MailScanner-Watermark: 1304854595.26307@WuxNArPge6heNDqm83txWA
X-EBL-Spam-Status: No
Cc: freebsd-fs@freebsd.org, dfr@FreeBSD.org, jhb@FreeBSD.org
Subject: Re: [ZFS] Booting from zpool created on 4k-sector drive
X-List-Received-Date: Sun, 01 May 2011 11:36:48 -0000

On Tue, 21 Dec 2010 15:29:01 +0100 "Emil Smolenski" <am@raisa.eu.org>
wrote:

> Hello,
> 
> There is a hack to force zpool creation with minimum sector size
> equal to 4k:
> 
> # gnop create -S 4096 ${DEV0}
> # zpool create tank ${DEV0}.nop
> # zpool export tank
> # gnop destroy ${DEV0}.nop
> # zpool import tank
> 
> A zpool created this way is much faster on problematic 4k-sector
> drives which lie about their sector size (like WD EARS). This hack
> works perfectly fine when the system is running. The gnop layer is
> needed only for the "zpool create" command -- ZFS stores information
> about the sector size in its metadata. After zpool creation one can
> export the pool, remove the gnop layer and reimport the pool. The
> difference can be seen in the output from the zdb command:
> 
> - on 512 sector device (2**9 = 512):
> % zdb tank |grep ashift
> ashift=9
> 
> - on 4096 sector device (2**12 = 4096):
> % zdb tank |grep ashift
> ashift=12
> 
> This change is permanent. The only way to change the value of ashift
> is to destroy and recreate the pool and restore it from backup.
> 
> But there is one problem: I cannot boot from such a pool. Error message:
> 
> ZFS: i/o error - all block copies unavailable
> ZFS: can't read MOS
> ZFS: unexpected object set type 0

FYI: I can boot successfully from a ZFS v28 pool which was created like
this in a GPT partition (tested with 9-current).
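
For reference, the ashift values quoted above are the base-2 logarithm of
the sector size (2**9 = 512, 2**12 = 4096). A minimal sketch of that
relationship follows; the function name is hypothetical and this is not how
ZFS itself derives the value, only the arithmetic behind the zdb output.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the shift such that (1 << shift) == sector_size.
 * The sector size is assumed to be a non-zero power of two.
 */
static unsigned
ashift_for_sector_size(uint32_t sector_size)
{
	unsigned shift = 0;

	assert(sector_size != 0 && (sector_size & (sector_size - 1)) == 0);
	while ((UINT32_C(1) << shift) < sector_size)
		shift++;
	return shift;
}
```

With this, ashift_for_sector_size(512) returns 9 and
ashift_for_sector_size(4096) returns 12, matching the two zdb outputs.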

Bye,
Alexander.

-- 
http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137

From owner-freebsd-fs@FreeBSD.ORG  Sun May  1 13:15:17 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id ADB87106564A
	for <fs@freebsd.org>; Sun,  1 May 2011 13:15:17 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from fallbackmx07.syd.optusnet.com.au
	(fallbackmx07.syd.optusnet.com.au [211.29.132.9])
	by mx1.freebsd.org (Postfix) with ESMTP id 358A28FC13
	for <fs@freebsd.org>; Sun,  1 May 2011 13:15:16 +0000 (UTC)
Received: from mail08.syd.optusnet.com.au (mail08.syd.optusnet.com.au
	[211.29.132.189])
	by fallbackmx07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	p41C4vUM003116 for <fs@freebsd.org>; Sun, 1 May 2011 22:04:57 +1000
Received: from c122-106-155-58.carlnfd1.nsw.optusnet.com.au
	(c122-106-155-58.carlnfd1.nsw.optusnet.com.au [122.106.155.58])
	by mail08.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	p41C4q2e020107
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sun, 1 May 2011 22:04:53 +1000
Date: Sun, 1 May 2011 22:04:52 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Rick Macklem <rmacklem@uoguelph.ca>
In-Reply-To: <149943048.820546.1304211668413.JavaMail.root@erie.cs.uoguelph.ca>
Message-ID: <20110501184904.S975@besplex.bde.org>
References: <149943048.820546.1304211668413.JavaMail.root@erie.cs.uoguelph.ca>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: rmacklem@freebsd.org, fs@freebsd.org
Subject: Re: newnfs client and statfs
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 01 May 2011 13:15:17 -0000

On Sat, 30 Apr 2011, Rick Macklem wrote:

> Oops, I never noticed that the "struct statfs" fields had been bumped
> to 64bits. I've attached a patch for the client. Could you please test
> it? (I'll look in case the server has a similar problem.)

Sigh, bugs in this area are very old and still present.

% --- fs/nfsclient/nfs_clport.c.sav	2011-04-30 20:16:39.000000000 -0400
% +++ fs/nfsclient/nfs_clport.c	2011-04-30 20:45:16.000000000 -0400
% @@ -39,6 +39,7 @@ __FBSDID("$FreeBSD: head/sys/fs/nfsclien
%   * be the easiest way to handle the port.
%   */
%  #include <sys/hash.h>
% +#include <sys/limits.h>

Only needed to implement a bug.

%  #include <fs/nfs/nfsport.h>
%  #include <netinet/if_ether.h>
%  #include <net/if_types.h>
% @@ -838,20 +839,14 @@ void
%  nfscl_loadsbinfo(struct nfsmount *nmp, struct nfsstatfs *sfp, void *statfs)
%  {
%  	struct statfs *sbp = (struct statfs *)statfs;
% -	nfsquad_t tquad;
% 
%  	if (nmp->nm_flag & (NFSMNT_NFSV3 | NFSMNT_NFSV4)) {
%  		sbp->f_bsize = NFS_FABLKSIZE;
% -		tquad.qval = sfp->sf_tbytes;
% -		sbp->f_blocks = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
% -		tquad.qval = sfp->sf_fbytes;
% -		sbp->f_bfree = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
% -		tquad.qval = sfp->sf_abytes;
% -		sbp->f_bavail = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
% -		tquad.qval = sfp->sf_tfiles;
% -		sbp->f_files = (tquad.lval[0] & 0x7fffffff);
% -		tquad.qval = sfp->sf_ffiles;
% -		sbp->f_ffree = (tquad.lval[0] & 0x7fffffff);

This mail is too short to describe all the bugs in the above.  The old nfs
client still has the following ones:

- bogus variable tquad
- bogus and broken masking for f_files by 0x7fffffff.  v3 can pass us a
   count >= 2**31.  The bogus masking breaks such counts.  When f_files
   was only long, we had to do something for values larger than LONG_MAX.
   We should have clamped to LONG_MAX.  (See cvtstatfs() which does this
   now for the corresponding problem for ostatfs().)  Instead, we bogusly
   cast.  0x7fffffff is just a misspelling of LONG_MAX which happens to
   be correct for 32-bit 2's complement longs.  That combined with the
   server also being limited to 32 bits is the one case where the cast
   works as intended, and even then it is quite broken -- it just loses
   the top bit of values between 2**31 and 2**32-1.  Perhaps the protocol
   prohibits such values, but at least FreeBSD servers take null care not
   to send them -- see below.
- bogus and even more broken masking for f_ffree.  This was broken even
   when f_ffree was long and long was 32 bits.  Then the mask just
   destroys the sign bit, which a non-broken server will have passed us as
   the top bit in a 32-bit unsigned value.
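
For illustration only, a minimal sketch of the difference between masking
and clamping.  The function names are hypothetical, and int32_t stands in
for a 32-bit long so that the behaviour does not depend on the machine:

```c
#include <stdint.h>

/* Masking, as the old code does: bit 31 and everything above is dropped. */
int32_t
files_masked32(uint64_t wire_count)
{
	return ((int32_t)(wire_count & 0x7fffffff));
}

/* Clamping: a count too large for the type becomes the type's maximum. */
int32_t
files_clamped32(uint64_t wire_count)
{
	if (wire_count > (uint64_t)INT32_MAX)
		return (INT32_MAX);
	return ((int32_t)wire_count);
}
```

With a wire count of 0x80000005, masking reports 5 files while clamping
reports INT32_MAX, which is at least the right order of magnitude.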

% +		sbp->f_blocks = sfp->sf_tbytes / NFS_FABLKSIZE;
% +		sbp->f_bfree = sfp->sf_fbytes / NFS_FABLKSIZE;
% +		sbp->f_bavail = sfp->sf_abytes / NFS_FABLKSIZE;

The conversion for f_bavail still has sign extension bugs.  f_bavail
can be negative on the server.  A non-broken (FreeBSD) server passes
us this negative value as a uint64_t value with the top bit set.  It
will be >= 2**63 as an unsigned value and dividing by NFS_FABLKSIZE =
2**9 makes it between 2**54 and 2**55-1; all trace of its signedness is
lost, except we know that garbage values in the range 2**54 to 2**55-1
mean this overflow error.
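
The arithmetic can be checked with a small sketch.  DEMO_FABLKSIZE and the
function names are hypothetical stand-ins, and the conversion of an
out-of-range uint64_t to int64_t is implementation-defined in C (it wraps
on the usual 2's complement compilers).  This is a demonstration of the
sign loss, not a proposed fix:

```c
#include <stdint.h>

#define	DEMO_FABLKSIZE	512	/* stands in for NFS_FABLKSIZE (2**9) */

/* Divide as unsigned, as in the patch: a negative count becomes huge. */
int64_t
bavail_unsigned_div(uint64_t wire_abytes)
{
	return ((int64_t)(wire_abytes / DEMO_FABLKSIZE));
}

/* Reinterpret as signed before dividing: the sign is kept. */
int64_t
bavail_signed_div(uint64_t wire_abytes)
{
	return ((int64_t)wire_abytes / DEMO_FABLKSIZE);
}
```

For a server value of -10 blocks (-5120 bytes, sent as 2**64 - 5120), the
unsigned division gives 2**55 - 10 and the signed one gives -10.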

The old nfs client still has this bug.

Old versions of the old nfs client had broken scaling which panicked
trying to use the values near 2**54 given by the above overflow bugs.
See statfs_scale_blocks() for non-broken scaling for the corresponding
problem for ostatfs().

Someone broke the old FreeBSD nfs server to work around the broken FreeBSD
nfs client.  It remains broken :-(.  This bug is missing in the new nfs
server -- it just passes the server's f_bavail :-), without paying quite
enough attention to the sign bit.

There is of course a portability problem.  We need to export negative
values from FreeBSD servers to FreeBSD clients without breaking other
combinations.  The combination of the new nfs server with really old
old nfs clients is broken :-).  NetBSD's nfsclient has explicit code
to try to handle this problem.  I couldn't see how it could work --
negative values must be passed in some way, and there is nothing better
than passing them as (large) positive values mod 2**64.  IIRC, NetBSD
changes the values but this cannot work since it loses info.  Hmm,
there are 3 fields to use (f_blocks, f_bfree and f_bavail).  These
provide some redundancy, but neither NetBSD's code nor anything that
I could think of worked to recover ffs's negative avail counts from
nonnegative values in these fields, and frobbing these fields would
be unportable anyway.  Both the nfs protocol and POSIX's statvfs() (?)
API and types seem to be incapable of handling ffs's negative f_bavail
counts (POSIX only has unsigned block counts...).

% +		sbp->f_files = sfp->sf_tfiles;

Now correct, almost.  As for the other fields, it tacitly assumes that
the type of the lvalue is at least as large as the type of the rvalue.
Both happen to be 64 bits here.  For the signed fields, this assumption
is strictly incorrect, since the lvalue is 64 bits signed while the
rvalue is 64 bits unsigned.  By "not paying quite enough attention to
the sign bit" in the above, I mean that it is tacitly assumed that
we can start with a 64-bit signed value, convert it to u_quad_t
(another type error -- should not assume that u_quad_t is uint64_t),
pass it through the nfs protocol, convert back to int64_t (or forget
to convert back, as above), and recover the original value.  This
deserves special care since it abuses the protocol.

% +		sbp->f_ffree = (sfp->sf_ffiles & OFF_MAX);

Any masking here is logically wrong, and in practice just destroys the
sign bit, as described above for the 0x7fffffff mask with old 32 bit
systems.  Masking with OFF_MAX has additional logic errors.  OFF_MAX
is the maximum value for an off_t, but none of the types here has
anything to do with off_t.

%  	} else if ((nmp->nm_flag & NFSMNT_NFSV4) == 0) {
%  		sbp->f_bsize = (int32_t)sfp->sf_bsize;
%  		sbp->f_blocks = (int32_t)sfp->sf_blocks;

I think this is just the v2 case.  The old nfs client uses essentially
the same bogus casts.  No casts should be used (clamping should be
used), but if we use casts it may be possible to use non-bogus ones.
I think these are just no casts for the unsigned fields but int32_t
for the signed ones.  The v2 protocol is limited to 32 bits, and we
can easily represent any 32-bit value since we have 64-bit fields for
the lvalues.  We just need to be careful with the sign bit (in
bit 31 of an unsigned value in the sfp fields), but can keep
bit 31 as an unsigned bit without problems now that the statfs fields
are 64 bits.  Casting for the unsigned fields now just breaks the value
unnecessarily if the protocol manages to pass bit 31 as a value
bit for such fields.
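
A rough sketch of the three cases for a 32-bit v2 wire value widened into
a 64-bit statfs field.  The function names are hypothetical, and the
(int32_t) conversion of a value with bit 31 set is implementation-defined
in C (2's complement wrap is assumed here):

```c
#include <stdint.h>

/* Unsigned field, no cast: bit 31 stays a value bit. */
uint64_t
widen_unsigned(uint32_t wire)
{
	return (wire);
}

/* Signed field, int32_t cast: bit 31 is taken as the sign bit. */
int64_t
widen_signed(uint32_t wire)
{
	return ((int32_t)wire);
}

/* Unsigned field with the int32_t cast: bit 31 is sign-extended. */
uint64_t
widen_unsigned_with_cast(uint32_t wire)
{
	return ((uint64_t)(int32_t)wire);
}
```

For 0x80000000 the first function keeps the value, while the third turns
it into 0xffffffff80000000.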

Servers should pay even more attention to unrepresentable bits than to
sign bits, but pay considerably less.  Both the old and the new nfs server
blindly truncate f_bfree, etc., to 32 bits in the v2 case (except the old
nfs server corrupts negative f_bavail to 0).  (For v3, they tacitly assume
that no truncation occurs on conversion to 64 bits.)

The old nfs server also gratuitously breaks the file counts (f_files
etc.) for the v3 case.  It should use txdr_hyper(), but uses txdr_unsigned()
plus extra code to lose 32 bits.  This is fixed in NetBSD and in the
new nfs server.

Bruce

From owner-freebsd-fs@FreeBSD.ORG  Sun May  1 13:38:14 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0A5611065672;
	Sun,  1 May 2011 13:38:14 +0000 (UTC)
	(envelope-from pawel@dawidek.net)
Received: from mail.garage.freebsd.pl (60.wheelsystems.com [83.12.187.60])
	by mx1.freebsd.org (Postfix) with ESMTP id AB1328FC08;
	Sun,  1 May 2011 13:38:13 +0000 (UTC)
Received: by mail.garage.freebsd.pl (Postfix, from userid 65534)
	id 2DAA145CD9; Sun,  1 May 2011 15:38:12 +0200 (CEST)
Received: from localhost (89-73-195-149.dynamic.chello.pl [89.73.195.149])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mail.garage.freebsd.pl (Postfix) with ESMTP id A02CD4569A;
	Sun,  1 May 2011 15:38:06 +0200 (CEST)
Date: Sun, 1 May 2011 15:37:52 +0200
From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
To: Alexander Leidinger <Alexander@Leidinger.net>
Message-ID: <20110501133752.GC3245@garage.freebsd.pl>
References: <4DBBB20A.5050102@FreeBSD.org>
	<20110430072831.GA65598@icarus.home.lan>
	<20110501000656.00007ea1@unknown>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="DKU6Jbt7q3WqK7+M"
Content-Disposition: inline
In-Reply-To: <20110501000656.00007ea1@unknown>
X-OS: FreeBSD 9.0-CURRENT amd64
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on 
	mail.garage.freebsd.pl
X-Spam-Level: 
X-Spam-Status: No, score=-0.6 required=4.5 tests=BAYES_00,RCVD_IN_SORBS_DUL 
	autolearn=no version=3.0.4
Cc: freebsd-fs@freebsd.org, Alexander Motin <mav@FreeBSD.org>
Subject: Re: TRIM clustering
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 01 May 2011 13:38:14 -0000


--DKU6Jbt7q3WqK7+M
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sun, May 01, 2011 at 12:06:56AM +0200, Alexander Leidinger wrote:
> On Sat, 30 Apr 2011 00:28:31 -0700 Jeremy Chadwick
> <freebsd@jdc.parodius.com> wrote:
>=20
> > On Sat, Apr 30, 2011 at 09:54:02AM +0300, Alexander Motin wrote:
>=20
> > Other notes: TRIM needs to be supported on swap as well, and in my
> > opinion this is just as important as it being in UFS.  I'm not sure
> > how one would implement that.
>=20
> This brings up the question if a ZFS cache (where the contents do not
> survive a reboot) is completely TRIMmed before used (and normally
> trimmed during use)...

It is not trimmed at all.

--=20
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://yomoli.com

--DKU6Jbt7q3WqK7+M
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (FreeBSD)

iEYEARECAAYFAk29Yi8ACgkQForvXbEpPzSdbgCfZWoNZPhqrb4cIvpQM2hXuSGP
ib4An0i7263o4rbpc1BS9OkH8cQo6XXS
=8eY2
-----END PGP SIGNATURE-----

--DKU6Jbt7q3WqK7+M--

From owner-freebsd-fs@FreeBSD.ORG  Sun May  1 14:37:20 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 24711106566B;
	Sun,  1 May 2011 14:37:20 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca
	[131.104.91.44])
	by mx1.freebsd.org (Postfix) with ESMTP id 9EC008FC0C;
	Sun,  1 May 2011 14:37:19 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApwEAKhvvU2DaFvO/2dsb2JhbACEUaJBiHGpbo9dgSqDVYEBBI55jj4
X-IronPort-AV: E=Sophos;i="4.64,297,1301889600"; d="scan'208";a="120066265"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 01 May 2011 10:37:18 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 7E351B3F25;
	Sun,  1 May 2011 10:37:18 -0400 (EDT)
Date: Sun, 1 May 2011 10:37:18 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Bruce Evans <brde@optusnet.com.au>
Message-ID: <506337690.827521.1304260638431.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <20110501184904.S975@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.203]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692)
Cc: rmacklem@freebsd.org, fs@freebsd.org
Subject: Re: newnfs client and statfs
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 01 May 2011 14:37:20 -0000

> 
> % + sbp->f_blocks = sfp->sf_tbytes / NFS_FABLKSIZE;
> % + sbp->f_bfree = sfp->sf_fbytes / NFS_FABLKSIZE;
> % + sbp->f_bavail = sfp->sf_abytes / NFS_FABLKSIZE;
> 
> The conversion for f_bavail still has sign extension bugs. f_bavail
> can be negative on the server. A non-broken (FreeBSD) server passes
> us this negative value as a uint64_t value with the top bit set.

Well, both RFC1813 (NFSv3) and RFC3530 (NFSv4) specify the value on
the wire (sf_abytes) as uint64_t. Therefore a negative value can't
be represented safely and non-FreeBSD clients/servers would be
confused by cheating and putting the negative value on the wire.
(I see you mention this further down.)
The new server is broken in that it does not
check for a negative value. It seems that the best approach for the
server would be to send a 0 when f_bavail < 0. What else can you do
without "cheating" and representing the value in a way that would be
non-interoperable with non-BSD NFS clients?

I agree the above is broken for the case where the high order bit
of sf_abytes is set. How about the following code?

  sbp->f_bavail = (sfp->sf_abytes & OFF_MAX) / NFS_FABLKSIZE;

(Yes, I see later in the message that you don't think OFF_MAX is the
 appropriate way to represent the largest positive value that can be
 stored in int64_t. As you'll see below, I don't know the correct way
 to express this constant and would be happy to hear how to do it.
 See below for more on this.)

> 
> Someone broke the old FreeBSD nfs server to work around the broken
> FreeBSD
> nfs client. It remains broken :-(. This bug is missing in the new nfs
> server -- it just passes the server's f_bavail :-), without paying
> quite
> enough attention to the sign bit.
> 
> There is of cause a portability problem. We need to export negative
> values from FreeBSD servers to FreeBSD clients without breaking other
> combinations. The combination of the new nfs server with really old
> old nfs clients is broken :-). NetBSD's nfsclient has explicit code
> to try to handle this problem. I couldn't see how it could work --
> negative values must be passed in some way, and there is nothing
> better
> than passing them as (large) positive values mod 2**64. IIRC, NetBSD
> changes the values but this cannot work since it loses info. Hmm,
> there are 3 fields to use (f_blocks, f_bfree and f_bavail). These
> provide some redundancy, but neither NetBSD's code nor anything that
> I could think of worked to recover ffs's negative avail counts from
> nonnegative values in these fields, and frobbing these fields would
> be unportable anyway. Both the nfs protocol and POSIX's statvfs() (?)
> API and types seem to be incapable of handing ffs's negative f_bavail
> counts (POSIX only has unsigned block counts...).
> 

Well, as I noted above, all I think can be done is have the server reply
0 for the case where f_bavail is negative. (If the specs don't support
negative values, that's all there is to it, I think?)
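
A minimal sketch of that server-side idea, for illustration only. The
function name and parameters are hypothetical and this is not the code of
either server; overflow of the multiplication is ignored:

```c
#include <stdint.h>

/* Convert a signed available-block count to unsigned bytes for the wire. */
uint64_t
abytes_for_wire(int64_t f_bavail, uint64_t f_bsize)
{
	/* The wire field is unsigned, so a negative count is sent as 0. */
	if (f_bavail < 0)
		return (0);
	return ((uint64_t)f_bavail * f_bsize);
}
```

The cost, as noted in the quoted text, is that the client can no longer
tell how far below zero the count was.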

> % + sbp->f_files = sfp->sf_tfiles;
> 
> Now correct, almost. As for the other fields, it tacitly assumes that
> the type of the lvalue is larger than the type of the lvalue. Both
> happen to be 64 bits here. For the signed fields, there assumption
> is strictly incorrect, since the lvalue is 64 bit signed while the
> rvalue is 64 bits unsigned. By "not paying quite enough attention to
> the sign bit" in the above, I mean that it is tacitly assumed that
> we can start with a 64-bit signed value, convert it to u_quad_t
> (another type error -- should not assume that u_quad_t is uint64_t),
> pass it through the nfs protocol, convert back to int64_t (or forget
> to convert back, as above), and recover the original value. This
> deserves special care since it abuses the protocol.
> 
Well, there are a LOT of places where the code uses u_quad_t to represent
what is now uint64_t. What can I say. I wrote this code over about 10 years
(based on even much older code) and, being an old K&R C guy, assumed that
u_quad_t was the way to declare an unsigned 64bit value. If/when u_quad_t
doesn't define an unsigned 64bit value like uint64_t does, I'll need a
lot of warning, because I have a LOT of editing to do.

Other than that, the RFCs specify sf_tfiles as uint64_t and
"struct statfs" has f_files as a uint64_t. So, unless there are plans to
make it signed on FreeBSD, I don't see a problem here?

> % + sbp->f_ffree = (sfp->sf_ffiles & OFF_MAX);
> 
> Any masking here is logically wrong, and in practice just destroys the
> sign bit, as described above for the 0x7fffffff mask with old 32 bit
> systems. Masking with OFF_MAX has additional logic errors. OFF_MAX
> is the maximum value for an off_t, but none of the types here has
> anything to do with off_t.
> 

Ok, sf_ffiles is defined as uint64_t on the wire. Therefore there is
no sign bit. The problem is that it could be a larger positive value
than FreeBSD supports. All I wanted this code to do is make it the
largest positive value that will fit in int64_t. (I used OFF_MAX
because you suggested in a previous email that that was preferable
to 0x7fffffffffffffffLLU for nm_maxfilesize. I don't see anything
like INT64_MAX, UINT64_MAX in FreeBSD's limits.h)
Would

   if (sfp->sf_ffiles > UINT64_MAX)
       sbp->f_ffree = INT64_MAX;
   else
       sbp->f_ffree = sfp->sf_ffiles;

- except there isn't a UINT64_MAX, INT64_MAX defined in sys/*.h as
  far as I can see. How do I express these constants? Do I have to
  convert 0x7fffffffffffffff to decimal and use that?

> % } else if ((nmp->nm_flag & NFSMNT_NFSV4) == 0) {
> % sbp->f_bsize = (int32_t)sfp->sf_bsize;
> % sbp->f_blocks = (int32_t)sfp->sf_blocks;
> 
> I think this is just the v2 case. The old nfs client uses essentially
> the same bogus casts. No casts should be used (clamping should be
> used), but if we use casts it may be possible to use non-bogus ones.
> I think these are just no casts for the unsigned fields but int32_t
> for the signed ones. The v2 protocol is limited to 32 bits, and we
> can easily represent any 32-bit value since we have 64-bit fields for
> the lvalues. We just need to be careful with the sign bit (in the
> 31th bit of an unsigned value in the sfp fields), but can keep the
> 31th bit as an unsigned bit without problems now that the statfs
> fields
> are 64 bits. Casting for the unsigned fields now just breaks the value
> unnecessarily if the protocol manages to pass the 31th bit as a value
> bit for such fields.
> 
Ok, I could take the casts off. I think the effect would be that, for the
case where sf_bavail has its high order (bit 31) set, it will be seen as
a larger positive value. (sf_bavail is u_int32_t) This would be correct
per the RFCs, since RFC1094 defines the fields as uint32_t. Now, if
servers were "cheating" and putting the negative values in the field on
the wire, it will change the semantics a bit.

I'll admit I tend to feel that the safest thing is to just leave it
the way it is, since no one is complaining about the semantics and I'd
rather not "break" anything by fixing the semantics to agree with the RFC.

> Servers should pay even more attention to unrepresentable bits than to
> sign bis, but pay considerably less. Both the old and the new nfs
> server
> blindly truncate f_bfree, etc., to 32 bits in the v2 case (except the
> old
> nfs server corrupts negative f_bavail to 0).

As above, I have to disagree with this. If the RFCs say it can't be
negative, then sending negative values as 0 is all that can be done,
as far as I can see. (I think the old server got this case correct
and the new server needs to be fixed.)

> (For v3, they tacitly
> assume
> that no truncation occurs on conversion to 64 bits.)
> 
> The old nfs server also gratuitously breaks the file counts (f_files
> etc.) for the v3 case. It should use txdr_hyper(), but uses
> exdr_unsigned()
> plus extra code to lose 32 bits. This is fixed in NetBSD and in the
> new nfs server.
> 
At this point, I'd like to leave the old server unchanged. That way, if
anyone runs into problems w.r.t. differences in semantics (even if they
seem to violate the RFC), they can switch to the old server while things
get sorted out.

rick

From owner-freebsd-fs@FreeBSD.ORG  Sun May  1 15:17:44 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8F2FE1065674;
	Sun,  1 May 2011 15:17:44 +0000 (UTC)
	(envelope-from bfriesen@simple.dallas.tx.us)
Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74])
	by mx1.freebsd.org (Postfix) with ESMTP id 5755D8FC14;
	Sun,  1 May 2011 15:17:44 +0000 (UTC)
Received: from freddy.simplesystems.org (freddy.simplesystems.org
	[65.66.246.65])
	by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id
	p41FHgSU024474; Sun, 1 May 2011 10:17:42 -0500 (CDT)
Date: Sun, 1 May 2011 10:17:42 -0500 (CDT)
From: Bob Friesenhahn <bfriesen@simple.dallas.tx.us>
X-X-Sender: bfriesen@freddy.simplesystems.org
To: Alexander Motin <mav@freebsd.org>
In-Reply-To: <4DBBE985.9000701@FreeBSD.org>
Message-ID: <alpine.GSO.2.01.1105011008340.20825@freddy.simplesystems.org>
References: <20110430072831.GA65598@icarus.home.lan>
	<mailpost.1304156092.7598560.10429.mailing.freebsd.fs@FreeBSD.cs.nctu.edu.tw>
	<4DBBE985.9000701@FreeBSD.org>
User-Agent: Alpine 2.01 (GSO 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2
	(blade.simplesystems.org [65.66.246.90]);
	Sun, 01 May 2011 10:17:42 -0500 (CDT)
Cc: freebsd-fs@freebsd.org
Subject: Re: TRIM clustering
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 01 May 2011 15:17:44 -0000

On Sat, 30 Apr 2011, Alexander Motin wrote:
>>
>> well not all devices take it as a hit.. The suggestion of some sort of
>> clustering is a good one but it should be tunable.
>
> I believe any device should benefit from receiving single 128K request
> instead of 8*16k. Just because of command processing overhead. Am I wrong?

Since I have not seen it mentioned in this discussion thread yet, it
is worth pointing out that once TRIM has been issued for a block, the
filesystem cannot reuse that space for storage until the TRIM request
has completed.  Otherwise in-use blocks might get TRIMmed, resulting
in filesystem destruction.
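
As a rough illustration of that ordering constraint, here is a sketch of
per-block state tracking.  All names are hypothetical and this is not how
any particular filesystem implements it:

```c
#include <stdbool.h>

enum blk_state { BLK_IN_USE, BLK_TRIM_PENDING, BLK_FREE };

/* Freeing a block queues a TRIM; the block is not yet allocatable. */
void
blk_free(enum blk_state *st)
{
	if (*st == BLK_IN_USE)
		*st = BLK_TRIM_PENDING;
}

/* Called when the device reports that the TRIM has completed. */
void
blk_trim_done(enum blk_state *st)
{
	if (*st == BLK_TRIM_PENDING)
		*st = BLK_FREE;
}

/* Allocation succeeds only for blocks with no TRIM in flight. */
bool
blk_alloc(enum blk_state *st)
{
	if (*st != BLK_FREE)
		return (false);
	*st = BLK_IN_USE;
	return (true);
}
```

An allocation attempted between blk_free() and blk_trim_done() is refused,
so a late TRIM can never hit a block that has been handed out again.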

If the system should spontaneously reboot, then there may be a 
mismatch between the filesystem's notion of free blocks and the FLASH 
device's notion of free blocks.  In fact, if the kernel panics, the 
device may continue trimming blocks after the system is gone (because 
power is still on).

Bob
-- 
Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

From owner-freebsd-fs@FreeBSD.ORG  Sun May  1 15:22:44 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 08EAB1065670;
	Sun,  1 May 2011 15:22:44 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca
	[131.104.91.36])
	by mx1.freebsd.org (Postfix) with ESMTP id ABA558FC0C;
	Sun,  1 May 2011 15:22:43 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApwEADZ6vU2DaFvO/2dsb2JhbACEUaJDiHGpaY9dgSqDVYEBBI55jj4
X-IronPort-AV: E=Sophos;i="4.64,298,1301889600"; d="scan'208";a="119234064"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-annu-pri.mail.uoguelph.ca with ESMTP; 01 May 2011 11:22:42 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id BC6B3B3F3B;
	Sun,  1 May 2011 11:22:42 -0400 (EDT)
Date: Sun, 1 May 2011 11:22:42 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Bruce Evans <brde@optusnet.com.au>
Message-ID: <63264466.828351.1304263362674.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <506337690.827521.1304260638431.JavaMail.root@erie.cs.uoguelph.ca>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.203]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692)
Cc: rmacklem@freebsd.org, fs@freebsd.org
Subject: Re: newnfs client and statfs
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 01 May 2011 15:22:44 -0000

> Would
> 
> if (sfp->sf_ffiles > UINT64_MAX)
> sbp->f_ffree = INT64_MAX;
> else
> sbp->f_ffree = sfp->sf_ffiles;
> 

Oops, I shouldn't have called this UINT64_MAX. What I meant was the
same value as INT64_MAX, but of type uint64_t. Something like:
   (uint64_t)INT64_MAX
OR
   0x7fffffffffffffff
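
Put together, the clamp might look like the sketch below. It assumes
INT64_MAX from C99 <stdint.h> is usable where the code is built, which is
an assumption here, and the function name is hypothetical:

```c
#include <stdint.h>

/* Clamp an unsigned 64-bit wire count into a signed 64-bit field. */
int64_t
ffree_from_wire(uint64_t sf_ffiles)
{
	if (sf_ffiles > (uint64_t)INT64_MAX)
		return (INT64_MAX);
	return ((int64_t)sf_ffiles);
}
```

Unlike masking, this keeps every value that fits and maps anything larger
to the maximum instead of to an unrelated small number.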

rick

From owner-freebsd-fs@FreeBSD.ORG  Sun May  1 15:48:12 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 974EB1065670
	for <fs@freebsd.org>; Sun,  1 May 2011 15:48:12 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta11.emeryville.ca.mail.comcast.net
	(qmta11.emeryville.ca.mail.comcast.net [76.96.27.211])
	by mx1.freebsd.org (Postfix) with ESMTP id 7DD388FC17
	for <fs@freebsd.org>; Sun,  1 May 2011 15:48:12 +0000 (UTC)
Received: from omta21.emeryville.ca.mail.comcast.net ([76.96.30.88])
	by qmta11.emeryville.ca.mail.comcast.net with comcast
	id eFTZ1g0011u4NiLABFb11p; Sun, 01 May 2011 15:35:01 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta21.emeryville.ca.mail.comcast.net with comcast
	id eFb01g00Z1t3BNj8hFb1l1; Sun, 01 May 2011 15:35:01 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 84BFE9B418; Sun,  1 May 2011 08:35:00 -0700 (PDT)
Date: Sun, 1 May 2011 08:35:00 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Message-ID: <20110501153500.GA99593@icarus.home.lan>
References: <20110501184904.S975@besplex.bde.org>
	<506337690.827521.1304260638431.JavaMail.root@erie.cs.uoguelph.ca>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <506337690.827521.1304260638431.JavaMail.root@erie.cs.uoguelph.ca>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: rmacklem@freebsd.org, fs@freebsd.org
Subject: Re: newnfs client and statfs
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 01 May 2011 15:48:12 -0000

[snip]
On Sun, May 01, 2011 at 10:37:18AM -0400, Rick Macklem wrote:
> > % + sbp->f_ffree = (sfp->sf_ffiles & OFF_MAX);
> > 
> > Any masking here is logically wrong, and in practice just destroys the
> > sign bit, as described above for the 0x7fffffff mask with old 32 bit
> > systems. Masking with OFF_MAX has additional logic errors. OFF_MAX
> > is the maximum value for an off_t, but none of the types here has
> > anything to do with off_t.
> > 
> 
> Ok, sf_ffiles is defined as uint64_t on the wire. Therefore there is
> no sign bit. The problem is that it could be a larger positive value
> than FreeBSD supports. All I wanted this code to do is make it the
> largest positive value that will fit in int64_t. (I used OFF_MAX
> because you suggested in a previous email that that was preferable
> to 0x7fffffffffffffffLLU for nm_maxfilesize. I don't see anything
> like INT64_MAX, UINT64_MAX in FreeBSD's limits.h)
> Would
> 
>    if (sfp->sf_ffiles > UINT64_MAX)
>        sbp->f_ffree = INT64_MAX;
>    else
>        sbp->f_ffree = sfp->sf_ffiles;
> 
> - except there isn't a UINT64_MAX, INT64_MAX defined in sys/*.h as
>   far as I can see. How do I express these constants? Do I have to
>   convert 0x7ffffffffffffff to decimal and use that?

Aren't these effectively defined in <sys/limits.h> as UQUAD_MAX and
QUAD_MAX?  These get translated/pulled in from <machine/_limits.h>,
which varies per architecture.  This looks like the translation based on
looking at the respective include files per arch:

i386: UQUAD_MAX == __UQUAD_MAX == __ULLONG_MAX == 0xffffffffffffffffULL
i386: QUAD_MAX  == __QUAD_MAX  == __LLONG_MAX  == 0x7fffffffffffffffLL

amd64: UQUAD_MAX == __UQUAD_MAX == __ULONG_MAX  == 0xffffffffffffffffUL
amd64: QUAD_MAX  == __QUAD_MAX  == __LONG_MAX   == 0x7fffffffffffffffL

There are some #ifdef's in <sys/limits.h> around some of these
declarations which I don't understand (like __BSD_VISIBLE), but I would
imagine the above declarations would do what you want.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Sun May  1 15:55:40 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DCDC8106566B;
	Sun,  1 May 2011 15:55:40 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca
	[131.104.91.44])
	by mx1.freebsd.org (Postfix) with ESMTP id 836E88FC0A;
	Sun,  1 May 2011 15:55:40 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApwEAEWBvU2DaFvO/2dsb2JhbACEUaJDskiPXYEqg1WBAQSOeY4+
X-IronPort-AV: E=Sophos;i="4.64,298,1301889600"; d="scan'208";a="120069861"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 01 May 2011 11:55:39 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id AA54AB3F3B;
	Sun,  1 May 2011 11:55:39 -0400 (EDT)
Date: Sun, 1 May 2011 11:55:39 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Message-ID: <1298790394.829218.1304265339603.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <20110501153500.GA99593@icarus.home.lan>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.202]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692)
Cc: rmacklem@freebsd.org, fs@freebsd.org
Subject: Re: newnfs client and statfs
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 01 May 2011 15:55:40 -0000

> >
> > - except there isn't a UINT64_MAX, INT64_MAX defined in sys/*.h as
> >   far as I can see. How do I express these constants? Do I have to
> >   convert 0x7ffffffffffffff to decimal and use that?
> 
> Aren't these effectively defined in <sys/limits.h> as UQUAD_MAX and
> QUAD_MAX? These get translated/pulled in from <machine/_limits.h>,
> which varies per architecture. This looks like the translation based
> on
> looking at the respective include files per arch:
> 
> i386: UQUAD_MAX == __UQUAD_MAX == __ULLONG_MAX ==
> 0xffffffffffffffffULL
> i386: QUAD_MAX == __QUAD_MAX == __LLONG_MAX == 0x7fffffffffffffffLL
> 
> amd64: UQUAD_MAX == __UQUAD_MAX == __ULONG_MAX == 0xffffffffffffffffUL
> amd64: QUAD_MAX == __QUAD_MAX == __LONG_MAX == 0x7fffffffffffffffL
> 
> There are some #ifdef's in <sys/limits.h> around some of these
> declarations which I don't understand (like __BSD_VISIBLE), but I
> would
> imagine the above declarations would do what you want.
> 
Yep. And as far as I can see, OFF_MAX is defined exactly the same way
for all arches. The only difference is the comments:
  /* max value for a quad_t */
vs
  /* max value for an off_t */

The post seemed to indicate that OFF_MAX wasn't the correct type and,
later in it, that u_quad_t (the comment would presumably also apply
to quad_t?) shouldn't be assumed the same as uint64_t.

I'm happy to use anything that works, so if QUAD_MAX is preferable to
OFF_MAX, I'll happily use it, rick

From owner-freebsd-fs@FreeBSD.ORG  Sun May  1 16:23:55 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 740D5106566C;
	Sun,  1 May 2011 16:23:55 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail04.syd.optusnet.com.au (mail04.syd.optusnet.com.au
	[211.29.132.185])
	by mx1.freebsd.org (Postfix) with ESMTP id 102658FC0A;
	Sun,  1 May 2011 16:23:54 +0000 (UTC)
Received: from c122-106-155-58.carlnfd1.nsw.optusnet.com.au
	(c122-106-155-58.carlnfd1.nsw.optusnet.com.au [122.106.155.58])
	by mail04.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	p41GNojl001848
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Mon, 2 May 2011 02:23:52 +1000
Date: Mon, 2 May 2011 02:23:50 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
In-Reply-To: <20110501153500.GA99593@icarus.home.lan>
Message-ID: <20110502015700.Q2013@besplex.bde.org>
References: <20110501184904.S975@besplex.bde.org>
	<506337690.827521.1304260638431.JavaMail.root@erie.cs.uoguelph.ca>
	<20110501153500.GA99593@icarus.home.lan>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: rmacklem@freebsd.org, fs@freebsd.org
Subject: Re: newnfs client and statfs
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 01 May 2011 16:23:55 -0000

On Sun, 1 May 2011, Jeremy Chadwick wrote:

> [snip]
> On Sun, May 01, 2011 at 10:37:18AM -0400, Rick Macklem wrote:
>>> % + sbp->f_ffree = (sfp->sf_ffiles & OFF_MAX);
>>>
>>> Any masking here is logically wrong, and in practice just destroys the
>>> sign bit, as described above for the 0x7fffffff mask with old 32 bit

This got a bit tangled.  I will reply more to the older reply.

>> - except there isn't a UINT64_MAX, INT64_MAX defined in sys/*.h as
>>   far as I can see. How do I express these constants? Do I have to
>>   convert 0x7ffffffffffffff to decimal and use that?

UINT64_MAX, etc., are defined in <sys/stdint.h>, which doesn't even need
to be included explicitly, since it is (bogusly) standard namespace
pollution in <sys/systm.h>.  This namespace pollution gives the bizarre
situation that you have to include <sys/limits.h> to get the limits for
basic types, but you get the limits for the fixed-width types whether you
want them or not, except in rare cases where <sys/systm.h> is not needed
for other reasons.

> Aren't these effectively defined in <sys/limits.h> as UQUAD_MAX and
> QUAD_MAX?  These get translated/pulled in from <machine/_limits.h>,
> which varies per architecture.  This looks like the translation based on
> looking at the respective include files per arch:

No.  UQUAD_MAX and QUAD_MAX are historical mistakes.  quad_t should be
4 times as large as a machine register, but for compatibility it must
be precisely 64 bits.  This makes it just an obfuscation in new code
(code newer than 1999 when int64_t became Standard with C99).

> i386: UQUAD_MAX == __UQUAD_MAX == __ULLONG_MAX == 0xffffffffffffffffULL
> i386: QUAD_MAX  == __QUAD_MAX  == __LLONG_MAX  == 0x7fffffffffffffffLL
>
> amd64: UQUAD_MAX == __UQUAD_MAX == __ULONG_MAX  == 0xffffffffffffffffUL
> amd64: QUAD_MAX  == __QUAD_MAX  == __LONG_MAX   == 0x7fffffffffffffffL
>
> There are some #ifdef's in <sys/limits.h> around some of these
> declarations which I don't understand (like __BSD_VISIBLE), but I would
> imagine the above declarations would do what you want.

These are just ways of spelling 2**64-1 and 2**63-1.  For all fixed-width
types, macros for the limits aren't really needed, since they are
almost machine-independent so you can almost write them as hex constants
even more easily than you can remember where their macros are defined.
But there are subtleties for their types.  These are visible in the
above definitions.  On amd64, their basic types are unsigned long and
long, respectively, while on i386 their types are unsigned long long
and long long, respectively.  Also, type suffixes on the hex constants
may be necessary for technical and bogonial reasons.  Type suffixes
should not be needed for unsigned constants, but header files must use
them even then to prevent warnings from cc -std=c89 for literal constants
larger than ULONG_MAX (gcc warns about this because C90 doesn't support
integer types larger than unsigned long).  Type suffixes are needed for
signed constants like QUAD_MAX to make the constant have a signed type
instead of the default of unsigned int (for constants larger than UINT_MAX).
Most code using the constants doesn't care about these subtleties, so it
can use its own literal constant.
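
As a rough illustration of the point about spellings and suffixes, here is
a minimal sketch (not taken from any FreeBSD header) that compares the
<stdint.h> limit macros with hand-written hex constants.  The function
name limits_match_literals() is made up for this example; INT64_C() and
UINT64_C() are the standard C99 macros that attach whatever suffix the
platform needs.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if the fixed-width limit macros agree
 * with the literal spellings of 2**63-1 and 2**64-1.
 */
static int
limits_match_literals(void)
{
    /* INT64_C/UINT64_C pick the right suffix (L, LL, UL, ULL) per arch. */
    return (INT64_MAX == INT64_C(0x7fffffffffffffff) &&
        UINT64_MAX == UINT64_C(0xffffffffffffffff));
}
```

The comparison holds whether the underlying type is long (as on amd64) or
long long (as on i386), since only the values are compared here, not the
types.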

Bruce

From owner-freebsd-fs@FreeBSD.ORG  Sun May  1 16:32:39 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 257B51065679
	for <freebsd-fs@freebsd.org>; Sun,  1 May 2011 16:32:39 +0000 (UTC)
	(envelope-from bfriesen@simple.dallas.tx.us)
Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74])
	by mx1.freebsd.org (Postfix) with ESMTP id DEB198FC17
	for <freebsd-fs@freebsd.org>; Sun,  1 May 2011 16:32:38 +0000 (UTC)
Received: from freddy.simplesystems.org (freddy.simplesystems.org
	[65.66.246.65])
	by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id
	p41GWXeu024710; Sun, 1 May 2011 11:32:33 -0500 (CDT)
Date: Sun, 1 May 2011 11:32:33 -0500 (CDT)
From: Bob Friesenhahn <bfriesen@simple.dallas.tx.us>
X-X-Sender: bfriesen@freddy.simplesystems.org
To: Rick Macklem <rmacklem@uoguelph.ca>
In-Reply-To: <640208384.682241.1303948694525.JavaMail.root@erie.cs.uoguelph.ca>
Message-ID: <alpine.GSO.2.01.1105011124260.20825@freddy.simplesystems.org>
References: <640208384.682241.1303948694525.JavaMail.root@erie.cs.uoguelph.ca>
User-Agent: Alpine 2.01 (GSO 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2
	(blade.simplesystems.org [65.66.246.90]);
	Sun, 01 May 2011 11:32:34 -0500 (CDT)
Cc: freebsd-fs@freebsd.org
Subject: Re: make the experimental NFS subsystem the default one
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 01 May 2011 16:32:39 -0000

On Wed, 27 Apr 2011, Rick Macklem wrote:
>
> I don't know anything about ZFS, but I would think that, if you see a
> major performance improvement, that ZFS isn't committing stuff to logs
> so that data won't be lost.
>
> Maybe the ZFS folks can comment? (I don't remember seeing the details
> of what you change? If you sent a patch, sorry, but I've misplaced it.)

ZFS will lose as much as 5 seconds' worth of data (and maybe even 10 
seconds) if the data is written slowly and/or the server has quite a 
lot of RAM.  It commits data in order so the written data will be 
completely coherent for that snapshot in time, but the result may 
still be completely corrupted from the client's perspective.  5 (or 
10!) seconds of data could be quite a lot of data, and could represent 
entire new directory trees, or large directory trees which were 
removed.  Individual file content could be overwritten hundreds of 
times before the point where the server arbitrarily decides to commit 
it.

If the server bounces, its data won't match what the client thinks it 
should have.

Bob
-- 
Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

From owner-freebsd-fs@FreeBSD.ORG  Sun May  1 16:44:53 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D8B3C1065675;
	Sun,  1 May 2011 16:44:53 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca
	[131.104.91.36])
	by mx1.freebsd.org (Postfix) with ESMTP id 7E62F8FC12;
	Sun,  1 May 2011 16:44:53 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApwEAPiMvU2DaFvO/2dsb2JhbACEUaJCsmGPVoEqg1WBAQSOeY4+
X-IronPort-AV: E=Sophos;i="4.64,298,1301889600"; d="scan'208";a="119237914"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-annu-pri.mail.uoguelph.ca with ESMTP; 01 May 2011 12:44:52 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id AE643B4163;
	Sun,  1 May 2011 12:44:52 -0400 (EDT)
Date: Sun, 1 May 2011 12:44:52 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Bruce Evans <brde@optusnet.com.au>
Message-ID: <1211771823.830180.1304268292625.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <20110502015700.Q2013@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.202]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692)
Cc: rmacklem@freebsd.org, fs@freebsd.org
Subject: Re: newnfs client and statfs
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 01 May 2011 16:44:53 -0000

> 
> >> - except there isn't a UINT64_MAX, INT64_MAX defined in sys/*.h as
> >>   far as I can see. How do I express these constants? Do I have to
> >>   convert 0x7ffffffffffffff to decimal and use that?
> 
> UINT64_MAX, etc., are defined in <sys/stdint.h>, which doesn't even
> need
> to be included explicitly, since it is (bogusly) standard namespace
> pollution in <sys/systm.h>. This namespace pollution gives the bizarre
> situation that you have to include <sys/limits.h> to get the limits
> for
> basic types, but you get the limits for the fix-with types whether you
> want them or not, except in rare cases where <sys/systm.h> is not
> needed
> for other reasons.
> 
Ok, now I see them (in machine/include/_stdint.h). Apologies for the
noise. I grep'd sys/sys and couldn't find anything called (U)INT64_MAX.

Now, remembering that sf_abytes is uint64_t per the RFCs, what do people
think of either of these?

  if (sfp->sf_abytes > INT64_MAX)
      sbp->f_bavail = INT64_MAX;
  else
      sbp->f_bavail = sfp->sf_abytes / NFS_FABLKSIZE;

Or should I try and do the division to see if the large
value in sf_abytes will fit in INT64_MAX after the division? Something
like:
  int64_t tmp;

  tmp = sfp->sf_abytes;
  tmp /= NFS_FABLKSIZE;
  if (tmp < 0)
     sbp->f_bavail = INT64_MAX;
  else
     sbp->f_bavail = tmp;

Neither tested, of course, rick

From owner-freebsd-fs@FreeBSD.ORG  Sun May  1 17:39:13 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 264F0106566B;
	Sun,  1 May 2011 17:39:13 +0000 (UTC)
	(envelope-from bfriesen@simple.dallas.tx.us)
Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74])
	by mx1.freebsd.org (Postfix) with ESMTP id D96A98FC17;
	Sun,  1 May 2011 17:39:12 +0000 (UTC)
Received: from freddy.simplesystems.org (freddy.simplesystems.org
	[65.66.246.65])
	by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id
	p41HQqSu024922; Sun, 1 May 2011 12:26:52 -0500 (CDT)
Date: Sun, 1 May 2011 12:26:52 -0500 (CDT)
From: Bob Friesenhahn <bfriesen@simple.dallas.tx.us>
X-X-Sender: bfriesen@freddy.simplesystems.org
To: Rick Macklem <rmacklem@uoguelph.ca>
In-Reply-To: <1211771823.830180.1304268292625.JavaMail.root@erie.cs.uoguelph.ca>
Message-ID: <alpine.GSO.2.01.1105011215510.20825@freddy.simplesystems.org>
References: <1211771823.830180.1304268292625.JavaMail.root@erie.cs.uoguelph.ca>
User-Agent: Alpine 2.01 (GSO 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2
	(blade.simplesystems.org [65.66.246.90]);
	Sun, 01 May 2011 12:26:52 -0500 (CDT)
Cc: rmacklem@freebsd.org, fs@freebsd.org
Subject: Re: newnfs client and statfs
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 01 May 2011 17:39:13 -0000

On Sun, 1 May 2011, Rick Macklem wrote:
>
> Or should I try and do the division to see if the large
> value in sf_abytes will fit in INT64_MAX after the division? Something
> like:
>  int64_t tmp;
>
>  tmp = sfp->sf_abytes;
>  tmp /= NFS_FABLKSIZE;
>  if (tmp < 0)
>     sbp->f_bavail = INT64_MAX;
>  else
>     sbp->f_bavail = tmp;

That one seems better because it preserves more of the value, but 
perhaps this is better because it does not depend on 
undocumented/undefined behavior (also untested):

   uint64_t tmp;
   tmp = sfp->sf_abytes / NFS_FABLKSIZE;
   if (tmp > (uint64_t) INT64_MAX)
     sbp->f_bavail = INT64_MAX;
   else
     sbp->f_bavail = tmp;
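
For anyone who wants to try this outside the kernel, the same idea can be
written as a small standalone function.  This is only a sketch: the name
abytes_to_bavail() and the FABLKSIZE macro are invented for the example,
and 512 is assumed as the value of NFS_FABLKSIZE.

```c
#include <stdint.h>

#define FABLKSIZE 512   /* assumed value of NFS_FABLKSIZE */

/*
 * Hypothetical standalone version of the clamp above: scale the
 * unsigned wire value first, then clamp to what int64_t can hold.
 */
static int64_t
abytes_to_bavail(uint64_t abytes)
{
    uint64_t blocks = abytes / FABLKSIZE;   /* unsigned division only */

    if (blocks > (uint64_t)INT64_MAX)
        return (INT64_MAX);
    return ((int64_t)blocks);
}
```

All arithmetic stays in uint64_t until the final conversion, so nothing
here relies on how an out-of-range value converts to a signed type.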

Bob
-- 
Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

From owner-freebsd-fs@FreeBSD.ORG  Sun May  1 17:52:51 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 59C87106564A;
	Sun,  1 May 2011 17:52:51 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail04.syd.optusnet.com.au (mail04.syd.optusnet.com.au
	[211.29.132.185])
	by mx1.freebsd.org (Postfix) with ESMTP id D65188FC17;
	Sun,  1 May 2011 17:52:50 +0000 (UTC)
Received: from c122-106-155-58.carlnfd1.nsw.optusnet.com.au
	(c122-106-155-58.carlnfd1.nsw.optusnet.com.au [122.106.155.58])
	by mail04.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	p41Hqle7007758
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Mon, 2 May 2011 03:52:48 +1000
Date: Mon, 2 May 2011 03:52:47 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Rick Macklem <rmacklem@uoguelph.ca>
In-Reply-To: <506337690.827521.1304260638431.JavaMail.root@erie.cs.uoguelph.ca>
Message-ID: <20110502022441.H2013@besplex.bde.org>
References: <506337690.827521.1304260638431.JavaMail.root@erie.cs.uoguelph.ca>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: rmacklem@freebsd.org, fs@freebsd.org
Subject: Re: newnfs client and statfs
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 01 May 2011 17:52:51 -0000

On Sun, 1 May 2011, Rick Macklem wrote:

>>
>> % + sbp->f_blocks = sfp->sf_tbytes / NFS_FABLKSIZE;
>> % + sbp->f_bfree = sfp->sf_fbytes / NFS_FABLKSIZE;
>> % + sbp->f_bavail = sfp->sf_abytes / NFS_FABLKSIZE;
>>
>> The conversion for f_bavail still has sign extension bugs. f_bavail
>> can be negative on the server. A non-broken (FreeBSD) server passes
>> us this negative value as a uint64_t value with the top bit set.
>
> Well, both RFC1813 (NFSv3) and RFC3530 (NFSv4) specify the value on
> the wire (sf_abytes) as uint64_t. Therefore a negative value can't
> be represented safely and non-FreeBSD clients/servers would be
> confused by cheating and putting the negative value on the wire.
> (I see you mention this further down.)

But it can be represented.  FreeBSD servers always put it on the wire
(if the server file system has a negative value) until the old nfs
server broke it.  I can only find a few FreeBSD clients that aren't
confused by this:
- most or all clients work for the v2 case, because v2 doesn't need
   to scale for f_bavail, and copying the 31st (unsigned) bit to
   the 31st (signed) bit mostly works (except everything breaks once
   the absolute values exceed 2**31-1 or 2**32-1).
- most clients are broken for the v3 case.  For negative f_bavail,
   sign-extension/overflow bugs in the scaling give a value of about
   2**54.  Assigning this to a 32-bit f_bavail gives unobvious garbage;
   assigning this to a 64-bit f_bavail gives obvious garbage.
- my FreeBSD-~5.2 v3 client handles negative f_bavail correctly (by
   scaling a signed value).  It doesn't fix f_ffree.  (I see negative
   f_bavail quite often but never run into the reserve for f_ffree.)

BTW, how does scaling of block counts by NFS_FABLKSIZE in the v3 (and
v4?) cases work?  I can only see it in clients.  Servers seem to start
with block counts and never convert to byte counts.

> The new server is broken in that it does not
> check for a negative value. It seems that the best approach for the
> server would be to send a 0 when f_bavail < 0. What else can you do

Hrmph.  It is servers that check and send a 0 when f_bavail < 0 that
are broken.

> without "cheating" and representing the value in a way that would be
> non-interoperable with non-BSD NFS clients?

I don't know.  See the NetBSD client for some ideas.  Note that for blocks
there are 2 "free" fields, f_bfree and f_bavail, while for files there is
only 1 (f_ffree).  You would think that the redundancy for blocks would
allow passing a negative value as the difference of 2 nonnegative ones,
but I couldn't make this work.  For FreeBSD clients that can handle this,
is it possible to negotiate the handling with the server?

> I agree the above is broken for the case where the high order bit
> of sf_abytes is set. How about the following code?
>
>  sbp->f_bavail = (sfp->f_abytes & OFF_MAX) / NFS_FABLKSIZE;

Doesn't work at all.  The byte count is typically a small negative
value, say -512.  This should be scaled to -1.  But
sbp->f_bavail = (uint64_t)-512 = 0xfffffffffffffe00.  Discarding just
1 top bit from this makes little difference to it.  No amount of
discarding top bits works correctly.  The value must be converted to a
signed type before scaling:

 	sbp->f_bavail = (int64_t)sbp->f_bavail / NFS_FABLKSIZE;

See the scaling function in vfs_syscalls.c for a worse method.  (1
technical difference: it wants to handle all fields using the same
unsigned max count, so it uses the negative of f_bavail and needs a
little more code for this.  1 unportability: it scales all the
fields by right shifting, but right shifting of negative values is
not guaranteed to handle the sign bit the same as division by a
power of 2.)
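
The difference between masking and signed scaling can be sketched in a
few lines of userland C.  The function names and FABLKSIZE are made up
for this illustration (512 is assumed for NFS_FABLKSIZE), and the cast
from an out-of-range uint64_t to int64_t is implementation-defined in C,
although it behaves as two's complement wraparound on the compilers
FreeBSD uses.

```c
#include <stdint.h>

#define FABLKSIZE 512   /* assumed value of NFS_FABLKSIZE */

/* Mask off the top bit, then scale (the quoted proposal). */
static int64_t
scale_masked(uint64_t abytes)
{
    return ((int64_t)((abytes & INT64_MAX) / FABLKSIZE));
}

/* Reinterpret the wire value as signed, then scale by division. */
static int64_t
scale_signed(uint64_t abytes)
{
    /* Implementation-defined conversion; two's complement assumed. */
    return ((int64_t)abytes / FABLKSIZE);
}
```

With a wire value of (uint64_t)-512, scale_masked() yields 2**54-1 while
scale_signed() yields -1, which is the value a client would want to show.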


> (Yea, I see later in the message that you don't think
> OFF_MAX is the appropriate
> way to represent the largest positive value that can be stored
> in int64_t. As you'll see below, I don't know the correct way to
> express this constant and would be happy to hear how to do it?
> See below for more on this.)
>
> ...
>
> Other than that, the RFCs specify sf_tfiles as uint64_t and
> "struct statfs" has f_files as a uint64_t. So, unless there are plans to
> make it signed on FreeBSD, I don't see a problem here?

The problem is for f_bavail and f_ffree in statfs.  These are
intentionally signed to support ffs putting negative values in
them.  I think the protocol specifies uint64_t for sf_abytes and
sf_ffiles, so there is a minor theoretical problem even if negative
values aren't supported by nfs.  (sf_abytes might be 2**64-512.
Perhaps this is actually physically possible using a sparse mapping.
After scaling by NFS_FABLKSIZE, we can represent this value despite
using a signed type, but we have to know that it really is large
unsigned and not negative.  sf_ffiles might be 2**64-1, but this is
physically impossible.)

>> % + sbp->f_ffree = (sfp->sf_ffiles & OFF_MAX);
>>
>> Any masking here is logically wrong, and in practice just destroys the
>> sign bit, as described above for the 0x7fffffff mask with old 32 bit
>> systems. Masking with OFF_MAX has additional logic errors. OFF_MAX
>> is the maximum value for an off_t, but none of the types here has
>> anything to do with off_t.
>
> Ok, sf_ffiles is defined as uint64_t on the wire. Therefore there is
> no sign bit. The problem is that it could be a larger positive value
> than FreeBSD supports. All I wanted this code to do is make it the

Everything is 64 bits, so there is no problem in practice.  The signed
type for f_ffree gives a problem in theory -- it can only represent
63-bit unsigned values, but the wire has 64.  But more than 2**63-1
files is physically impossible, so there is no problem in practice.

You can either assume this, or write a maze of code to handle various
combinations of type sizes.  I prefer to not write actual code for
this.  A couple of assertions that the sizes are still 64 bits should
be enough.
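
A sketch of what such assertions might look like, using C11
_Static_assert in a standalone file.  The two structs are hypothetical
stand-ins for struct statfs and the wire structure, not the real
definitions; in the kernel I believe the CTASSERT() macro would be the
usual spelling.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the real structures. */
struct demo_statfs {
    uint64_t f_blocks;
    int64_t  f_bavail;
    int64_t  f_ffree;
};

struct demo_wire {
    uint64_t sf_abytes;
    uint64_t sf_ffiles;
};

/* Fail the build if either side stops being 64 bits wide. */
_Static_assert(sizeof(((struct demo_statfs *)0)->f_bavail) ==
    sizeof(((struct demo_wire *)0)->sf_abytes),
    "f_bavail and sf_abytes must have the same width");
_Static_assert(sizeof(((struct demo_statfs *)0)->f_ffree) ==
    sizeof(int64_t), "f_ffree must be 64 bits");
```

The checks cost nothing at runtime and turn a silent change of field
width into a compile error.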

> largest positive value that will fit in int64_t. (I used OFF_MAX
> because you suggested in a previous email that that was preferable
> to 0x7fffffffffffffffLLU for nm_maxfilesize. I don't see anything
> like INT64_MAX, UINT64_MAX in FreeBSD's limits.h)

These are in <sys/systm.h> via standard namespace pollution -- see another
reply.

> Would
>
>   if (sfp->sf_ffiles > UINT64_MAX)
>       sbp->f_ffree = INT64_MAX;
>   else
>       sbp->f_ffree = sfp->sf_ffiles;

s/ffree/ffiles/, and a few other fixes from a later reply (s/UINT64_MAX/
INT64_MAX).

INT64_MAX is currently the correct limit, but breaks automatically if
someone changes the type of f_ffree.  I have some fancy macros (never
fully implemented in actual code) to determine the limits from the
types (sizeof(sbp->f_ffree) * CHAR_BIT gives the number of bits provided it is
a fixed-width type...).
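
One possible shape for such a macro, purely as a sketch and not the
macros referred to above: SIGNED_FIELD_MAX() and clamp_ffiles() are names
invented here, and the macro assumes a signed fixed-width field of at
most 64 bits with no padding bits.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical macro: largest value of a signed fixed-width field,
 * derived from the field's size instead of a hard-coded INT64_MAX.
 */
#define SIGNED_FIELD_MAX(field) \
    ((int64_t)((UINT64_C(1) << (sizeof(field) * CHAR_BIT - 1)) - 1))

/* Clamp an unsigned wire count to whatever the local field can hold. */
static int64_t
clamp_ffiles(uint64_t ffiles)
{
    int64_t f_ffree;    /* stand-in for sbp->f_ffree; only sizeof is used */
    const uint64_t max = (uint64_t)SIGNED_FIELD_MAX(f_ffree);

    return ((int64_t)(ffiles > max ? max : ffiles));
}
```

If the field's type were later changed to int32_t, the limit would follow
the type without touching the clamp.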

> - except there isn't a UINT64_MAX, INT64_MAX defined in sys/*.h as
>  far as I can see. How do I express these constants? Do I have to
>  convert 0x7ffffffffffffff to decimal and use that?

Avoid them if possible.  You should only need them if you clamp the
values.

>> % } else if ((nmp->nm_flag & NFSMNT_NFSV4) == 0) {
>> % sbp->f_bsize = (int32_t)sfp->sf_bsize;
>> % sbp->f_blocks = (int32_t)sfp->sf_blocks;
>>
>> I think this is just the v2 case. The old nfs client uses essentially
>> the same bogus casts. No casts should be used (clamping should be
>> used), but if we use casts it may be possible to use non-bogus ones.
>> I think these are just no casts for the unsigned fields but int32_t
>> for the signed ones. The v2 protocol is limited to 32 bits, and we
>> can easily represent any 32-bit value since we have 64-bit fields for
>> the lvalues. We just need to be careful with the sign bit (in the
>> 31th bit of an unsigned value in the sfp fields), but can keep the
>> 31th bit as an unsigned bit without problems now that the statfs
>> fields
>> are 64 bits. Casting for the unsigned fields now just breaks the value
>> unnecessarily if the protocol manages to pass the 31th bit as a value
>> bit for such fields.
>>
> Ok, I could take the casts off. I think the effect would be that, for the
> case where sf_bavail has its high order (bit 31) set, it will be seen as
> a larger positive value. (sf_bavail is u_int32_t) This would be correct
> per the RFCs, since RFC1094 defines the fields as uint32_t. Now, if
> servers were "cheating" and putting the negative values in the field on
> the wire, it will change the semantics a bit.

The v2 case is much closer to hitting the limits, since we can now
easily have a server with >= 2**32 512-blocks and someone might want
to use the v2 protocol for it.  ino_t is still 32 bits so file counts
can't exceed the v2 protocol limits yet, but that will change soon.

> I'll admit I tend to feel that the safest thing is to just leave it
> the way it is, since no one is complaining about the semantics and I'd
> rather not "break" anything by fixing the semantics to agree with the RFC.
>
>> Servers should pay even more attention to unrepresentable bits than to
>> sign bits, but pay considerably less. Both the old and the new nfs
>> server
>> blindly truncate f_bfree, etc., to 32 bits in the v2 case (except the
>> old
>> nfs server corrupts negative f_bavail to 0).
>
> As above, I have to disagree with this. If the RFCs say it can't be
> negative, then sending negative values as 0 is all that can be done,
> as far as I can see. (I think the old server got this case correct
> and the new server needs to be fixed.)

The corruption also involves sending positive values as 0.  E.g., 4G on
the server becomes 0 on the client after blind truncation to 32 bits.
Clamping to UINT32_MAX or INT32_MAX would reduce problems from this.
Only the server can do the clamping, since the client has no way to tell
whether 0 really means 0.
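
To make the truncation problem concrete, here is a small sketch of the
two server-side behaviours for a v2 reply.  The function names are
hypothetical and this is not the actual server code.

```c
#include <stdint.h>

/* Blind truncation to the 32-bit wire field. */
static uint32_t
v2_truncate(uint64_t blocks)
{
    return ((uint32_t)blocks);
}

/* Clamp to the largest value the 32-bit wire field can carry. */
static uint32_t
v2_clamp(uint64_t blocks)
{
    return (blocks > UINT32_MAX ? UINT32_MAX : (uint32_t)blocks);
}
```

A count of exactly 2**32 blocks goes out as 0 from v2_truncate() but as
UINT32_MAX from v2_clamp(); smaller counts pass through both unchanged.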

Bruce

From owner-freebsd-fs@FreeBSD.ORG  Sun May  1 18:12:51 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A18D2106564A;
	Sun,  1 May 2011 18:12:51 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail04.syd.optusnet.com.au (mail04.syd.optusnet.com.au
	[211.29.132.185])
	by mx1.freebsd.org (Postfix) with ESMTP id 4098B8FC14;
	Sun,  1 May 2011 18:12:50 +0000 (UTC)
Received: from c122-106-155-58.carlnfd1.nsw.optusnet.com.au
	(c122-106-155-58.carlnfd1.nsw.optusnet.com.au [122.106.155.58])
	by mail04.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	p41IClBc016305
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Mon, 2 May 2011 04:12:49 +1000
Date: Mon, 2 May 2011 04:12:47 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Rick Macklem <rmacklem@uoguelph.ca>
In-Reply-To: <1211771823.830180.1304268292625.JavaMail.root@erie.cs.uoguelph.ca>
Message-ID: <20110502035720.F2645@besplex.bde.org>
References: <1211771823.830180.1304268292625.JavaMail.root@erie.cs.uoguelph.ca>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: rmacklem@freebsd.org, fs@freebsd.org
Subject: Re: newnfs client and statfs
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 01 May 2011 18:12:51 -0000

On Sun, 1 May 2011, Rick Macklem wrote:

>> UINT64_MAX, etc., are defined in <sys/stdint.h>, which doesn't even
>> need
>> to be included explicitly, since it is (bogusly) standard namespace
>> pollution in <sys/systm.h>. This namespace pollution gives the bizarre
>> ...

> Ok, now I see them (in machine/include/_stdint.h). Apologies for the
> noise. I grep'd sys/sys and couldn't find anything called (U)INT64_MAX.
>
> Now, remembering that sf_abytes is uint64_t per the RFCs, what do people
> think of either of these?
>
>  if (sfp->sf_abytes > INT64_MAX)
>      sbp->f_bavail = INT64_MAX;
>  else
>      sbp->f_bavail = sfp->sf_abytes / NFS_FABLKSIZE;

You don't need to do anything at runtime, since everything is 64 bits
and f_bavail is a block count while sf_abytes is a byte count.  1 bit
is lost to the sign bit in f_bavail, but 9 bits are gained by scaling
by NFS_FABLKSIZE, leaving 8 bits to spare.
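
The bit counting above can be checked with a small sketch. It assumes
the block size is 512 (2**9), which is what "9 bits" implies; the names
EX_FABLKSIZE, max_scaled_blocks() and spare_bits() are made up for the
illustration.

```c
#include <stdint.h>

/* Assumed value: 512 = 2**9, matching the "9 bits" in the text. */
#define	EX_FABLKSIZE	512

/* Largest block count that any unsigned 64-bit byte count can scale to. */
static uint64_t
max_scaled_blocks(void)
{
	return (UINT64_MAX / EX_FABLKSIZE);
}

/*
 * Count how many times the largest scaled value can be doubled while
 * still fitting in int64_t, i.e. the number of spare bits.
 */
static int
spare_bits(void)
{
	uint64_t v = max_scaled_blocks();
	int n = 0;

	while (v <= (uint64_t)INT64_MAX / 2) {
		v *= 2;
		n++;
	}
	return (n);
}
```

Under that assumption the largest scaled value is 2**55 - 1, so no wire
value can make the signed block count go negative.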

Calculating the limit at runtime would give INT64_MAX / NFS_FABLKSIZE,
or perhaps 1 more than that (to round up instead of down).  You might
still want to use an out-of-band limit like INT64_MAX for technical
reasons, but that risks more bugs (for example, anything converting
INT64_MAX / NFS_FABLKSIZE + 1 "back" to a byte count would overflow
and anything converting INT64_MAX "back" to a byte count would overflow
even uint64_t).

> Or should I try and do the division to see if the large
> value in sf_abytes will fit in INT64_MAX after the division? Something
> like:

Runtime tests have the advantage of continuing to work if someone changes
the types, provided they are robust, but making them robust is too hard
here.  Robust tests can't simply use INT64_MAX, since INT64_MAX is only
the max if the type is int64_t...

Bruce

From owner-freebsd-fs@FreeBSD.ORG  Sun May  1 20:27:25 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1E1DC106566B;
	Sun,  1 May 2011 20:27:25 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca
	[131.104.91.44])
	by mx1.freebsd.org (Postfix) with ESMTP id A28378FC0A;
	Sun,  1 May 2011 20:27:24 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApwEALjBvU2DaFvO/2dsb2JhbACEUaJCiHGpBI9JhH+BAQSOeY4+
X-IronPort-AV: E=Sophos;i="4.64,299,1301889600"; d="scan'208";a="120083406"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 01 May 2011 16:27:23 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 953DCB3FB5;
	Sun,  1 May 2011 16:27:23 -0400 (EDT)
Date: Sun, 1 May 2011 16:27:23 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Bruce Evans <brde@optusnet.com.au>
Message-ID: <733531363.835298.1304281643548.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <20110502035720.F2645@besplex.bde.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; 
	boundary="----=_Part_835297_766810430.1304281643545"
X-Originating-IP: [172.17.91.203]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692)
Cc: rmacklem@freebsd.org, kib@freebsd.org, fs@freebsd.org
Subject: Re: newnfs client and statfs
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 01 May 2011 20:27:25 -0000

------=_Part_835297_766810430.1304281643545
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

> On Sun, 1 May 2011, Rick Macklem wrote:
> 
> >> UINT64_MAX, etc., are defined in <sys/stdint.h>, which doesn't even
> >> need
> >> to be included explicitly, since it is (bogusly) standard namespace
> >> pollution in <sys/systm.h>. This namespace pollution gives the
> >> bizarre
> >> ...
> 
> > Ok, now I see them (in machine/include/_stdint.h). Apologies for
> > the
> > noise. I grep'd sys/sys and couldn't find anything called
> > (U)INT64_MAX.
> >
> > Now, remembering that sf_abytes is uint64_t per the RFCs, what do
> > people
> > think of either of these?
> >
> >  if (sfp->sf_abytes > INT64_MAX)
> >      sbp->f_bavail = INT64_MAX;
> >  else
> >      sbp->f_bavail = sfp->sf_abytes / NFS_FABLKSIZE;
> 
> You don't need to do anything at runtime, since everything is 64 bits
> and f_bavail is a block count while sf_abytes is a byte count. 1 bit
> is lost to the sign bit in f_bavail, but 9 bits are gained by scaling
> by NFS_FABLKSIZE, leaving 8 bits to spare.
> 
> Calculating the limit at runtime would give INT64_MAX /
> NFS_FABLKSIZE,
> or perhaps 1 more than that (to round up instead of down). You might
> still want to use an out-of-band limit like INT64_MAX for technical
> reasons, but that risks more bugs (for example, anything converting
> INT64_MAX / NFS_FABLKSIZE + 1 "back" to a byte count would overflow
> and anything converting INT64_MAX "back" to a byte count would
> overflow
> even uint64_t).
> 
> > Or should I try and do the division to see if the large
> > value in sf_abytes will fit in INT64_MAX after the division?
> > Something
> > like:
> 
> Runtime tests have the advantage of continuing to work if someone
> changes
> the types, provided they are robust, but making them robust is too
> hard
> here. Robust tests can't simply use INT64_MAX, since INT64_MAX is
> only
> the max if the type is int64_t...
> 
Ok, I realized the code in the last post was pretty bogus:-) My only
excuse was that I typed it as I was running out the door...

So, I played with it a bit and the attached patch seems to work for
i386. For the fields that are uint64_t in struct statfs, it just
divides/assigns. For the int64_t field that takes the divided value
(f_bavail) I did the division/assignment to a uint64_t tmp and then
assigned that to f_bavail. (Since any value that fits in uint64_t is
a positive value for int64_t after being divided by 2 or more, it will
always be positive.) For the other int64_t one, I just check for "> INT64_MAX"
and set it to INT64_MAX for that case, so it doesn't go negative.
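
For readers without the attachment handy, the approach described above
looks roughly like the following standalone sketch. The struct and
function names (ex_nfsstatfs, ex_statfs, ex_loadsbinfo) are simplified
stand-ins, not the kernel definitions, and the block size of 512 is an
assumption.

```c
#include <stdint.h>

#define	EX_FABLKSIZE	512	/* assumed block size for the example */

/* Stand-in for the wire-side structure; only the fields used here. */
struct ex_nfsstatfs {
	uint64_t sf_abytes;	/* available bytes, unsigned on the wire */
	uint64_t sf_ffiles;	/* free file slots, unsigned on the wire */
};

/* Stand-in for struct statfs; both fields are signed. */
struct ex_statfs {
	int64_t f_bavail;
	int64_t f_ffree;
};

static void
ex_loadsbinfo(const struct ex_nfsstatfs *sfp, struct ex_statfs *sbp)
{
	uint64_t tmp;

	/* Divide as unsigned first; the quotient always fits in int64_t. */
	tmp = sfp->sf_abytes / EX_FABLKSIZE;
	sbp->f_bavail = (int64_t)tmp;

	/* No scaling for file counts, so saturate at INT64_MAX. */
	if (sfp->sf_ffiles > (uint64_t)INT64_MAX)
		sbp->f_ffree = INT64_MAX;
	else
		sbp->f_ffree = (int64_t)sfp->sf_ffiles;
}
```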

Anyhow, the updated patch is attached and maybe kib@ can test it?

Thanks for the help with this. I realize I got rather confused during
the discussion, rick

------=_Part_835297_766810430.1304281643545
Content-Type: text/x-patch; name=statfs.patch
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename=statfs.patch

LS0tIGZzL25mc2NsaWVudC9uZnNfY2xwb3J0LmMuc2F2CTIwMTEtMDQtMzAgMjA6MTY6MzkuMDAw
MDAwMDAwIC0wNDAwCisrKyBmcy9uZnNjbGllbnQvbmZzX2NscG9ydC5jCTIwMTEtMDUtMDEgMTY6
MTE6MTguMDAwMDAwMDAwIC0wNDAwCkBAIC04MzgsMjAgKzgzOCwxOSBAQCB2b2lkCiBuZnNjbF9s
b2Fkc2JpbmZvKHN0cnVjdCBuZnNtb3VudCAqbm1wLCBzdHJ1Y3QgbmZzc3RhdGZzICpzZnAsIHZv
aWQgKnN0YXRmcykKIHsKIAlzdHJ1Y3Qgc3RhdGZzICpzYnAgPSAoc3RydWN0IHN0YXRmcyAqKXN0
YXRmczsKLQluZnNxdWFkX3QgdHF1YWQ7CisJdWludDY0X3QgdG1wOwogCiAJaWYgKG5tcC0+bm1f
ZmxhZyAmIChORlNNTlRfTkZTVjMgfCBORlNNTlRfTkZTVjQpKSB7CiAJCXNicC0+Zl9ic2l6ZSA9
IE5GU19GQUJMS1NJWkU7Ci0JCXRxdWFkLnF2YWwgPSBzZnAtPnNmX3RieXRlczsKLQkJc2JwLT5m
X2Jsb2NrcyA9IChsb25nKSh0cXVhZC5xdmFsIC8gKCh1X3F1YWRfdClORlNfRkFCTEtTSVpFKSk7
Ci0JCXRxdWFkLnF2YWwgPSBzZnAtPnNmX2ZieXRlczsKLQkJc2JwLT5mX2JmcmVlID0gKGxvbmcp
KHRxdWFkLnF2YWwgLyAoKHVfcXVhZF90KU5GU19GQUJMS1NJWkUpKTsKLQkJdHF1YWQucXZhbCA9
IHNmcC0+c2ZfYWJ5dGVzOwotCQlzYnAtPmZfYmF2YWlsID0gKGxvbmcpKHRxdWFkLnF2YWwgLyAo
KHVfcXVhZF90KU5GU19GQUJMS1NJWkUpKTsKLQkJdHF1YWQucXZhbCA9IHNmcC0+c2ZfdGZpbGVz
OwotCQlzYnAtPmZfZmlsZXMgPSAodHF1YWQubHZhbFswXSAmIDB4N2ZmZmZmZmYpOwotCQl0cXVh
ZC5xdmFsID0gc2ZwLT5zZl9mZmlsZXM7Ci0JCXNicC0+Zl9mZnJlZSA9ICh0cXVhZC5sdmFsWzBd
ICYgMHg3ZmZmZmZmZik7CisJCXNicC0+Zl9ibG9ja3MgPSBzZnAtPnNmX3RieXRlcyAvIE5GU19G
QUJMS1NJWkU7CisJCXNicC0+Zl9iZnJlZSA9IHNmcC0+c2ZfZmJ5dGVzIC8gTkZTX0ZBQkxLU0la
RTsKKwkJdG1wID0gc2ZwLT5zZl9hYnl0ZXMgLyBORlNfRkFCTEtTSVpFOworCQlzYnAtPmZfYmF2
YWlsID0gdG1wOworCQlzYnAtPmZfZmlsZXMgPSBzZnAtPnNmX3RmaWxlczsKKwkJaWYgKHNmcC0+
c2ZfZmZpbGVzID4gSU5UNjRfTUFYKQorCQkJc2JwLT5mX2ZmcmVlID0gSU5UNjRfTUFYOworCQll
bHNlCisJCQlzYnAtPmZfZmZyZWUgPSBzZnAtPnNmX2ZmaWxlczsKIAl9IGVsc2UgaWYgKChubXAt
Pm5tX2ZsYWcgJiBORlNNTlRfTkZTVjQpID09IDApIHsKIAkJc2JwLT5mX2JzaXplID0gKGludDMy
X3Qpc2ZwLT5zZl9ic2l6ZTsKIAkJc2JwLT5mX2Jsb2NrcyA9IChpbnQzMl90KXNmcC0+c2ZfYmxv
Y2tzOwo=
------=_Part_835297_766810430.1304281643545--

From owner-freebsd-fs@FreeBSD.ORG  Sun May  1 20:43:30 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 54182106564A;
	Sun,  1 May 2011 20:43:30 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca
	[131.104.91.44])
	by mx1.freebsd.org (Postfix) with ESMTP id E9DE18FC0A;
	Sun,  1 May 2011 20:43:29 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApwEADjFvU2DaFvO/2dsb2JhbACEUaJDiHGoe49JgSqDVYEBBI55jj4
X-IronPort-AV: E=Sophos;i="4.64,299,1301889600"; d="scan'208";a="120084270"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 01 May 2011 16:43:29 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 3007BB3F36;
	Sun,  1 May 2011 16:43:29 -0400 (EDT)
Date: Sun, 1 May 2011 16:43:29 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Bruce Evans <brde@optusnet.com.au>
Message-ID: <135141673.835577.1304282609097.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <20110502022441.H2013@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.202]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692)
Cc: rmacklem@freebsd.org, fs@freebsd.org
Subject: Re: newnfs client and statfs
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 01 May 2011 20:43:30 -0000

> On Sun, 1 May 2011, Rick Macklem wrote:
> 
> >>
> >> % + sbp->f_blocks = sfp->sf_tbytes / NFS_FABLKSIZE;
> >> % + sbp->f_bfree = sfp->sf_fbytes / NFS_FABLKSIZE;
> >> % + sbp->f_bavail = sfp->sf_abytes / NFS_FABLKSIZE;
> >>
> >> The conversion for f_bavail still has sign extension bugs. f_bavail
> >> can be negative on the server. A non-broken (FreeBSD) server passes
> >> us this negative value as a uint64_t value with the top bit set.
> >
> > Well, both RFC1813 (NFSv3) and RFC3530 (NFSv4) specify the value on
> > the wire (sf_abytes) as uint64_t. Therefore a negative value can't
> > be represented safely and non-FreeBSD clients/servers would be
> > confused by cheating and putting the negative value on the wire.
> > (I see you mention this further down.)
> 
> But it can be represented. FreeBSD servers always put it on the wire
> (if the server file system has a negative value) until the old nfs
> server broke it. I can only find a few FreeBSD clients that aren't
> confused by this:
> - most or all clients work for the v2 case, because v2 doesn't need
> to scale for f_bavail, and copying the 31st (unsigned) bit to
> the 31st (signed) bit mostly works (except everything breaks once
> the absolute values exceed 2**31-1 or 2**32-1).
> - most clients are broken for the v3 case. For negative f_bavail,
> sign-extension/overflow bugs in the scaling give a value of about
> 2**54. Assigning this to a 32-bit f_bavail gives unobvious garbage;
> assigning this to a 64-bit f_bavail gives obvious garbage.
> - my FreeBSD-~5.2 v3 client handles negative f_bavail correctly (by
> scaling a signed value). It doesn't fix f_ffree. (I see negative
> f_bavail quite often but never run into the reserve for f_ffree.)
> 
Well my concern isn't w.r.t. FreeBSD clients, but other ones. I'll
start a discussion on freebsd-fs@ about whether a FreeBSD server
should "cheat" and put negative values (which other clients will
think are large positive values) on the wire or try and conform
strictly to the RFC.
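
To make the concern concrete, here is a rough sketch of how the same
wire value scales under the two interpretations. The function names and
the 512-byte block size are assumptions for the example; the
unsigned-to-signed cast relies on two's complement behaviour, which is
implementation-defined in older C standards.

```c
#include <stdint.h>

#define	EX_FABLKSIZE	512	/* assumed block size for the example */

/* A client following the RFC: the wire value is an unsigned byte count. */
static int64_t
scale_unsigned(uint64_t wire)
{
	return ((int64_t)(wire / EX_FABLKSIZE));
}

/*
 * A client that expects a negative value to be smuggled through:
 * reinterpret as signed before scaling (two's complement assumed).
 */
static int64_t
scale_signed(uint64_t wire)
{
	return ((int64_t)wire / EX_FABLKSIZE);
}
```

For a server-side value of -1024 blocks sent as-is, scale_signed()
recovers -1024, while scale_unsigned() reports a huge positive block
count, which is what a non-FreeBSD client would presumably see.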

> BTW, how does scaling of block counts by NFS_FABLKSIZE in the v3 (and
> v4?) cases work? I can only see it in clients. Servers seem to start
> with block counts and never convert to byte counts.
> 

It must be somewhere, since they are uint64_t byte counts on the wire,
except for NFSv2, which used block counts of the block size provided
in the same response.

> > The new server is broken in that it does not
> > check for a negative value. It seems that the best approach for the
> > server would be to send a 0 when f_bavail < 0. What else can you do
> 
> Hrmph. It is servers that check and send a 0 when f_bavail < 0 that
> are broken.
> 
> > without "cheating" and representing the value in a way that would be
> > non-interoperable with non-BSD NFS clients?
> 
> I don't know. See the NetBSD client for some ideas. Note that for
> blocks
> there are 2 "free" fields, f_bfree and f_bavail, while for files there
> is
> only 1 (f_ffree). You would think that the redundancy for blocks would
> allow passing a negative value as the difference of 2 nonnegative
> ones,
> but I couldn't make this work. For FreeBSD clients that can handle
> this,
> is it possible to negotiate the handling with the server?

Not that I know of. The spec writers got pretty irate when someone
suggested that, for NFSv4, there should be a "vendorId", so clients
could use that to handle things differently. Their outlook was that
everyone should play by the same rules.

I'll try and make my Solaris10 box get to -ve frees and then see what
it puts on the wire. After that, I'll start a discussion on freebsd-fs@
about how they think a FreeBSD server should behave when f_bavail and/or
f_ffree are negative.

rick

From owner-freebsd-fs@FreeBSD.ORG  Sun May  1 20:47:04 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4CA301065672;
	Sun,  1 May 2011 20:47:04 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca
	[131.104.91.36])
	by mx1.freebsd.org (Postfix) with ESMTP id E2DFA8FC16;
	Sun,  1 May 2011 20:47:03 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApwEAGTGvU2DaFvO/2dsb2JhbACEUaJDiHGodY9IgSqDVYEBBI55jj4
X-IronPort-AV: E=Sophos;i="4.64,299,1301889600"; d="scan'208";a="119249997"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-annu-pri.mail.uoguelph.ca with ESMTP; 01 May 2011 16:47:02 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id F331CB3F4C;
	Sun,  1 May 2011 16:47:02 -0400 (EDT)
Date: Sun, 1 May 2011 16:47:02 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Bob Friesenhahn <bfriesen@simple.dallas.tx.us>
Message-ID: <956418604.835643.1304282822974.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <alpine.GSO.2.01.1105011215510.20825@freddy.simplesystems.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.203]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692)
Cc: rmacklem@freebsd.org, fs@freebsd.org
Subject: Re: newnfs client and statfs
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 01 May 2011 20:47:04 -0000

> On Sun, 1 May 2011, Rick Macklem wrote:
> >
> > Or should I try and do the division to see if the large
> > value in sf_abytes will fit in INT64_MAX after the division?
> > Something
> > like:
> >  int64_t tmp;
> >
> >  tmp = sfp->sf_abytes;
> >  tmp /= NFS_FABLKSIZE;
> >  if (tmp < 0)
> >     sbp->f_bavail = INT64_MAX;
> >  else
> >     sbp->f_bavail = tmp;
> 
> That one seems better because it preserves more of the value, but
> perhaps this is better because it does not depend on
> undocumented/undefined behavior (also untested):
> 
> uint64_t tmp;
> tmp = sfp->sf_abytes / NFS_FABLKSIZE;
> if (tmp > (uint64_t) INT64_MAX)
> sbp->f_bavail = INT64_MAX;
> else
> sbp->f_bavail = tmp;
> 
That's basically what I went with for the updated patch, except I
didn't put in the "if (tmp > (uint64_t) INT64_MAX)" since once
you divide sf_abytes by 2 or more it is guaranteed to be less than
or equal INT64_MAX.

rick

From owner-freebsd-fs@FreeBSD.ORG  Sun May  1 21:25:12 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DACEC1065672
	for <freebsd-fs@freebsd.org>; Sun,  1 May 2011 21:25:12 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca
	[131.104.91.36])
	by mx1.freebsd.org (Postfix) with ESMTP id 9F8438FC12
	for <freebsd-fs@freebsd.org>; Sun,  1 May 2011 21:25:12 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApwEAJvOvU2DaFvO/2dsb2JhbACEUaJDpFiNAo9IgSqDVYEBBI55hnyHQg
X-IronPort-AV: E=Sophos;i="4.64,299,1301889600"; d="scan'208";a="119251971"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-annu-pri.mail.uoguelph.ca with ESMTP; 01 May 2011 17:25:11 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id CCD1CB3F2D
	for <freebsd-fs@freebsd.org>; Sun,  1 May 2011 17:25:11 -0400 (EDT)
Date: Sun, 1 May 2011 17:25:11 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: FreeBSD FS <freebsd-fs@freebsd.org>
Message-ID: <1404795089.836227.1304285111779.JavaMail.root@erie.cs.uoguelph.ca>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.201]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692)
Subject: RFC: NFS server handling of negative f_bavail?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 01 May 2011 21:25:12 -0000

Hi,

I recently discovered that there seems to be an issue w.r.t.
the f_bavail and f_ffree fields of "struct statfs" since they
are signed values that can be negative.

The RFCs for NFSv3 and NFSv4 define these fields as unsigned
byte counts when they go on the wire. I read that as implying
that negative values can't be represented for them? I tried
a quick test on Solaris10, but I couldn't get the fields to go
negative (they appear to be unsigned in their "struct statvfs"),
so I couldn't find out what it would have done for negative values.

I can think of 2 ways to go:
1 - Have the server reply 0 for these fields when VFS_STATFS()
    passes negative values up.
    This would seem to conform to the RFCs and seems least likely
    to confuse non-BSD clients.
OR
2 - Put the signed value in the uint64_t on the wire. The risk
    here is that some clients will assume it's a large positive
    value.

I admit I don't see the client knowing that the value is negative
instead of 0 as being a big issue for an NFS client mount and am
leaning towards #1, but I'm not familiar with what utilities
might care about the value being negative?
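
As a rough sketch of what the two options put on the wire (the helper
names are hypothetical, not existing server functions):

```c
#include <stdint.h>

/* Option 1: reply 0 when the local value is negative. */
static uint64_t
wire_clamp_zero(int64_t avail_bytes)
{
	return (avail_bytes < 0 ? 0 : (uint64_t)avail_bytes);
}

/*
 * Option 2: put the signed value in the uint64_t unchanged.  The
 * conversion is modulo 2**64, so a negative value has the top bit set.
 */
static uint64_t
wire_passthrough(int64_t avail_bytes)
{
	return ((uint64_t)avail_bytes);
}
```

With -4096 bytes available, option 1 sends 0 and option 2 sends
2**64 - 4096, which a client that takes the RFC literally would read
as a very large amount of free space.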

Anyhow, any comments? rick

From owner-freebsd-fs@FreeBSD.ORG  Mon May  2 00:47:51 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 437DE106574E
	for <freebsd-fs@freebsd.org>; Mon,  2 May 2011 00:47:51 +0000 (UTC)
	(envelope-from ambrosehua@gmail.com)
Received: from mail-qy0-f175.google.com (mail-qy0-f175.google.com
	[209.85.216.175])
	by mx1.freebsd.org (Postfix) with ESMTP id D45158FC16
	for <freebsd-fs@freebsd.org>; Mon,  2 May 2011 00:47:50 +0000 (UTC)
Received: by qyk35 with SMTP id 35so1181268qyk.13
	for <multiple recipients>; Sun, 01 May 2011 17:47:50 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:in-reply-to:references:date
	:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	bh=v+7zZ90snEZJdCGWaWVgG4iLnU/cqcGUVWBtVWFe/68=;
	b=oQmqqfWalaJ1Dw4zRdJ+seo1fKVR+L4wbaiWgWzAa3BnL9Ghz29NtutEpccIY8KK1v
	ZPTdgyxiwmRqcZePgoaAsB7ROjQLXRe5IQVl218KdNJHTwJEgP/OKkbd4+gQRi6/nZi4
	IUJJobmaXTAAq3j2IBcD4PCNDMLsNtOaxQW5g=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type:content-transfer-encoding;
	b=QNrGYsTvSKTQSDDEpiWkr5C0kZ5gtBWXUuJIRL3zpIwLAyqpvEhLkZn0NEgAXZmfrJ
	GG+soN/qN4DadkUmAJOm/05MPi1O++YSiYQiK8ptYQO3UJDWoDI5Zcas4Rdem4b7aBTt
	BkzxkX3q+dj7YrDiSuC2bhFfBU7jd2hTPJl/0=
MIME-Version: 1.0
Received: by 10.229.77.142 with SMTP id g14mr17835qck.10.1304297269847; Sun,
	01 May 2011 17:47:49 -0700 (PDT)
Received: by 10.229.18.68 with HTTP; Sun, 1 May 2011 17:47:49 -0700 (PDT)
In-Reply-To: <20110501133627.00006616@unknown>
References: <op.vn2iid1qk84lxj@arrow>
	<20110501133627.00006616@unknown>
Date: Mon, 2 May 2011 08:47:49 +0800
Message-ID: <BANLkTinJmdsjoRfj-4VBOSo2frj9b85q_g@mail.gmail.com>
From: ambrosehuang ambrose <ambrosehua@gmail.com>
To: Alexander Leidinger <Alexander@leidinger.net>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-fs@freebsd.org, dfr@freebsd.org, Emil Smolenski <am@raisa.eu.org>
Subject: Re: [ZFS] Booting from zpool created on 4k-sector drive
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 May 2011 00:47:51 -0000

Here is my trick:
       1 Download the ZFS v28 patch for 8-stable,
       2 patch the 8-stable sources,
       3 make buildkernel,
       4 then you will get gptzfsboot, zfsloader, pmbr
       5 install pmbr according to wiki/GPTboot
       6 replace your old gptzfsboot, zfsloader with the new ones;
       then you can work around this. It works for me (3 WD10EARS +
ZFS v15 + 8-stable)

2011/5/1 Alexander Leidinger <Alexander@leidinger.net>:
> On Tue, 21 Dec 2010 15:29:01 +0100 "Emil Smolenski" <am@raisa.eu.org>
> wrote:
>
>> Hello,
>>
>> There is a hack to force zpool creation with minimum sector size
>> equal to 4k:
>>
>> # gnop create -S 4096 ${DEV0}
>> # zpool create tank ${DEV0}.nop
>> # zpool export tank
>> # gnop destroy ${DEV0}.nop
>> # zpool import tank
>>
>> A zpool created this way is much faster on problematic 4k-sector
>> drives that lie about their sector size (like WD EARS). This hack
>> works perfectly fine when the system is running. The gnop layer is
>> created
>> only for "zpool create" command -- ZFS stores information about
>> sector size in its metadata. After zpool creation one can export the
>> pool, remove gnop layer and reimport the pool. Difference can be seen
>> in the output from the zdb command:
>>
>> - on 512 sector device (2**9 =3D 512):
>> % zdb tank |grep ashift
>> ashift=3D9
>>
>> - on 4096 sector device (2**12 =3D 4096):
>> % zdb tank |grep ashift
>> ashift=3D12
>>
>> This change is permanent. The only possibility to change the value
>> of ashift is: zpool destroy/create and restoring pool from backup.
>>
>> But there is one problem: I cannot boot from such pool. Error message:
>>
>> ZFS: i/o error - all block copies unavailable
>> ZFS: can't read MOS
>> ZFS: unexpected object set type 0
>
> FYI: I can boot successfully from a ZFS v28 pool which was created like
> this in a GPT partition (tested with 9-current).
>
> Bye,
> Alexander.
>
> --
> http://www.Leidinger.net =A0 =A0Alexander @ Leidinger.net: PGP ID =3D B00=
63FE7
> http://www.FreeBSD.org =A0 =A0 =A0 netchild @ FreeBSD.org =A0: PGP ID =3D=
 72077137
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>

From owner-freebsd-fs@FreeBSD.ORG  Mon May  2 11:06:58 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 226F6106567A
	for <freebsd-fs@FreeBSD.org>; Mon,  2 May 2011 11:06:58 +0000 (UTC)
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 109898FC1E
	for <freebsd-fs@FreeBSD.org>; Mon,  2 May 2011 11:06:58 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p42B6wpP064075
	for <freebsd-fs@FreeBSD.org>; Mon, 2 May 2011 11:06:58 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p42B6v47064073
	for freebsd-fs@FreeBSD.org; Mon, 2 May 2011 11:06:57 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 2 May 2011 11:06:57 GMT
Message-Id: <201105021106.p42B6v47064073@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to
	owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster <bugmaster@FreeBSD.org>
To: freebsd-fs@FreeBSD.org
Cc: 
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 May 2011 11:06:58 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.


S Tracker      Resp.      Description
--------------------------------------------------------------------------------
p kern/156545  fs         [ufs] mv could break UFS on SMP systems
o kern/156193  fs         [ufs] [hang] UFS snapshot hangs && deadlocks processes
o kern/156168  fs         [nfs] [panic] Kernel panic under concurrent access ove
o kern/156039  fs         [nullfs] [unionfs] nullfs + unionfs do not compose, re
o kern/155615  fs         [zfs] zfs v28 broken on sparc64 -current
o kern/155587  fs         [zfs] [panic] kernel panic with zfs
o kern/155484  fs         [ufs] GPT + UFS boot don't work well together
o kern/155411  fs         [regression] [8.2-release] [tmpfs]: mount: tmpfs : No 
o kern/155199  fs         [ext2fs] ext3fs mounted as ext2fs gives I/O errors
o bin/155104   fs         [zfs][patch] use /dev prefix by default when importing
o kern/154930  fs         [zfs] cannot delete/unlink file from full volume -> EN
o kern/154828  fs         [msdosfs] Unable to create directories on external USB
o kern/154491  fs         [smbfs] smb_co_lock: recursive lock for object 1
o kern/154447  fs         [zfs] [panic] Occasional panics - solaris assert somew
f kern/154228  fs         [md] md getting stuck in wdrain state
o kern/153996  fs         [zfs] zfs root mount error while kernel is not located
o kern/153847  fs         [nfs] [panic] Kernel panic from incorrect m_free in nf
o kern/153753  fs         [zfs] ZFS v15 - grammatical error when attempting to u
o kern/153716  fs         [zfs] zpool scrub time remaining is incorrect
o kern/153695  fs         [patch] [zfs] Booting from zpool created on 4k-sector 
o kern/153680  fs         [xfs] 8.1 failing to mount XFS partitions
o kern/153520  fs         [zfs] Boot from GPT ZFS root on HP BL460c G1 unstable
o kern/153418  fs         [zfs] [panic] Kernel Panic occurred writing to zfs vol
o kern/153351  fs         [zfs] locking directories/files in ZFS
o bin/153258   fs         [patch][zfs] creating ZVOLs requires `refreservation' 
s kern/153173  fs         [zfs] booting from a gzip-compressed dataset doesn't w
o kern/153126  fs         [zfs] vdev failure, zpool=peegel type=vdev.too_small
p kern/152488  fs         [tmpfs] [patch] mtime of file updated when only inode 
o kern/152022  fs         [nfs] nfs service hangs with linux client [regression]
o kern/151942  fs         [zfs] panic during ls(1) zfs snapshot directory
o kern/151905  fs         [zfs] page fault under load in /sbin/zfs
o kern/151845  fs         [smbfs] [patch] smbfs should be upgraded to support Un
o bin/151713   fs         [patch] Bug in growfs(8) with respect to 32-bit overfl
o kern/151648  fs         [zfs] disk wait bug
o kern/151629  fs         [fs] [patch] Skip empty directory entries during name 
o kern/151330  fs         [zfs] will unshare all zfs filesystem after execute a 
o kern/151326  fs         [nfs] nfs exports fail if netgroups contain duplicate 
o kern/151251  fs         [ufs] Can not create files on filesystem with heavy us
o kern/151226  fs         [zfs] can't delete zfs snapshot
o kern/151111  fs         [zfs] vnodes leakage during zfs unmount
o kern/150503  fs         [zfs] ZFS disks are UNAVAIL and corrupted after reboot
o kern/150501  fs         [zfs] ZFS vdev failure vdev.bad_label on amd64
o kern/150390  fs         [zfs] zfs deadlock when arcmsr reports drive faulted
o kern/150336  fs         [nfs] mountd/nfsd became confused; refused to reload n
o kern/150207  fs         zpool(1): zpool import -d /dev tries to open weird dev
o kern/149208  fs         mksnap_ffs(8) hang/deadlock
o kern/149173  fs         [patch] [zfs] make OpenSolaris <sys/nvpair.h> installa
f kern/149022  fs         [hang] File system operations hangs with suspfs state
o kern/149015  fs         [zfs] [patch] misc fixes for ZFS code to build on Glib
o kern/149014  fs         [zfs] [patch] declarations in ZFS libraries/utilities 
o kern/149013  fs         [zfs] [patch] make ZFS makefiles use the libraries fro
o kern/148504  fs         [zfs] ZFS' zpool does not allow replacing drives to be
o kern/148490  fs         [zfs]: zpool attach - resilver bidirectionally, and re
o kern/148368  fs         [zfs] ZFS hanging forever on 8.1-PRERELEASE
o bin/148296   fs         [zfs] [loader] [patch] Very slow probe in /usr/src/sys
o kern/148204  fs         [nfs] UDP NFS causes overload
o kern/148138  fs         [zfs] zfs raidz pool commands freeze
o kern/147903  fs         [zfs] [panic] Kernel panics on faulty zfs device
o kern/147881  fs         [zfs] [patch] ZFS "sharenfs" doesn't allow different "
o kern/147790  fs         [zfs] zfs set acl(mode|inherit) fails on existing zfs
o kern/147560  fs         [zfs] [boot] Booting 8.1-PRERELEASE raidz system take 
o kern/147420  fs         [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt 
o kern/146941  fs         [zfs] [panic] Kernel Double Fault - Happens constantly
o kern/146786  fs         [zfs] zpool import hangs with checksum errors
o kern/146708  fs         [ufs] [panic] Kernel panic in softdep_disk_write_compl
o kern/146528  fs         [zfs] Severe memory leak in ZFS on i386
o kern/146502  fs         [nfs] FreeBSD 8 NFS Client Connection to Server
s kern/145712  fs         [zfs] cannot offline two drives in a raidz2 configurat
o kern/145411  fs         [xfs] [panic] Kernel panics shortly after mounting an 
o bin/145309   fs         bsdlabel: Editing disk label invalidates the whole dev
o kern/145272  fs         [zfs] [panic] Panic during boot when accessing zfs on 
o kern/145246  fs         [ufs] dirhash in 7.3 gratuitously frees hashes when it
o kern/145238  fs         [zfs] [panic] kernel panic on zpool clear tank
o kern/145229  fs         [zfs] Vast differences in ZFS ARC behavior between 8.0
o kern/145189  fs         [nfs] nfsd performs abysmally under load
o kern/144929  fs         [ufs] [lor] vfs_bio.c + ufs_dirhash.c
p kern/144447  fs         [zfs] sharenfs fsunshare() & fsshare_main() non functi
o kern/144416  fs         [panic] Kernel panic on online filesystem optimization
s kern/144415  fs         [zfs] [panic] kernel panics on boot after zfs crash
o kern/144234  fs         [zfs] Cannot boot machine with recent gptzfsboot code 
o kern/143825  fs         [nfs] [panic] Kernel panic on NFS client
o bin/143572   fs         [zfs] zpool(1): [patch] The verbose output from iostat
o kern/143212  fs         [nfs] NFSv4 client strange work ...
o kern/143184  fs         [zfs] [lor] zfs/bufwait LOR
o kern/142914  fs         [zfs] ZFS performance degradation over time
o kern/142878  fs         [zfs] [vfs] lock order reversal
o kern/142597  fs         [ext2fs] ext2fs does not work on filesystems with real
o kern/142489  fs         [zfs] [lor] allproc/zfs LOR
o kern/142466  fs         Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re
o kern/142306  fs         [zfs] [panic] ZFS drive (from OSX Leopard) causes two 
o kern/142068  fs         [ufs] BSD labels are got deleted spontaneously
o kern/141897  fs         [msdosfs] [panic] Kernel panic. msdofs: file name leng
o kern/141463  fs         [nfs] [panic] Frequent kernel panics after upgrade fro
o kern/141305  fs         [zfs] FreeBSD ZFS+sendfile severe performance issues (
o kern/141091  fs         [patch] [nullfs] fix panics with DIAGNOSTIC enabled
o kern/141086  fs         [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS
o kern/141010  fs         [zfs] "zfs scrub" fails when backed by files in UFS2
o kern/140888  fs         [zfs] boot fail from zfs root while the pool resilveri
o kern/140661  fs         [zfs] [patch] /boot/loader fails to work on a GPT/ZFS-
o kern/140640  fs         [zfs] snapshot crash
o kern/140134  fs         [msdosfs] write and fsck destroy filesystem integrity
o kern/140068  fs         [smbfs] [patch] smbfs does not allow semicolon in file
o kern/139725  fs         [zfs] zdb(1) dumps core on i386 when examining zpool c
o kern/139715  fs         [zfs] vfs.numvnodes leak on busy zfs
p bin/139651   fs         [nfs] mount(8): read-only remount of NFS volume does n
o kern/139597  fs         [patch] [tmpfs] tmpfs initializes va_gen but doesn't u
o kern/139564  fs         [zfs] [panic] 8.0-RC1 - Fatal trap 12 at end of shutdo
o kern/139407  fs         [smbfs] [panic] smb mount causes system crash if remot
o kern/138662  fs         [panic] ffs_blkfree: freeing free block
o kern/138421  fs         [ufs] [patch] remove UFS label limitations
o kern/138202  fs         mount_msdosfs(1) see only 2Gb
o kern/136968  fs         [ufs] [lor] ufs/bufwait/ufs (open)
o kern/136945  fs         [ufs] [lor] filedesc structure/ufs (poll)
o kern/136944  fs         [ffs] [lor] bufwait/snaplk (fsync)
o kern/136873  fs         [ntfs] Missing directories/files on NTFS volume
o kern/136865  fs         [nfs] [patch] NFS exports atomic and on-the-fly atomic
p kern/136470  fs         [nfs] Cannot mount / in read-only, over NFS
o kern/135546  fs         [zfs] zfs.ko module doesn't ignore zpool.cache filenam
o kern/135469  fs         [ufs] [panic] kernel crash on md operation in ufs_dirb
o kern/135050  fs         [zfs] ZFS clears/hides disk errors on reboot
o kern/134491  fs         [zfs] Hot spares are rather cold...
o kern/133676  fs         [smbfs] [panic] umount -f'ing a vnode-based memory dis
o kern/133174  fs         [msdosfs] [patch] msdosfs must support utf-encoded int
o kern/132960  fs         [ufs] [panic] panic:ffs_blkfree: freeing free frag
o kern/132397  fs         reboot causes filesystem corruption (failure to sync b
o kern/132331  fs         [ufs] [lor] LOR ufs and syncer
o kern/132237  fs         [msdosfs] msdosfs has problems to read MSDOS Floppy
o kern/132145  fs         [panic] File System Hard Crashes
o kern/131441  fs         [unionfs] [nullfs] unionfs and/or nullfs not combineab
o kern/131360  fs         [nfs] poor scaling behavior of the NFS server under lo
o kern/131342  fs         [nfs] mounting/unmounting of disks causes NFS to fail
o bin/131341   fs         makefs: error "Bad file descriptor"  on the mount poin
o kern/130920  fs         [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130210  fs         [nullfs] Error by check nullfs
o kern/129760  fs         [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129488  fs         [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: 
o kern/129231  fs         [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129152  fs         [panic] non-userfriendly panic when trying to mount(8)
o kern/127787  fs         [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs
o bin/127270   fs         fsck_msdosfs(8) may crash if BytesPerSec is zero
o kern/127029  fs         [panic] mount(8): trying to mount a write protected zi
o kern/126287  fs         [ufs] [panic] Kernel panics while mounting an UFS file
o kern/125895  fs         [ffs] [panic] kernel: panic: ffs_blkfree: freeing free
s kern/125738  fs         [zfs] [request] SHA256 acceleration in ZFS
o kern/123939  fs         [msdosfs] corrupts new files
o kern/122380  fs         [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash
o bin/122172   fs         [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121898   fs         [nullfs] pwd(1)/getcwd(2) fails with Permission denied
o bin/121366   fs         [zfs] [patch] Automatic disk scrubbing from periodic(8
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha
f kern/120991  fs         [panic] [ffs] [snapshot] System crashes when manipulat
o kern/120483  fs         [ntfs] [patch] NTFS filesystem locking changes
o kern/120482  fs         [ntfs] [patch] Sync style changes between NetBSD and F
o kern/118912  fs         [2tb] disk sizing/geometry problem with large array
o kern/118713  fs         [minidump] [patch] Display media size required for a k
o bin/118249   fs         [ufs] mv(1): moving a directory changes its mtime
o kern/118107  fs         [ntfs] [panic] Kernel panic when accessing a file at N
o kern/117954  fs         [ufs] dirhash on very large directories blocks the mac
o bin/117315   fs         [smbfs] mount_smbfs(8) and related options can't mount
o kern/117314  fs         [ntfs] Long-filename only NTFS fs'es cause kernel pani
o kern/117158  fs         [zfs] zpool scrub causes panic if geli vdevs detach on
o bin/116980   fs         [msdosfs] [patch] mount_msdosfs(8) resets some flags f
o conf/116931  fs         lack of fsck_cd9660 prevents mounting iso images with 
o kern/116583  fs         [ffs] [hang] System freezes for short time when using 
o kern/116170  fs         [panic] Kernel panic when mounting /tmp
o bin/115361   fs         [zfs] mount(8) gets into a state where it won't set/un
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala 
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o kern/113852  fs         [smbfs] smbfs does not properly implement DFS referral
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show 
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/111843  fs         [msdosfs] Long Names of files are incorrectly created 
o kern/111782  fs         [ufs] dump(8) fails horribly for large filesystems
s bin/111146   fs         [2tb] fsck(8) fails on 6T filesystem
o kern/109024  fs         [msdosfs] [iconv] mount_msdosfs: msdosfs_iconv: Operat
o kern/109010  fs         [msdosfs] can't mv directory within fat32 file system
o bin/107829   fs         [2TB] fdisk(8): invalid boundary checking in fdisk / w
o kern/106107  fs         [ufs] left-over fsck_snapshot after unfinished backgro
o kern/106030  fs         [ufs] [panic] panic in ufs from geom when a dead disk 
o kern/104406  fs         [ufs] Processes get stuck in "ufs" state under persist
o kern/104133  fs         [ext2fs] EXT2FS module corrupts EXT2/3 filesystems
o kern/103035  fs         [ntfs] Directories in NTFS mounted disc images appear 
o kern/101324  fs         [smbfs] smbfs sometimes not case sensitive when it's s
o kern/99290   fs         [ntfs] mount_ntfs ignorant of cluster sizes
s bin/97498    fs         [request] newfs(8) has no option to clear the first 12
o kern/97377   fs         [ntfs] [patch] syntax cleanup for ntfs_ihash.c
o kern/95222   fs         [cd9660] File sections on ISO9660 level 3 CDs ignored
o kern/94849   fs         [ufs] rename on UFS filesystem is not atomic
o bin/94810    fs         fsck(8) incorrectly reports 'file system marked clean'
o kern/94769   fs         [ufs] Multiple file deletions on multi-snapshotted fil
o kern/94733   fs         [smbfs] smbfs may cause double unlock
o kern/93942   fs         [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/92272   fs         [ffs] [hang] Filling a filesystem while creating a sna
o kern/91134   fs         [smbfs] [patch] Preserve access and modification time 
a kern/90815   fs         [smbfs] [patch] SMBFS with character conversions somet
o kern/88657   fs         [smbfs] windows client hang when browsing a samba shar
o kern/88555   fs         [panic] ffs_blkfree: freeing free frag on AMD 64
o kern/88266   fs         [smbfs] smbfs does not implement UIO_NOCOPY and sendfi
o bin/87966    fs         [patch] newfs(8): introduce -A flag for newfs to enabl
o kern/87859   fs         [smbfs] System reboot while umount smbfs.
o kern/86587   fs         [msdosfs] rm -r /PATH fails with lots of small files
o bin/85494    fs         fsck_ffs: unchecked use of cg_inosused macro etc.
o kern/80088   fs         [smbfs] Incorrect file time setting on NTFS mounted vi
o bin/74779    fs         Background-fsck checks one filesystem twice and omits 
o kern/73484   fs         [ntfs] Kernel panic when doing `ls` from the client si
o bin/73019    fs         [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino
o kern/71774   fs         [ntfs] NTFS cannot "see" files on a WinXP filesystem
o bin/70600    fs         fsck(8) throws files away when it can't grow lost+foun
o kern/68978   fs         [panic] [ufs] crashes with failing hard disk, loose po
o kern/65920   fs         [nwfs] Mounted Netware filesystem behaves strange
o kern/65901   fs         [smbfs] [patch] smbfs fails fsx write/truncate-down/tr
o kern/61503   fs         [smbfs] mount_smbfs does not work as non-root
o kern/55617   fs         [smbfs] Accessing an nsmb-mounted drive via a smb expo
o kern/51685   fs         [hang] Unbounded inode allocation causes kernel to loc
o kern/51583   fs         [nullfs] [patch] allow to work with devices and socket
o kern/36566   fs         [smbfs] System reboot with dead smb mount and umount
o kern/33464   fs         [ufs] soft update inconsistencies after system crash
o bin/27687    fs         fsck(8) wrapper is not properly passing options to fsc
o kern/18874   fs         [2TB] 32bit NFS servers export wrong negative values t

222 problems total.


From owner-freebsd-fs@FreeBSD.ORG  Mon May  2 14:02:39 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 691001065674;
	Mon,  2 May 2011 14:02:39 +0000 (UTC) (envelope-from jh@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 3FB618FC16;
	Mon,  2 May 2011 14:02:39 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p42E2do7033580;
	Mon, 2 May 2011 14:02:39 GMT (envelope-from jh@freefall.freebsd.org)
Received: (from jh@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p42E2dhJ033574;
	Mon, 2 May 2011 14:02:39 GMT (envelope-from jh)
Date: Mon, 2 May 2011 14:02:39 GMT
Message-Id: <201105021402.p42E2dhJ033574@freefall.freebsd.org>
To: michael.reynolds@gmail.com, jh@FreeBSD.org, freebsd-fs@FreeBSD.org
From: jh@FreeBSD.org
Cc: 
Subject: Re: kern/116170: [panic] Kernel panic when mounting /tmp
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 May 2011 14:02:39 -0000

Synopsis: [panic] Kernel panic when mounting /tmp

State-Changed-From-To: open->feedback
State-Changed-By: jh
State-Changed-When: Mon May 2 14:02:38 UTC 2011
State-Changed-Why: 
Can you still reproduce this on a supported release?

http://www.freebsd.org/cgi/query-pr.cgi?pr=116170

From owner-freebsd-fs@FreeBSD.ORG  Mon May  2 16:09:26 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3A218106566C;
	Mon,  2 May 2011 16:09:26 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail03.syd.optusnet.com.au (mail03.syd.optusnet.com.au
	[211.29.132.184])
	by mx1.freebsd.org (Postfix) with ESMTP id B68508FC1B;
	Mon,  2 May 2011 16:09:24 +0000 (UTC)
Received: from c122-106-155-58.carlnfd1.nsw.optusnet.com.au
	(c122-106-155-58.carlnfd1.nsw.optusnet.com.au [122.106.155.58])
	by mail03.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	p42G9J5P005602
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Tue, 3 May 2011 02:09:21 +1000
Date: Tue, 3 May 2011 02:09:19 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Rick Macklem <rmacklem@uoguelph.ca>
In-Reply-To: <733531363.835298.1304281643548.JavaMail.root@erie.cs.uoguelph.ca>
Message-ID: <20110503013724.I2001@besplex.bde.org>
References: <733531363.835298.1304281643548.JavaMail.root@erie.cs.uoguelph.ca>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: rmacklem@FreeBSD.org, kib@FreeBSD.org, fs@FreeBSD.org
Subject: Re: newnfs client and statfs
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 May 2011 16:09:26 -0000

On Sun, 1 May 2011, Rick Macklem wrote:

> Ok, I realized the code in the last post was pretty bogus:-) My only
> excuse was that I typed it as I was running out the door...
>
> So, I played with it a bit and the attached patch seems to work for
> i386. For the fields that are uint64_t in struct statfs, it just
> divides/assigns. For the int64_t field that takes the divided value
> (f_bavail) I did the division/assignment to a uint64_t tmp and then
> assigned that to f_bavail. (Since any value that fits in uint64_t is
> a positive value for int64_t after being divided by 2 or more, it will
> always be positive.) For the other int64_t one, I just check for "> INT64_MAX"
> and set it to INT64_MAX for that case, so it doesn't go negative.

Sorry, I don't like this.  Going through tmp makes no difference since
all values are reduced below INT64_MAX by dividing by just 2.  "Negative"
values are still converted to garbage positive values.

> Anyhow, the updated patch is attached and maybe kib@ can test it?

% --- fs/nfsclient/nfs_clport.c.sav	2011-04-30 20:16:39.000000000 -0400
% +++ fs/nfsclient/nfs_clport.c	2011-05-01 16:11:18.000000000 -0400
% @@ -838,20 +838,19 @@ void
%  nfscl_loadsbinfo(struct nfsmount *nmp, struct nfsstatfs *sfp, void *statfs)
%  {
%  	struct statfs *sbp = (struct statfs *)statfs;
% -	nfsquad_t tquad;
% +	uint64_t tmp;
% 
%  	if (nmp->nm_flag & (NFSMNT_NFSV3 | NFSMNT_NFSV4)) {
%  		sbp->f_bsize = NFS_FABLKSIZE;
% -		tquad.qval = sfp->sf_tbytes;
% -		sbp->f_blocks = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
% -		tquad.qval = sfp->sf_fbytes;
% -		sbp->f_bfree = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
% -		tquad.qval = sfp->sf_abytes;
% -		sbp->f_bavail = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
% -		tquad.qval = sfp->sf_tfiles;
% -		sbp->f_files = (tquad.lval[0] & 0x7fffffff);
% -		tquad.qval = sfp->sf_ffiles;
% -		sbp->f_ffree = (tquad.lval[0] & 0x7fffffff);
% +		sbp->f_blocks = sfp->sf_tbytes / NFS_FABLKSIZE;
% +		sbp->f_bfree = sfp->sf_fbytes / NFS_FABLKSIZE;

OK.

% +		tmp = sfp->sf_abytes / NFS_FABLKSIZE;
% +		sbp->f_bavail = tmp;

The division made it less than 2**55, and kept it nonnegative.  Going through
tmp doesn't change this.

But I still want to use my code to support negative values:

 		sbp->f_bavail = (int64_t)sfp->sf_abytes / NFS_FABLKSIZE;

If the 63rd bit is set, it must mean that the server is a
non-broken^non-conforming FreeBSD one trying to send a negative value,
since file systems with >= 2**63 bytes available are physically impossible.
Even if the file system is virtual and growable so that it has no real
limits, it should probably limit itself to much less than 2**63 to avoid
testing whether clients can handle such large values.
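
A minimal sketch of the difference between the two divisions, not taken
from the patch.  FABLKSIZE (512) stands in for NFS_FABLKSIZE and the
function names are made up for illustration.  It assumes a two's
complement machine, where converting an out-of-range uint64_t to int64_t
gives the expected negative value (this conversion is
implementation-defined in C).

```c
#include <assert.h>
#include <stdint.h>

#define	FABLKSIZE	512	/* stand-in for NFS_FABLKSIZE (assumed value) */

/* Unsigned division: a "negative" wire value turns into a huge count. */
static int64_t
bavail_unsigned(uint64_t abytes)
{
	return ((int64_t)(abytes / FABLKSIZE));
}

/* Signed division: reinterpret the wire value first so the sign survives. */
static int64_t
bavail_signed(uint64_t abytes)
{
	return ((int64_t)abytes / FABLKSIZE);
}

/* A server sends -16384 bytes available as a 64-bit unsigned wire value. */
static void
bavail_demo(void)
{
	uint64_t wire = (uint64_t)-16384;

	assert(bavail_signed(wire) == -32);	/* -16384 / 512 */
	assert(bavail_unsigned(wire) > 0);	/* garbage positive value */
}
```

For ordinary non-negative byte counts both functions agree; they differ
only when the 63rd bit is set on the wire.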

% +		sbp->f_files = sfp->sf_tfiles;
% +		if (sfp->sf_ffiles > INT64_MAX)
% +			sbp->f_ffree = INT64_MAX;
% +		else
% +			sbp->f_ffree = sfp->sf_ffiles;

This gives correct-as-possible clamping for large unsigned values, but
gives a garbage large positive value for "negative" values.  Again, negative
values are physically impossible, so if the 63rd bit is set then it must
mean that the server is a FreeBSD one trying to send a negative value.
So I prefer to use my (untested in this case) code to support negative
values:

Sloppy version: just assign and depend on 2's complement magic that isn't
guaranteed to be there, and on the type sizes being the same:
 		sbp->f_ffree = sfp->sf_ffiles;

More careful version: first make sure that the 2's complement magic is there:
 		sbp->f_ffree = (int64_t)sfp->sf_ffiles;
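
For comparison, a hedged sketch of the clamping from the patch next to a
conversion that does not rely on any implementation-defined behaviour.
The function names are hypothetical; neither function is code from the
tree.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp as in the patch: anything above INT64_MAX becomes INT64_MAX. */
static int64_t
ffree_clamp(uint64_t wire)
{
	return (wire > (uint64_t)INT64_MAX ? INT64_MAX : (int64_t)wire);
}

/* Reinterpret as two's complement without an out-of-range conversion. */
static int64_t
ffree_signed(uint64_t wire)
{
	if (wire <= (uint64_t)INT64_MAX)
		return ((int64_t)wire);
	/* ~wire is in 0..INT64_MAX here, so the negation cannot overflow. */
	return (-(int64_t)~wire - 1);
}

/* A server sends -5 free files as a 64-bit unsigned wire value. */
static void
ffree_demo(void)
{
	uint64_t wire = (uint64_t)-5;

	assert(ffree_clamp(wire) == INT64_MAX);	/* large positive value */
	assert(ffree_signed(wire) == -5);	/* sign preserved */
}
```

On a two's complement machine ffree_signed() returns the same value as
the plain (int64_t) cast; it just spells out the arithmetic.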

%  	} else if ((nmp->nm_flag & NFSMNT_NFSV4) == 0) {
%  		sbp->f_bsize = (int32_t)sfp->sf_bsize;
%  		sbp->f_blocks = (int32_t)sfp->sf_blocks;

Bruce

From owner-freebsd-fs@FreeBSD.ORG  Mon May  2 16:46:19 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B8189106566B;
	Mon,  2 May 2011 16:46:19 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail04.syd.optusnet.com.au (mail04.syd.optusnet.com.au
	[211.29.132.185])
	by mx1.freebsd.org (Postfix) with ESMTP id 552898FC13;
	Mon,  2 May 2011 16:46:18 +0000 (UTC)
Received: from c122-106-155-58.carlnfd1.nsw.optusnet.com.au
	(c122-106-155-58.carlnfd1.nsw.optusnet.com.au [122.106.155.58])
	by mail04.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	p42GkGvE021696
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Tue, 3 May 2011 02:46:16 +1000
Date: Tue, 3 May 2011 02:46:16 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Rick Macklem <rmacklem@uoguelph.ca>
In-Reply-To: <135141673.835577.1304282609097.JavaMail.root@erie.cs.uoguelph.ca>
Message-ID: <20110503020940.N2001@besplex.bde.org>
References: <135141673.835577.1304282609097.JavaMail.root@erie.cs.uoguelph.ca>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: rmacklem@FreeBSD.org, fs@FreeBSD.org
Subject: Re: newnfs client and statfs
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 May 2011 16:46:19 -0000

On Sun, 1 May 2011, Rick Macklem wrote:

>>[negative f_bavail and f_ffree]
> Well my concern isn't w.r.t. FreeBSD clients, but other ones. I'll
> start a discussion on freebsd-fs@ about whether a FreeBSD server
> should "cheat" and put negative values (which other clients will
> think are large positive values) on the wire or try and conform
> strictly to the RFC.
>
>> BTW, how does scaling of block counts by NFS_FABLKSIZE in the v3 (and
>> v4?) cases work? I can only see it in clients. Servers seem to start
>> with block counts and never convert to byte counts.
>
> It must be somewhere, since they are uint64_t byte counts on the wire,
> except for NFSv2, which used block counts of the block size provided
> in the same response.

I'm not sure how I missed this.  The multiplications are there.  They
have the usual (potential) overflow bugs and the usual (actual) sign
extension bugs.  See below.

I now see how to fix most of the overflow problems for the v2 case very
easily by scaling on the server, so that clients see only small values.
This should be portable.

Here is the new nfs server code for this:

 	if (nd->nd_flag & ND_NFSV2) {
 		NFSM_BUILD(tl, u_int32_t *, NFSX_V2STATFS);
 		*tl++ = txdr_unsigned(NFS_V2MAXDATA);
 		*tl++ = txdr_unsigned(sf->f_bsize);
 		*tl++ = txdr_unsigned(sf->f_blocks);
 		*tl++ = txdr_unsigned(sf->f_bfree);
 		*tl = txdr_unsigned(sf->f_bavail);

This just reads server fs values from struct statfs, blindly truncates
them, and puts them on the wire.  With just 1 more line -- a call to
statfs_scale_blocks(sf, UINT32_MAX), it can adjust sf so that all the
values fit on the wire.  Or more safely, it can get values that fit
in the 32-bit longs on old FreeBSD clients and in the bogusly-cast-to-int32_t
values in current FreeBSD clients by calling statfs_scale_blocks(sf,
INT32_MAX).  This should be portable.  The adjustments may scale the
block size from 16384 to a very large value, but clients should already
be able to handle the "any" value.  Already, the v2 block size value is
rarely NFS_FABLKSIZE and rarely what it used to be since it is under
server control.  It used to be usually 4096 for ffs, but it is now usually
16384 for ffs, and can easily be 65536 for ffs.  Above 65536 there might
be more problems but 65536 works up to 128 TB with an int32_t max (2**31-1
blocks times 2**16 bsize = 2**47 - 2**16).  Even ffs's default block size
of 16K works up to 32 TB.  File systems of size >= 32TB are still rare
and are more rarely used with v2, so perhaps the overflows haven't occurred
for anyone yet.
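
A rough sketch of the kind of scaling meant here, assuming a simplified
stand-in structure (struct fsinfo) rather than the real struct statfs.
scale_blocks() is a hypothetical helper written for this example and is
not the kernel's statfs_scale_blocks(): it doubles the block size and
halves the counts until the largest count fits in the limit.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the block fields of struct statfs. */
struct fsinfo {
	uint64_t bsize;
	uint64_t blocks;
	uint64_t bfree;
	int64_t	 bavail;	/* may be negative */
};

/* Scale counts down and bsize up until every count is <= max. */
static void
scale_blocks(struct fsinfo *sf, uint64_t max)
{
	uint64_t big;

	assert(max > 0);
	/* Magnitude of bavail; unsigned negation is well defined. */
	big = sf->bavail < 0 ? -(uint64_t)sf->bavail : (uint64_t)sf->bavail;
	if (sf->blocks > big)
		big = sf->blocks;
	if (sf->bfree > big)
		big = sf->bfree;
	while (big > max) {
		big >>= 1;
		sf->bsize <<= 1;
		sf->blocks >>= 1;
		sf->bfree >>= 1;
		sf->bavail /= 2;	/* keeps the sign */
	}
}
```

With max = INT32_MAX, a file system of 2**33 blocks of 16K ends up as
2**30 blocks of 128K, so the total size (2**47 bytes) is unchanged while
every count fits in an int32_t.  Small counts lose precision, as with any
scaling of this kind.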

 	} else {
 		NFSM_BUILD(tl, u_int32_t *, NFSX_V3STATFS);
 		tval = (u_quad_t)sf->f_blocks;
 		tval *= (u_quad_t)sf->f_bsize;

This of course does the inverse of the scaling done by the client.

This could be written as:
 		tval = sf->f_blocks * sf->f_bsize;
The casts have no effect, since everything is already 64 bits.  You may
as well assume this, since you have to assume lots about the types and
values for this code to work at all.  For example, suppose
sf->f_blocks >= 2**63 (so that full 64-bitness including no space for
a sign bit is actually needed for a block count).  Then multiplying
by a 64-bit sf->f_blocks may overflow, and you need to do a uint128_t
multiplication to avoid overflow.  This is difficult since uint128_t
is not supported in C on any arch in FreeBSD.  Then the 128-bit values
won't fit on the wire, and they need to be scaled as in the v2 case.
But the v3 case doesn't seem to pass f_bsize, so it can't do this scaling
and would need to clamp.
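
A minimal sketch of such clamping, assuming plain uint64_t inputs.  The
helper name blocks_to_bytes() is made up for this example; it checks for
overflow with a division before multiplying, so no 128-bit type is
needed.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a block count to bytes, clamping at UINT64_MAX on overflow. */
static uint64_t
blocks_to_bytes(uint64_t blocks, uint64_t bsize)
{
	if (bsize != 0 && blocks > UINT64_MAX / bsize)
		return (UINT64_MAX);
	return (blocks * bsize);
}

static void
blocks_demo(void)
{
	/* 2**20 blocks of 16K is 2**34 bytes: fits. */
	assert(blocks_to_bytes((uint64_t)1 << 20, 16384) ==
	    ((uint64_t)1 << 34));
	/* 2**63 blocks of 16K cannot be represented: clamped. */
	assert(blocks_to_bytes((uint64_t)1 << 63, 16384) == UINT64_MAX);
}
```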

 		txdr_hyper(tval, tl); tl += 2;
 		tval = (u_quad_t)sf->f_bfree;
 		tval *= (u_quad_t)sf->f_bsize;
 		txdr_hyper(tval, tl); tl += 2;
 		tval = (u_quad_t)sf->f_bavail;
 		tval *= (u_quad_t)sf->f_bsize;

The type errors are more serious for this signed field.  Suppose
sf->f_bavail is -1.  Then tval is initially 0xFFFFFFFFFFFFFFFF.
Suppose sf->f_bsize is 16K.  Then the multiplication overflows.
IIRC, the result is well-defined for unsigned types (it wraps modulo
2**64; not undefined) and is (uint64_t)-16K = 0xFFFFFFFFFFFFC000.
This is the right value for passing -16K as a large unsigned value.
Careful code would generate this value without using an overflowing
multiplication.
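
One way such careful code could look, as a sketch only.  The helper
bavail_to_wire() is hypothetical and reflects one possible policy:
multiply the magnitude with an overflow check, clamp positive results to
INT64_MAX and negative magnitudes to 2**63, then encode negative values
as two's complement with unsigned arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Encode a possibly negative block count as a 64-bit wire byte count. */
static uint64_t
bavail_to_wire(int64_t bavail, uint64_t bsize)
{
	uint64_t limit, mag, bytes;

	if (bavail >= 0) {
		mag = (uint64_t)bavail;
		limit = (uint64_t)INT64_MAX;	/* keep bit 63 clear */
	} else {
		mag = -(uint64_t)bavail;	/* magnitude, well defined */
		limit = (uint64_t)1 << 63;	/* most negative value */
	}
	if (bsize != 0 && mag > limit / bsize)
		bytes = limit;			/* clamp instead of wrapping */
	else
		bytes = mag * bsize;
	/* 0 - bytes is the two's complement encoding of -bytes. */
	return (bavail >= 0 ? bytes : 0 - bytes);
}

static void
wire_demo(void)
{
	/* -1 block of 16K goes out as (uint64_t)-16K. */
	assert(bavail_to_wire(-1, 16384) == 0xFFFFFFFFFFFFC000ULL);
	assert(bavail_to_wire(2, 512) == 1024);
}
```

Whether a server should put such values on the wire at all is the open
question in this thread; the sketch only shows that the value can be
produced without relying on a wrapping multiplication.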

 		txdr_hyper(tval, tl); tl += 2;
 		tval = (u_quad_t)sf->f_files;
 		txdr_hyper(tval, tl); tl += 2;
 		tval = (u_quad_t)sf->f_ffree;
 		txdr_hyper(tval, tl); tl += 2;
 		tval = (u_quad_t)sf->f_ffree;
 		txdr_hyper(tval, tl); tl += 2;
 		*tl = 0;
 	}

> I'll try and make my Solaris10 box get to -ve frees and then see what
> it puts on the wire. After that, I'll start a discussion on freebsd-fs@
> about how they think a FreeBSD server should behave when f_bavail and/or
> f_ffree are negative.

The result on Solaris would be interesting.  Does Solaris still support
ffs?  You said later that you couldn't get it to generate negative values.

Bruce

From owner-freebsd-fs@FreeBSD.ORG  Mon May  2 19:15:19 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4B8D01065673;
	Mon,  2 May 2011 19:15:19 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca
	[131.104.91.36])
	by mx1.freebsd.org (Postfix) with ESMTP id EC2558FC12;
	Mon,  2 May 2011 19:15:18 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApwEALABv02DaFvO/2dsb2JhbACEUaIyiHGoF5A6gSqDVYEBBI55jj4
X-IronPort-AV: E=Sophos;i="4.64,303,1301889600"; d="scan'208";a="119343900"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-annu-pri.mail.uoguelph.ca with ESMTP; 02 May 2011 15:15:18 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 06A5EB3F7E;
	Mon,  2 May 2011 15:15:18 -0400 (EDT)
Date: Mon, 2 May 2011 15:15:18 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Bruce Evans <brde@optusnet.com.au>
Message-ID: <433279102.889960.1304363717963.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <20110503020940.N2001@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.202]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692)
Cc: rmacklem@FreeBSD.org, fs@FreeBSD.org
Subject: Re: newnfs client and statfs
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 May 2011 19:15:19 -0000

> 
> > I'll try and make my Solaris10 box get to -ve frees and then see
> > what
> > it puts on the wire. After that, I'll start a discussion on
> > freebsd-fs@
> > about how they think a FreeBSD server should behave when f_bavail
> > and/or
> > f_ffree are negative.
> 
> The result on Solaris would be interesting. Does Solaris still support
> ffs? You said later that you couldn't get it to generate negative
> values.
> 
It has some variation of FFS with logging, which is what I use. Writing
a file as root fails with "no space" when "df" reports about 7000 blocks
free. (I have no idea why it stops at around 7000. Something to do with
the log, maybe?)

Anyhow, it doesn't report negative values and all the fields in what
they call "struct statvfs" are unsigned numbers, including bavail.

rick

From owner-freebsd-fs@FreeBSD.ORG  Mon May  2 19:43:53 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6FF70106566B;
	Mon,  2 May 2011 19:43:53 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca
	[131.104.91.36])
	by mx1.freebsd.org (Postfix) with ESMTP id 04B018FC0A;
	Mon,  2 May 2011 19:43:52 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApwEAKgIv02DaFvO/2dsb2JhbACEUaIziHGoNZA/gSqDVYEBBI55jj4
X-IronPort-AV: E=Sophos;i="4.64,304,1301889600"; d="scan'208";a="119347033"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-annu-pri.mail.uoguelph.ca with ESMTP; 02 May 2011 15:43:52 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 4329FB3E95;
	Mon,  2 May 2011 15:43:52 -0400 (EDT)
Date: Mon, 2 May 2011 15:43:52 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Bruce Evans <brde@optusnet.com.au>
Message-ID: <413547662.892660.1304365432183.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <20110503013724.I2001@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.203]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692)
Cc: rmacklem@FreeBSD.org, kib@FreeBSD.org, fs@FreeBSD.org
Subject: Re: newnfs client and statfs
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 May 2011 19:43:53 -0000

> On Sun, 1 May 2011, Rick Macklem wrote:
> 
> > Ok, I realized the code in the last post was pretty bogus:-) My only
> > excuse was that I typed it as I was running out the door...
> >
> > So, I played with it a bit and the attached patch seems to work for
> > i386. For the fields that are uint64_t in struct statfs, it just
> > divides/assigns. For the int64_t field that takes the divided value
> > (f_bavail) I did the division/assignment to a uint64_t tmp and then
> > assigned that to f_bavail. (Since any value that fits in uint64_t is
> > a positive value for int64_t after being divided by 2 or more, it
> > will
> > always be positive.) For the other int64_t one, I just check for ">
> > INT64_MAX"
> > and set it to INT64_MAX for that case, so it doesn't go negative.
> 
> Sorry, I don't like this. Going through tmp makes no difference since
> all values are reduced below INT64_MAX by dividing by just 2.
> "Negative"
> values are still converted to garbage positive values.
> 
> > Anyhow, the updated patch is attached and maybe kib@ can test it?
> 
> % --- fs/nfsclient/nfs_clport.c.sav 2011-04-30 20:16:39.000000000
> -0400
> % +++ fs/nfsclient/nfs_clport.c 2011-05-01 16:11:18.000000000 -0400
> % @@ -838,20 +838,19 @@ void
> % nfscl_loadsbinfo(struct nfsmount *nmp, struct nfsstatfs *sfp, void
> *statfs)
> % {
> % struct statfs *sbp = (struct statfs *)statfs;
> % - nfsquad_t tquad;
> % + uint64_t tmp;
> %
> % if (nmp->nm_flag & (NFSMNT_NFSV3 | NFSMNT_NFSV4)) {
> % sbp->f_bsize = NFS_FABLKSIZE;
> % - tquad.qval = sfp->sf_tbytes;
> % - sbp->f_blocks = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
> % - tquad.qval = sfp->sf_fbytes;
> % - sbp->f_bfree = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
> % - tquad.qval = sfp->sf_abytes;
> % - sbp->f_bavail = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
> % - tquad.qval = sfp->sf_tfiles;
> % - sbp->f_files = (tquad.lval[0] & 0x7fffffff);
> % - tquad.qval = sfp->sf_ffiles;
> % - sbp->f_ffree = (tquad.lval[0] & 0x7fffffff);
> % + sbp->f_blocks = sfp->sf_tbytes / NFS_FABLKSIZE;
> % + sbp->f_bfree = sfp->sf_fbytes / NFS_FABLKSIZE;
> 
> OK.
> 
> % + tmp = sfp->sf_abytes / NFS_FABLKSIZE;
> % + sbp->f_bavail = tmp;
> 
> The division made it less than 2**55, and kept it nonnegative. Going
> through
> tmp doesn't change this.
> 
Agreed. The "tmp" was left over from when I had "if (tmp > INT64_MAX)",
which I realized I didn't need. I can just change it to:
    sbp->f_bavail = sfp->sf_abytes / NFS_FABLKSIZE;
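
The bound quoted above can be checked directly. This is a minimal sketch,
assuming NFS_FABLKSIZE is 512 (which is consistent with the 2**55 figure);
bytes_to_blocks() is a hypothetical helper, not a function in the client:

```c
#include <stdint.h>

#define NFS_FABLKSIZE 512	/* assumed value, consistent with the 2**55 bound */

/*
 * Hypothetical helper: convert an unsigned byte count from the wire
 * into a block count stored in a signed 64-bit field.
 */
static int64_t
bytes_to_blocks(uint64_t bytes)
{
	/*
	 * The division is done in unsigned arithmetic, so the quotient
	 * is at most UINT64_MAX / 512, which is below 2**55 and always
	 * fits in int64_t as a nonnegative value.
	 */
	return ((int64_t)(bytes / NFS_FABLKSIZE));
}
```

Under this reading even a wire value with every bit set yields 2**55 - 1
blocks, so the result cannot go negative.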

> But I still want to use my code to support negative values:
> 
> sbp->f_bavail = (int64_t)sfp->sf_abytes / NFS_FABLKSIZE;
> 
> If the 63rd bit is set, it must mean that the server is a
> non-broken^non-conforming FreeBSD one trying to send a negative value,
> since file systems with >= 2**63 bytes available are physically
> impossible.
> Even if the file system is virtual and growable so that it has no real
> limits, it should probably limit itself to much less than 2**63 to
> avoid
> testing whether clients can handle such large values.
> 

Well, since the RFCs don't say that, I think it shouldn't be assumed.
(I could assume that having the 63rd bit set just means a server doesn't
 know the exact answer and chooses to say "lots are free", but the truth
 is, neither of us knows.)

I've asked the question over on freebsd-fs@ and I'll wait to see what
everyone thinks w.r.t. RFC conformance vs hiding negative values in
the fields. If the collective agrees with you, I don't mind the code
assuming that 63rd bit set means negative.

> % + sbp->f_files = sfp->sf_tfiles;
> % + if (sfp->sf_ffiles > INT64_MAX)
> % + sbp->f_ffree = INT64_MAX;
> % + else
> % + sbp->f_ffree = sfp->sf_ffiles;
> 
> This gives correct-as-possible clamping for large unsigned values, but
> gives a garbage large positive value for "negative" values. Again,
> negative
> values are physically impossible, so if the 63rd bit is set then it
> must
> mean that the server is a FreeBSD one trying to send a negative value.
> So I prefer to use my (untested in this case) code to support negative
> values:
> 
Same as above: since the RFC says they're unsigned, I think that's what
the client should assume.
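
The clamping in the quoted patch, under the unsigned reading, can be
sketched as follows. clamp_to_int64() is a hypothetical name; the patch
does the comparison inline:

```c
#include <stdint.h>

/*
 * Hypothetical helper: store an unsigned 64-bit wire value in a signed
 * 64-bit field, clipping anything that does not fit to INT64_MAX.
 */
static int64_t
clamp_to_int64(uint64_t v)
{
	if (v > (uint64_t)INT64_MAX)
		return (INT64_MAX);
	return ((int64_t)v);
}
```

A value that a server intended as negative arrives with bit 63 set, so
this sketch reports it as INT64_MAX rather than as a negative number.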

rick

From owner-freebsd-fs@FreeBSD.ORG  Mon May  2 20:47:15 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id AC269106566C;
	Mon,  2 May 2011 20:47:15 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca
	[131.104.91.36])
	by mx1.freebsd.org (Postfix) with ESMTP id 543468FC18;
	Mon,  2 May 2011 20:47:15 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApwEABEYv02DaFvO/2dsb2JhbACEUaIziHGpWZBHgSqDVYEBBI55jj4
X-IronPort-AV: E=Sophos;i="4.64,304,1301889600"; d="scan'208";a="119354736"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-annu-pri.mail.uoguelph.ca with ESMTP; 02 May 2011 16:47:05 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id A24A7B3F54;
	Mon,  2 May 2011 16:47:05 -0400 (EDT)
Date: Mon, 2 May 2011 16:47:05 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Bruce Evans <brde@optusnet.com.au>
Message-ID: <1040257715.898126.1304369225601.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <20110503020940.N2001@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.201]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692)
Cc: rmacklem@FreeBSD.org, fs@FreeBSD.org
Subject: Re: newnfs client and statfs
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 May 2011 20:47:15 -0000

> 
> > I'll try and make my Solaris10 box get to -ve frees and then see
> > what
> > it puts on the wire. After that, I'll start a discussion on
> > freebsd-fs@
> > about how they think a FreeBSD server should behave when f_bavail
> > and/or
> > f_ffree are negative.
> 
> The result on Solaris would be interesting. Does Solaris still support
> ffs? You said later that you couldn't get it to generate negative
> values.
> 
Well, I just did the reverse (ran a FreeBSD FFS disk out of space so
it reported a -ve free and mounted it on Solaris10). Here are the
"df" outputs (I used "df -k" on Solaris, since that's a compatible format):

FreeBSD-current server (nfsv4-newlap):
Filesystem  1K-blocks    Used   Avail Capacity  Mounted on
/dev/ad4s3a   2026030  671492 1192456    36%    /
devfs               1       1       0   100%    /dev
/dev/ad4s3e   4697030 4544054 -222786   105%    /sub1
/dev/ad4s3d   5077038  641462 4029414    14%    /usr

Solaris10 client:
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/dsk/c0d0s0      3870110 2790938 1040471    73%    /
/devices                   0       0       0     0%    /devices
ctfs                       0       0       0     0%    /system/contract
proc                       0       0       0     0%    /proc
mnttab                     0       0       0     0%    /etc/mnttab
swap                  975736     624  975112     1%    /etc/svc/volatile
objfs                      0       0       0     0%    /system/object
/usr/lib/libc/libc_hwcap1.so.1 3870110 2790938 1040471    73%    /lib/libc.so.1
fd                         0       0       0     0%    /dev/fd
swap                  975112       0  975112     0%    /tmp
swap                  975140      28  975112     1%    /var/run
/dev/dsk/c0d0s7      5608190 4118091 1434018    75%    /export/home
nfsv4-newlap:/sub1   4697030 4544054 18014398509259198     1%    /mnt

As you can see, Solaris10 doesn't assume it's negative and
reports lottsa avail.
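
The avail figure on the client is consistent with the server's negative
value being read as an unsigned 64-bit byte count:
2**54 - 222786 = 18014398509259198. A minimal sketch of that arithmetic,
assuming the server put the two's-complement byte count in the field
unchanged (the helper names are hypothetical):

```c
#include <stdint.h>

/*
 * Assumption about the server: a negative count of 1K blocks is
 * converted to bytes and placed in the unsigned wire field as-is.
 */
static uint64_t
wire_from_kb(int64_t kb)
{
	/* Unsigned multiplication wraps modulo 2**64. */
	return ((uint64_t)kb * 1024);
}

/* Unsigned reading of the wire value, converted to 1K blocks. */
static uint64_t
avail_kb_unsigned(uint64_t wire_bytes)
{
	return (wire_bytes / 1024);
}

/*
 * Signed reading: bit 63 is treated as a sign bit before dividing.
 * The conversion is implementation-defined in C, but gives the
 * expected result on two's-complement machines.
 */
static int64_t
avail_kb_signed(uint64_t wire_bytes)
{
	return ((int64_t)wire_bytes / 1024);
}
```

With -222786 as input, the unsigned reading gives the number shown for
/mnt and the signed reading gives back -222786.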

I don't have a Linux client handy, so I can't do the same test
with Linux, rick

From owner-freebsd-fs@FreeBSD.ORG  Mon May  2 20:58:16 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 74B41106566B
	for <freebsd-fs@freebsd.org>; Mon,  2 May 2011 20:58:16 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca
	[131.104.91.36])
	by mx1.freebsd.org (Postfix) with ESMTP id 34DF38FC12
	for <freebsd-fs@freebsd.org>; Mon,  2 May 2011 20:58:15 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AiwHAEQav02DaFvO/2dsb2JhbACEUZNxjkKlW40CkEeBKoNVgQEEjnmGfIdC
X-IronPort-AV: E=Sophos;i="4.64,304,1301889600"; d="scan'208";a="119356546"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-annu-pri.mail.uoguelph.ca with ESMTP; 02 May 2011 16:58:15 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 4D324B3F54
	for <freebsd-fs@freebsd.org>; Mon,  2 May 2011 16:58:15 -0400 (EDT)
Date: Mon, 2 May 2011 16:58:15 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: freebsd-fs@freebsd.org
Message-ID: <924130649.898737.1304369895239.JavaMail.root@erie.cs.uoguelph.ca>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.201]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692)
Subject: Re: RFC: NFS server handling of negative f_bavail?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 May 2011 20:58:16 -0000

I just ran a little test where I ran an FFS volume on
a FreeBSD-current server out of space so that it showed
negative avail and then mounted it on Solaris10. Here
are the "df" outputs for the server and client.

FreeBSD server (nfsv4-newlap):
Filesystem  1K-blocks    Used   Avail Capacity  Mounted on
/dev/ad4s3a   2026030  671492 1192456    36%    /
devfs               1       1       0   100%    /dev
/dev/ad4s3e   4697030 4544054 -222786   105%    /sub1
/dev/ad4s3d   5077038  641462 4029414    14%    /usr

and for the Solaris10 client:
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/dsk/c0d0s0      3870110 2790938 1040471    73%    /
/devices                   0       0       0     0%    /devices
ctfs                       0       0       0     0%    /system/contract
proc                       0       0       0     0%    /proc
mnttab                     0       0       0     0%    /etc/mnttab
swap                  975736     624  975112     1%    /etc/svc/volatile
objfs                      0       0       0     0%    /system/object
/usr/lib/libc/libc_hwcap1.so.1 3870110 2790938 1040471    73%    /lib/libc.so.1
fd                         0       0       0     0%    /dev/fd
swap                  975112       0  975112     0%    /tmp
swap                  975140      28  975112     1%    /var/run
/dev/dsk/c0d0s7      5608190 4118091 1434018    75%    /export/home
nfsv4-newlap:/sub1   4697030 4544054 18014398509259198     1%    /mnt

You can see that the Solaris10 client thinks there is lottsa
avail. I think sending the field as 0 over the wire would
provide better interoperability.
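
A minimal sketch of that suggestion, assuming the server holds the local
counter as a signed 64-bit byte count. avail_for_wire() is a hypothetical
helper and not the actual server code:

```c
#include <stdint.h>

/*
 * Hypothetical helper: choose the value to place in an unsigned 64-bit
 * wire field for a local free-space counter that may be negative.
 */
static uint64_t
avail_for_wire(int64_t local_avail_bytes)
{
	/* Report "nothing available" instead of a huge unsigned value. */
	if (local_avail_bytes < 0)
		return (0);
	return ((uint64_t)local_avail_bytes);
}
```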

rick

From owner-freebsd-fs@FreeBSD.ORG  Mon May  2 22:51:52 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 109CE106566B
	for <freebsd-fs@freebsd.org>; Mon,  2 May 2011 22:51:52 +0000 (UTC)
	(envelope-from jan.koum@gmail.com)
Received: from mail-qy0-f175.google.com (mail-qy0-f175.google.com
	[209.85.216.175])
	by mx1.freebsd.org (Postfix) with ESMTP id C22228FC08
	for <freebsd-fs@freebsd.org>; Mon,  2 May 2011 22:51:51 +0000 (UTC)
Received: by qyk35 with SMTP id 35so1765337qyk.13
	for <freebsd-fs@freebsd.org>; Mon, 02 May 2011 15:51:50 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:sender:date:x-google-sender-auth
	:message-id:subject:from:to:cc:content-type;
	bh=hMW2o1DB+Dg/hvA3CU7AvAk0y3vnmXZSGPg9JWe5YBc=;
	b=C5w60uORY92tiZH9fr+ouhTjACvTPzAId6sVPMSVf60ENyS+p0yB4x1EHPiIQ+SMY5
	UX/DxoUv9fkANlpLvyA+GW7ozJfVr/E1v+sodY7lUse5BOgT0YDdFtdRVaRsSr4xLX4J
	Z1uePKWAjElzLqpyKrT1ca1uoBgnUoF7D4Mlo=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:sender:date:x-google-sender-auth:message-id:subject
	:from:to:cc:content-type;
	b=YhjVOpbaKfp6FYC/Q4flbgjVzOYcqALeLuIDjxyue02ZhTMBQSIDOKGjQF3rSYZLck
	K4KMOry6pqlPWB+SRWGZMFYMHqGn1q8xo8aMYOosWEZ95pV5Siv2cr9ctyHoZjCACy4L
	cl4jScs2SC4bye2kuOl8nO/+JtIKpwGMiFtVw=
MIME-Version: 1.0
Received: by 10.224.28.133 with SMTP id m5mr6781069qac.281.1304375303875; Mon,
	02 May 2011 15:28:23 -0700 (PDT)
Sender: jan.koum@gmail.com
Received: by 10.229.88.73 with HTTP; Mon, 2 May 2011 15:28:23 -0700 (PDT)
Date: Mon, 2 May 2011 15:28:23 -0700
X-Google-Sender-Auth: 43yF5vdY7S7ZqdHDMO8smWtaXvs
Message-ID: <BANLkTin-qEoxxFbjJkDaA_-UZMkza08NNQ@mail.gmail.com>
From: Jan Koum <jan@whatsapp.com>
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: Chris Peiffer <chris@whatsapp.com>
Subject: very strange IO issue with FreeBSD 8 and SSD
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 May 2011 22:51:52 -0000

hello,

we are seeing some strange activity on our FreeBSD systems running
8.2-PRERELEASE snapshot from early december

our system has 4 Intel SSD drives (64GB each) connected directly into
motherboard through AHCI:

ad4: setting UDMA100
ad4: 61057MB <SSDSA2SH064G1GC INTEL 045C8860> at ata2-master UDMA100 SATA
3Gb/s
ad4: 125045424 sectors [124053C/16H/63S] 16 sectors/interrupt 1 depth queue
[...]
ad7: setting UDMA100
ad7: 61057MB <SSDSA2SH064G1GC INTEL 045C8860> at ata3-slave UDMA100 SATA
3Gb/s
ad7: 125045424 sectors [124053C/16H/63S] 16 sectors/interrupt 1 depth queue

$ df -h
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/ad4s1a     57G     24G     29G    45%    /
/dev/ad5a       58G     17G     36G    32%    /d2
/dev/ad7a       58G     17G     36G    32%    /d4
/dev/ad6a       58G     17G     36G    32%    /d3

so far - so good, right?  this is where things get very bizarre:  our
application receives data from network and writes to disk.   on average the
file size grows to about 7Kbytes while an average file append is 300-400
bytes.

netstat shows about 700-800Kbytes of input and our application log shows we
write about 500Kbytes each second.  however, when i run iostat we see
upwards of 10MB a second written to disk (if not more).  for example:

$ iostat -KC -x 1
                        extended device statistics             cpu
device     r/s   w/s    kr/s    kw/s wait svc_t  %b  us ni sy in id
ad4        9.0 423.3    45.2  4410.1    0  84.3  11   5  0  5  1 89
ad5        9.0 420.7    44.9  4237.4    0  82.3  11
ad6        9.0 420.6    45.1  4254.4    0  81.1  11
ad7        9.0 420.3    44.9  4225.7    0  83.8  11
                        extended device statistics             cpu
device     r/s   w/s    kr/s    kw/s wait svc_t  %b  us ni sy in id
ad4       14.9 157.9    79.5  1108.4    0  31.7  18   8  0  5  1 86
ad5       15.9 1480.8    63.6 18886.1    0  36.4  19
ad6       20.9 154.9    93.4  1032.9    0   7.4   4
ad7       19.9 216.5    63.6  1450.0    0   9.2   4
                        extended device statistics             cpu
device     r/s   w/s    kr/s    kw/s wait svc_t  %b  us ni sy in id
ad4       20.9 169.2   115.4  1271.7    0  39.3  13   9  0  4  1 85
ad5       21.9 1179.1   129.4 11598.1    0  34.6  14
ad6       14.9 140.3    39.8   925.4    0   9.4   3
ad7       15.9 213.9    33.8  1610.0    0   7.9   3
                        extended device statistics             cpu
device     r/s   w/s    kr/s    kw/s wait svc_t  %b  us ni sy in id
ad4       15.9 403.6    53.7  3208.6    0  30.0  10   8  0  6  1 85
ad5       16.9 709.7    47.7  4691.6    0  20.2   9
ad6       23.9 321.1    97.4  2262.3    0  12.9   7
ad7       14.9 421.4    51.7  3437.2    0  13.3   7

(apologies in advance for bad formatting)

so, here we are, looking at iostat output and trying to figure out how
it can be this bad and where the discrepancy is coming from.  a few things
to get out of the way: no, we do not have TRIM enabled yet, we would need to
upgrade OS for that, but we don't think TRIM would make such a big
difference.  also we know that we can newfs with -b 512 -f 4096 but again, we
also don't think that it would account for such a large IO discrepancy.

any thoughts to what this could be?  has anybody seen anything similar
before?  10MB of metadata for 500K worth of disk writes?  that can't be....
right?
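
To put a number on the discrepancy: the first iostat sample above sums to
roughly 17 MB/s across the four devices, against about 500 KB/s in the
application log, a ratio of about 34. A minimal sketch of that arithmetic
(the helper names are hypothetical; the figures are the kw/s column of the
first sample):

```c
#include <stddef.h>

/* Sum the per-device kw/s figures (KB/s) from one iostat sample. */
static double
total_kw(const double *kw, size_t n)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += kw[i];
	return (sum);
}

/* Ratio of device-level write rate to application-level write rate. */
static double
write_ratio(double device_kb_per_s, double app_kb_per_s)
{
	return (device_kb_per_s / app_kb_per_s);
}
```

For example, total_kw() over { 4410.1, 4237.4, 4254.4, 4225.7 } can be
passed to write_ratio() together with 500.0 for the application rate.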

From owner-freebsd-fs@FreeBSD.ORG  Mon May  2 23:36:06 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4EF611065670
	for <freebsd-fs@freebsd.org>; Mon,  2 May 2011 23:36:06 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta02.westchester.pa.mail.comcast.net
	(qmta02.westchester.pa.mail.comcast.net [76.96.62.24])
	by mx1.freebsd.org (Postfix) with ESMTP id EF1DA8FC16
	for <freebsd-fs@freebsd.org>; Mon,  2 May 2011 23:36:05 +0000 (UTC)
Received: from omta16.westchester.pa.mail.comcast.net ([76.96.62.88])
	by qmta02.westchester.pa.mail.comcast.net with comcast
	id en811g00A1uE5Es52nc6Vl; Mon, 02 May 2011 23:36:06 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta16.westchester.pa.mail.comcast.net with comcast
	id enc31g00U1t3BNj3cnc4SD; Mon, 02 May 2011 23:36:05 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id B6A159B418; Mon,  2 May 2011 16:36:01 -0700 (PDT)
Date: Mon, 2 May 2011 16:36:01 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Jan Koum <jan@whatsapp.com>
Message-ID: <20110502233601.GA29710@icarus.home.lan>
References: <BANLkTin-qEoxxFbjJkDaA_-UZMkza08NNQ@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <BANLkTin-qEoxxFbjJkDaA_-UZMkza08NNQ@mail.gmail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org, Chris Peiffer <chris@whatsapp.com>
Subject: Re: very strange IO issue with FreeBSD 8 and SSD
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 May 2011 23:36:06 -0000

On Mon, May 02, 2011 at 03:28:23PM -0700, Jan Koum wrote:
> hello,
> 
> we are seeing some strange activity on our FreeBSD systems running
> 8.2-PRERELEASE snapshot from early december
> 
> our system has 4 Intel SSD drives (64GB each) connected directly into
> motherboard through AHCI:
> 
> ad4: setting UDMA100
> ad4: 61057MB <SSDSA2SH064G1GC INTEL 045C8860> at ata2-master UDMA100 SATA
> 3Gb/s
> ad4: 125045424 sectors [124053C/16H/63S] 16 sectors/interrupt 1 depth queue
> [...]
> ad7: setting UDMA100
> ad7: 61057MB <SSDSA2SH064G1GC INTEL 045C8860> at ata3-slave UDMA100 SATA
> 3Gb/s
> ad7: 125045424 sectors [124053C/16H/63S] 16 sectors/interrupt 1 depth queue
> 
> $ df -h
> Filesystem     Size    Used   Avail Capacity  Mounted on
> /dev/ad4s1a     57G     24G     29G    45%    /
> /dev/ad5a       58G     17G     36G    32%    /d2
> /dev/ad7a       58G     17G     36G    32%    /d4
> /dev/ad6a       58G     17G     36G    32%    /d3
> 
> so far - so good, right?  this is where things get very bizarre:  our
> application receives data from network and writes to disk.   on average the
> file size grows to about 7Kbytes while an average file append is 300-400
> bytes.
> 
> netstat shows about 700-800Kbytes of input and our application log shows we
> write about 500Kbytes each second.  however, when i run iostat we see
> upwards of 10MB a second written to disk (if not more).  for example:
> 
> $ iostat -KC -x 1
>                         extended device statistics             cpu
> device     r/s   w/s    kr/s    kw/s wait svc_t  %b  us ni sy in id
> ad4        9.0 423.3    45.2  4410.1    0  84.3  11   5  0  5  1 89
> ad5        9.0 420.7    44.9  4237.4    0  82.3  11
> ad6        9.0 420.6    45.1  4254.4    0  81.1  11
> ad7        9.0 420.3    44.9  4225.7    0  83.8  11
>                         extended device statistics             cpu
> device     r/s   w/s    kr/s    kw/s wait svc_t  %b  us ni sy in id
> ad4       14.9 157.9    79.5  1108.4    0  31.7  18   8  0  5  1 86
> ad5       15.9 1480.8    63.6 18886.1    0  36.4  19
> ad6       20.9 154.9    93.4  1032.9    0   7.4   4
> ad7       19.9 216.5    63.6  1450.0    0   9.2   4
>                         extended device statistics             cpu
> device     r/s   w/s    kr/s    kw/s wait svc_t  %b  us ni sy in id
> ad4       20.9 169.2   115.4  1271.7    0  39.3  13   9  0  4  1 85
> ad5       21.9 1179.1   129.4 11598.1    0  34.6  14
> ad6       14.9 140.3    39.8   925.4    0   9.4   3
> ad7       15.9 213.9    33.8  1610.0    0   7.9   3
>                         extended device statistics             cpu
> device     r/s   w/s    kr/s    kw/s wait svc_t  %b  us ni sy in id
> ad4       15.9 403.6    53.7  3208.6    0  30.0  10   8  0  6  1 85
> ad5       16.9 709.7    47.7  4691.6    0  20.2   9
> ad6       23.9 321.1    97.4  2262.3    0  12.9   7
> ad7       14.9 421.4    51.7  3437.2    0  13.3   7
> 
> (apologies in advance for bad formatting)
> 
> so, here we are, looking at iostat output and trying to figure out how
> it can be this bad and where the discrepancy is coming from.  a few things
> to get out of the way: no, we do not have TRIM enabled yet, we would need to
> upgrade OS for that, but we don't think TRIM would make such a big
> difference.  also we know that we can newfs with -b 512 -f 4096 but again, we
> also don't think that it would account for such a large IO discrepancy.
>
> any thoughts to what this could be?  has anybody seen anything similar
> before?  10MB of metadata for 500K worth of disk writes?  that can't be....
> right?

I would recommend trying ahci.ko instead of ataahci.ko.  Your device
names will change (ad4 --> ada0, ad5 --> ada1, etc.).  Just add
ahci_load="yes" to /boot/loader.conf and reboot into single-user, fix
/etc/fstab and related configuration files, and that's all you should
have to do.

We use Intel SSDs (X25-M 80GB) in our servers, also backed by UFS2 with
softupdates.  Controllers are Intel ICH7R (in AHCI mode) and Intel ICH9R
(also in AHCI mode).  We *did not* apply any 4K alignment when making
the partitions.  We use ahci.ko.  I haven't tested write speeds and all
that, but the disks work fine.

You might also try comparing iostat output to gstat output, though gstat
refreshes the screen continually making this a little difficult.

I would recommend "gstat -I500ms -f '^ad[0-9]$'" and watching closely.
Change the regex, of course, if you switch to ahci.ko.

If you want to compare benchmarks, I need to know exactly what to do to
reproduce the issue you're stating.  I would prefer the traffic not come
off the network (e.g. use dd or bonnie++ or something) to rule out
problems there.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Mon May  2 23:57:18 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1DBDD1065670
	for <fs@freebsd.org>; Mon,  2 May 2011 23:57:18 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca
	[131.104.91.36])
	by mx1.freebsd.org (Postfix) with ESMTP id CA05F8FC1C
	for <fs@freebsd.org>; Mon,  2 May 2011 23:57:17 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApwEAHBEv02DaFvO/2dsb2JhbACEUaI2tCiQWIR/gQEEjnmOPg
X-IronPort-AV: E=Sophos;i="4.64,306,1301889600"; d="scan'208";a="119370570"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-annu-pri.mail.uoguelph.ca with ESMTP; 02 May 2011 19:57:16 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id B78B7B3F2D;
	Mon,  2 May 2011 19:57:16 -0400 (EDT)
Date: Mon, 2 May 2011 19:57:16 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Kostik Belousov <kostikbel@gmail.com>
Message-ID: <2119325179.903923.1304380636687.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <20110503020940.N2001@besplex.bde.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; 
	boundary="----=_Part_903922_2059190712.1304380636685"
X-Originating-IP: [172.17.91.201]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692)
Cc: fs@FreeBSD.org
Subject: Re: newnfs client and statfs
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 May 2011 23:57:18 -0000

------=_Part_903922_2059190712.1304380636685
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

Hi,

I have attached a version of the patch that I intend to commit
unless it doesn't work for Kostik's test case. Kostik, could
you please test this one.

Yes, Bruce, I realize you won't like it, but I have put some comments
in it to try and clarify why it is coded the way it is.
(The arithmetic seems to work the way I would expect it to for
 i386, which is the only arch I have for testing.)

If the "collective consensus" is to "cheat" and put the negative
values in the uint64_t on the wire, then I can commit a change
to handle that later. If anyone has input w.r.t. this, please post
it under the Subject heading "NFS server handling of negative f_bavail?"
on freebsd-fs@freebsd.org.

I basically need to move onto other issues, rick

------=_Part_903922_2059190712.1304380636685
Content-Type: text/x-patch; name=statfs.patch
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename=statfs.patch

LS0tIGZzL25mc2NsaWVudC9uZnNfY2xwb3J0LmMuc2F2CTIwMTEtMDQtMzAgMjA6MTY6MzkuMDAw
MDAwMDAwIC0wNDAwCisrKyBmcy9uZnNjbGllbnQvbmZzX2NscG9ydC5jCTIwMTEtMDUtMDIgMTk6
MzI6MzEuMDAwMDAwMDAwIC0wNDAwCkBAIC04MzgsMjEgKzgzOCwzMyBAQCB2b2lkCiBuZnNjbF9s
b2Fkc2JpbmZvKHN0cnVjdCBuZnNtb3VudCAqbm1wLCBzdHJ1Y3QgbmZzc3RhdGZzICpzZnAsIHZv
aWQgKnN0YXRmcykKIHsKIAlzdHJ1Y3Qgc3RhdGZzICpzYnAgPSAoc3RydWN0IHN0YXRmcyAqKXN0
YXRmczsKLQluZnNxdWFkX3QgdHF1YWQ7CiAKIAlpZiAobm1wLT5ubV9mbGFnICYgKE5GU01OVF9O
RlNWMyB8IE5GU01OVF9ORlNWNCkpIHsKIAkJc2JwLT5mX2JzaXplID0gTkZTX0ZBQkxLU0laRTsK
LQkJdHF1YWQucXZhbCA9IHNmcC0+c2ZfdGJ5dGVzOwotCQlzYnAtPmZfYmxvY2tzID0gKGxvbmcp
KHRxdWFkLnF2YWwgLyAoKHVfcXVhZF90KU5GU19GQUJMS1NJWkUpKTsKLQkJdHF1YWQucXZhbCA9
IHNmcC0+c2ZfZmJ5dGVzOwotCQlzYnAtPmZfYmZyZWUgPSAobG9uZykodHF1YWQucXZhbCAvICgo
dV9xdWFkX3QpTkZTX0ZBQkxLU0laRSkpOwotCQl0cXVhZC5xdmFsID0gc2ZwLT5zZl9hYnl0ZXM7
Ci0JCXNicC0+Zl9iYXZhaWwgPSAobG9uZykodHF1YWQucXZhbCAvICgodV9xdWFkX3QpTkZTX0ZB
QkxLU0laRSkpOwotCQl0cXVhZC5xdmFsID0gc2ZwLT5zZl90ZmlsZXM7Ci0JCXNicC0+Zl9maWxl
cyA9ICh0cXVhZC5sdmFsWzBdICYgMHg3ZmZmZmZmZik7Ci0JCXRxdWFkLnF2YWwgPSBzZnAtPnNm
X2ZmaWxlczsKLQkJc2JwLT5mX2ZmcmVlID0gKHRxdWFkLmx2YWxbMF0gJiAweDdmZmZmZmZmKTsK
KwkJc2JwLT5mX2Jsb2NrcyA9IHNmcC0+c2ZfdGJ5dGVzIC8gTkZTX0ZBQkxLU0laRTsKKwkJc2Jw
LT5mX2JmcmVlID0gc2ZwLT5zZl9mYnl0ZXMgLyBORlNfRkFCTEtTSVpFOworCQkvKgorCQkgKiBB
bHRob3VnaCBzZl9hYnl0ZXMgaXMgdWludDY0X3QgYW5kIGZfYmF2YWlsIGlzIGludDY0X3QsCisJ
CSAqIHRoZSB2YWx1ZSBhZnRlciBkaXZpZGluZyBieSBORlNfRkFCTEtTSVpFIGlzIHNtYWxsCisJ
CSAqIGVub3VnaCB0aGF0IGl0IHdpbGwgZml0IGluIDYzYml0cywgc28gaXQgaXMgb2sgdG8KKwkJ
ICogYXNzaWduIGl0IHRvIGZfYmF2YWlsIHdpdGhvdXQgZmVhciB0aGF0IGl0IHdpbGwgYmVjb21l
CisJCSAqIG5lZ2F0aXZlLgorCQkgKi8KKwkJc2JwLT5mX2JhdmFpbCA9IHNmcC0+c2ZfYWJ5dGVz
IC8gTkZTX0ZBQkxLU0laRTsKKwkJc2JwLT5mX2ZpbGVzID0gc2ZwLT5zZl90ZmlsZXM7CisJCS8q
IFNpbmNlIGZfZmZyZWUgaXMgaW50NjRfdCwgY2xpcCBpdCB0byA2M2JpdHMuICovCisJCWlmIChz
ZnAtPnNmX2ZmaWxlcyA+ICh1aW50NjRfdClJTlQ2NF9NQVgpCisJCQlzYnAtPmZfZmZyZWUgPSBJ
TlQ2NF9NQVg7CisJCWVsc2UKKwkJCXNicC0+Zl9mZnJlZSA9IHNmcC0+c2ZfZmZpbGVzOwogCX0g
ZWxzZSBpZiAoKG5tcC0+bm1fZmxhZyAmIE5GU01OVF9ORlNWNCkgPT0gMCkgeworCQkvKgorCQkg
KiBUaGUgdHlwZSBjYXN0cyB0byAoaW50MzJfdCkgZW5zdXJlIHRoYXQgdGhpcyBjb2RlIGlzCisJ
CSAqIGNvbXBhdGlibGUgd2l0aCB0aGUgb2xkIE5GUyBjbGllbnQsIGluIHRoYXQgaXQgd2lsbAor
CQkgKiBzaWduIGV4dGVuZCBhIHZhbHVlIHdpdGggYml0MzEgc2V0LiBUaGlzIG1heSBvciBtYXkK
KwkJICogbm90IGJlIGNvcnJlY3QgZm9yIE5GU3YyLCBidXQgc2luY2UgaXQgaXMgYSBsZWdhY3kK
KwkJICogZW52aXJvbm1lbnQsIEknZCByYXRoZXIgcmV0YWluIGJhY2t3YXJkcyBjb21wYXRpYmls
aXR5LgorCQkgKi8KIAkJc2JwLT5mX2JzaXplID0gKGludDMyX3Qpc2ZwLT5zZl9ic2l6ZTsKIAkJ
c2JwLT5mX2Jsb2NrcyA9IChpbnQzMl90KXNmcC0+c2ZfYmxvY2tzOwogCQlzYnAtPmZfYmZyZWUg
PSAoaW50MzJfdClzZnAtPnNmX2JmcmVlOwo=
------=_Part_903922_2059190712.1304380636685--

From owner-freebsd-fs@FreeBSD.ORG  Tue May  3 03:53:01 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8FA3A106566B
	for <freebsd-fs@freebsd.org>; Tue,  3 May 2011 03:53:01 +0000 (UTC)
	(envelope-from jan.koum@gmail.com)
Received: from mail-qw0-f54.google.com (mail-qw0-f54.google.com
	[209.85.216.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 442718FC0C
	for <freebsd-fs@freebsd.org>; Tue,  3 May 2011 03:53:00 +0000 (UTC)
Received: by qwc9 with SMTP id 9so3706573qwc.13
	for <freebsd-fs@freebsd.org>; Mon, 02 May 2011 20:53:00 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type;
	bh=txSRA4WIgbfpHLgc0QzARaX1P4eMOkYRbqN/jX9V+Uw=;
	b=Z302K1CerNseU+Sl3nzZc5uZKAebIauARiY/uwvwQGE2kmb5w66p8YMsdSDvXEFeuS
	P7RGKIggQhBptHWNDVOuJlzL23MdQUnnGyZDcYkT0E1nLWZ5+H3IKGfBQPmnMiOva5aZ
	+x0Fi98213g/h/R+rG9amnje7B14Pz9Lrr/WM=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type;
	b=UlBb9adK0TqfbLx02MSga+Y7sGVn4sK2c96LMs3Wwtt7UBllKqCzsSqYMu8j4DHTku
	Nm0zaFBefk8sHWH+WFxAae2CqKnIDpJpWGqRkxIDrTqzX7Rky5+O0kjAq4+1heXxuqwl
	RN4jxOOjrrVTiDwyGnMkbayvIYLJHfVo6v9u0=
MIME-Version: 1.0
Received: by 10.229.43.99 with SMTP id v35mr6811819qce.8.1304394780385; Mon,
	02 May 2011 20:53:00 -0700 (PDT)
Sender: jan.koum@gmail.com
Received: by 10.229.88.73 with HTTP; Mon, 2 May 2011 20:53:00 -0700 (PDT)
In-Reply-To: <BANLkTik5tXegwoRvB7XAvpEPb385KjGEtA@mail.gmail.com>
References: <BANLkTin-qEoxxFbjJkDaA_-UZMkza08NNQ@mail.gmail.com>
	<20110502233601.GA29710@icarus.home.lan>
	<BANLkTik5tXegwoRvB7XAvpEPb385KjGEtA@mail.gmail.com>
Date: Mon, 2 May 2011 20:53:00 -0700
X-Google-Sender-Auth: WF2x4hPXiNaF4zu51BgfAtAds4w
Message-ID: <BANLkTinQt4YZiudZUSgxL0x8dJ6MJTueRw@mail.gmail.com>
From: Jan Koum <jan@whatsapp.com>
To: Adam Vande More <amvandemore@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-fs@freebsd.org, Chris Peiffer <chris@whatsapp.com>
Subject: Re: very strange IO issue with FreeBSD 8 and SSD
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 May 2011 03:53:01 -0000

On Mon, May 2, 2011 at 8:26 PM, Adam Vande More <amvandemore@gmail.com> wrote:

> On Mon, May 2, 2011 at 6:36 PM, Jeremy Chadwick <freebsd@jdc.parodius.com> wrote:
>
>> You might also try comparing iostat output to gstat output, though gstat
>> refreshes the screen continually making this a little difficult.
>>
>
> gstat -b
>


sure:

$ sudo gstat -b
dT: 1.007s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0    605     33    137    0.4    572   4349   34.7   10.9  ad4
    0    605     33    137    0.4    572   4349   35.8   10.9  ad4s1
    0    620     25    149    1.0    595   4280   22.2    9.9  ad5
    0    605     33    137    0.4    572   4349   36.5   11.0  ad4s1a
    0      0      0      0    0.0      0      0    0.0    0.0  ad4s1b
    0     60     18     60    1.1     42    169    2.5    3.6  ad6
    0    817     30    121    0.2    787   5382   15.9    8.1  ad7
    0    620     25    149    1.1    595   4280   23.1   10.0  ad5a
    0     60     18     60    1.1     42    169    2.6    3.7  ad6a
    0    817     30    121    0.2    787   5382   16.5    8.1  ad7a


>
> Also top -m io may help.
>
>
doubt it.  these servers only have a single process running (our app)

From owner-freebsd-fs@FreeBSD.ORG  Tue May  3 03:57:07 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 95148106566B
	for <freebsd-fs@freebsd.org>; Tue,  3 May 2011 03:57:07 +0000 (UTC)
	(envelope-from amvandemore@gmail.com)
Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com
	[209.85.161.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 212748FC0A
	for <freebsd-fs@freebsd.org>; Tue,  3 May 2011 03:57:06 +0000 (UTC)
Received: by fxm11 with SMTP id 11so5881503fxm.13
	for <freebsd-fs@freebsd.org>; Mon, 02 May 2011 20:57:05 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:in-reply-to:references:date
	:message-id:subject:from:to:cc:content-type;
	bh=qTxSMlN6vXOJk3W+pwtOiqXUGkK02QLbscr6+rZDdu0=;
	b=YNlV7cpQz1P9ARqvmrLet3MObTPPZvbja+23VoCA7VD0GzaSiRNxRHKCNfNyw28PA8
	YzfzXfO8VSQdMcj2rq8c7WLGSnC26YHDWb3ZhOP/BTXsfjEnbotkmtaq4awYC3lZRigd
	iK632LTc6vvIhKg3qvJ0T7LTEl7/UqNPxauBk=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	b=sCoHaYlsfxXVGuk2DnKeNNTphcHHxqGAbV2BhAfWVSQ7HVVWE9zsslkOnXNE0Lmc0j
	o1aKIxrZO4VGX7BsQrRF8POIXSAjNFCsDv7Rri4lNZa2ykfYhkycWRJaho35B5LxGz5E
	CfZW9ayWfkyDg/dRwCMrK5P+ilTG5LgfSfN0o=
MIME-Version: 1.0
Received: by 10.223.127.210 with SMTP id h18mr2630278fas.73.1304393198952;
	Mon, 02 May 2011 20:26:38 -0700 (PDT)
Received: by 10.223.20.145 with HTTP; Mon, 2 May 2011 20:26:38 -0700 (PDT)
In-Reply-To: <20110502233601.GA29710@icarus.home.lan>
References: <BANLkTin-qEoxxFbjJkDaA_-UZMkza08NNQ@mail.gmail.com>
	<20110502233601.GA29710@icarus.home.lan>
Date: Mon, 2 May 2011 22:26:38 -0500
Message-ID: <BANLkTik5tXegwoRvB7XAvpEPb385KjGEtA@mail.gmail.com>
From: Adam Vande More <amvandemore@gmail.com>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-fs@freebsd.org, Chris Peiffer <chris@whatsapp.com>
Subject: Re: very strange IO issue with FreeBSD 8 and SSD
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 May 2011 03:57:07 -0000

On Mon, May 2, 2011 at 6:36 PM, Jeremy Chadwick <freebsd@jdc.parodius.com> wrote:

> You might also try comparing iostat output to gstat output, though gstat
> refreshes the screen continually making this a little difficult.
>

gstat -b

Also top -m io may help.

-- 
Adam Vande More

From owner-freebsd-fs@FreeBSD.ORG  Tue May  3 04:17:21 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9240A106566C
	for <freebsd-fs@freebsd.org>; Tue,  3 May 2011 04:17:21 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta01.westchester.pa.mail.comcast.net
	(qmta01.westchester.pa.mail.comcast.net [76.96.62.16])
	by mx1.freebsd.org (Postfix) with ESMTP id 3B5138FC12
	for <freebsd-fs@freebsd.org>; Tue,  3 May 2011 04:17:20 +0000 (UTC)
Received: from omta03.westchester.pa.mail.comcast.net ([76.96.62.27])
	by qmta01.westchester.pa.mail.comcast.net with comcast
	id esHA1g0020bG4ec51sHMWi; Tue, 03 May 2011 04:17:21 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta03.westchester.pa.mail.comcast.net with comcast
	id esHK1g00M1t3BNj3PsHLJL; Tue, 03 May 2011 04:17:21 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 668359B418; Mon,  2 May 2011 21:17:18 -0700 (PDT)
Date: Mon, 2 May 2011 21:17:18 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Jan Koum <jan@whatsapp.com>
Message-ID: <20110503041718.GA34604@icarus.home.lan>
References: <BANLkTin-qEoxxFbjJkDaA_-UZMkza08NNQ@mail.gmail.com>
	<20110502233601.GA29710@icarus.home.lan>
	<BANLkTik5tXegwoRvB7XAvpEPb385KjGEtA@mail.gmail.com>
	<BANLkTinQt4YZiudZUSgxL0x8dJ6MJTueRw@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <BANLkTinQt4YZiudZUSgxL0x8dJ6MJTueRw@mail.gmail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org, Chris Peiffer <chris@whatsapp.com>
Subject: Re: very strange IO issue with FreeBSD 8 and SSD
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 May 2011 04:17:21 -0000

On Mon, May 02, 2011 at 08:53:00PM -0700, Jan Koum wrote:
> On Mon, May 2, 2011 at 8:26 PM, Adam Vande More <amvandemore@gmail.com> wrote:
> 
> > On Mon, May 2, 2011 at 6:36 PM, Jeremy Chadwick <freebsd@jdc.parodius.com> wrote:
> >
> >> You might also try comparing iostat output to gstat output, though gstat
> >> refreshes the screen continually making this a little difficult.
> >>
> >
> > gstat -b
> >
> 
> 
> sure:
> 
> $ sudo gstat -b
> dT: 1.007s  w: 1.000s
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     0    605     33    137    0.4    572   4349   34.7   10.9  ad4
>     0    605     33    137    0.4    572   4349   35.8   10.9  ad4s1
>     0    620     25    149    1.0    595   4280   22.2    9.9  ad5
>     0    605     33    137    0.4    572   4349   36.5   11.0  ad4s1a
>     0      0      0      0    0.0      0      0    0.0    0.0  ad4s1b
>     0     60     18     60    1.1     42    169    2.5    3.6  ad6
>     0    817     30    121    0.2    787   5382   15.9    8.1  ad7
>     0    620     25    149    1.1    595   4280   23.1   10.0  ad5a
>     0     60     18     60    1.1     42    169    2.6    3.7  ad6a
>     0    817     30    121    0.2    787   5382   16.5    8.1  ad7a

To emulate "iostat 1", you will need to run this from inside of a while
loop via the shell.  E.g. in sh or bash:

while true; do gstat -b; sleep 1; done

I believe the concern that started the thread was that
4MBytes/sec was considered bad performance.  There are indications from
your iostat output that occasionally the writes are buffered and come in
"in a burst" at 10-11MByte/sec, but your overall average is around
4-5MByte/sec.

You can test your disk I/O by simply dd'ing directly to a file on one of
the filesystems, e.g.

cd /place/where/ad5a/is/mounted
dd if=/dev/zero of=test.bin bs=64k

You can change bs to whatever value you'd like (larger or smaller), but
I tend to stick to 64k (64KBytes).  ^C when you're finished, and you'll
see overall I/O statistics.  You can run the gstat loop or iostat at the
same time if you wish.

Here's an example:

icarus# dd if=/dev/zero of=test.bin bs=64k
^C4401+0 records in
4400+0 records out
288358400 bytes transferred in 6.575845 secs (43851155 bytes/sec)

Another window running "iostat -x ada0 1":

                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ada0       0.0   0.0     0.0     0.0    0   0.0   0
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ada0       0.0   0.0     0.0     0.0    0   0.0   0
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ada0       0.0  61.9     0.0  7924.2    8  19.2  18
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ada0       1.0 334.8    15.9 42790.8    8  19.8 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ada0       1.0 338.5    15.9 43102.6    7  19.7 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ada0       2.0 335.2    31.8 42900.5    8  19.7 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ada0       1.0 336.3    15.9 43047.9    5  20.3 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ada0       1.0 331.7    15.8 42455.8    6  20.3 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ada0       2.0 366.2    31.8 42638.6    8  21.0 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ada0       0.0 125.7     0.0 15836.6    0  20.6  37
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ada0       0.0   0.0     0.0     0.0    0   0.0   0
^C

Controller and disk details:

ahci0: <Intel ICH9 AHCI SATA controller> port 0x1c50-0x1c57,0x1c44-0x1c47,0x1c48-0x1c4f,0x1c40-0x1c43,0x18e0-0x18ff mem 0xdc000800-0xdc000fff irq 17 at device 31.2 on pci0
ahci0: [ITHREAD]
ahci0: AHCI v1.20 with 6 3Gbps ports, Port Multiplier supported

ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich0: [ITHREAD]

ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <INTEL SSDSA2M040G2GC 2CV102M3> ATA-7 SATA 2.x device
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 38166MB (78165360 512 byte sectors: 16H 63S/T 16383C)

# camcontrol identify ada0
pass0: <INTEL SSDSA2M040G2GC 2CV102M3> ATA-7 SATA 2.x device
pass0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)

protocol              ATA/ATAPI-7 SATA 2.x
device model          INTEL SSDSA2M040G2GC
firmware revision     2CV102M3
serial number         XXX
WWN                   XXX
cylinders             16383
heads                 16
sectors/track         63
sector size           logical 512, physical 512, offset 0
LBA supported         78165360 sectors
LBA48 supported       78165360 sectors
PIO supported         PIO4
DMA supported         WDMA2 UDMA6
media RPM             non-rotating

Feature                      Support  Enabled   Value           Vendor
read ahead                     yes      yes
write cache                    yes      yes
flush cache                    yes      yes
overlap                        no
Tagged Command Queuing (TCQ)   no       no
Native Command Queuing (NCQ)   yes              32 tags
SMART                          yes      yes
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      no       no
automatic acoustic management  no       no
media status notification      no       no
power-up in Standby            no       no
write-read-verify              no       no
unload                         yes      yes
free-fall                      no       no
data set management (TRIM)     yes

I can safely say the conversation is going to immediately turn to "how
does your application work?", including people asking for full source
code and so on.  Unless I misunderstand, that's effectively what you're
asking: "why does our application perform so badly on these SSDs?"

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Tue May  3 05:49:34 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id CA268106564A
	for <freebsd-fs@freebsd.org>; Tue,  3 May 2011 05:49:34 +0000 (UTC)
	(envelope-from jan.koum@gmail.com)
Received: from mail-qw0-f54.google.com (mail-qw0-f54.google.com
	[209.85.216.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 7AA328FC16
	for <freebsd-fs@freebsd.org>; Tue,  3 May 2011 05:49:34 +0000 (UTC)
Received: by qwc9 with SMTP id 9so3738454qwc.13
	for <freebsd-fs@freebsd.org>; Mon, 02 May 2011 22:49:33 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type;
	bh=r/ZIf/YbFBJ/VVKyo6rvcjQV/9L2JloX5SP+kkIj05o=;
	b=FTT3pRoT6NCkEfzXUIqhxfvjWvp7H3FjnMXpTaUGhaMvnXNg6pedR+5lvKQvG9qyjp
	ZQQ3YPz89LwnINdMEQ7PX/HOYckhnK5IOhI5p54K+TFxQ6zpQDgXhX3Q8R4RgN1gcdvO
	ItidufOC74V4XQ+wKRyNkTT0Dgmc0NOsE+5VE=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type;
	b=T9idXkogme9Ysyjr0OoTHN+z/e4Q/fyQ8g1dvXCLEqbWeypIN3uLF/1WhRNbAh8MsW
	hFvRsm4ebxmb/SULvi8D9SQsdEggT7hAMLhu98hq8T0XTyn9D98C6afRSV9M1HgBrPWo
	C0ng4Qh5W6g372S5nIWToRjRc/x26kZTKZNWg=
MIME-Version: 1.0
Received: by 10.229.17.11 with SMTP id q11mr6829607qca.46.1304401772514; Mon,
	02 May 2011 22:49:32 -0700 (PDT)
Sender: jan.koum@gmail.com
Received: by 10.229.88.73 with HTTP; Mon, 2 May 2011 22:49:32 -0700 (PDT)
In-Reply-To: <20110503041718.GA34604@icarus.home.lan>
References: <BANLkTin-qEoxxFbjJkDaA_-UZMkza08NNQ@mail.gmail.com>
	<20110502233601.GA29710@icarus.home.lan>
	<BANLkTik5tXegwoRvB7XAvpEPb385KjGEtA@mail.gmail.com>
	<BANLkTinQt4YZiudZUSgxL0x8dJ6MJTueRw@mail.gmail.com>
	<20110503041718.GA34604@icarus.home.lan>
Date: Mon, 2 May 2011 22:49:32 -0700
X-Google-Sender-Auth: n3DPmuseieyxkLVkWuok2ntXZ_c
Message-ID: <BANLkTinchOrXFo+7RqV9-pf_2zFoBtVdeQ@mail.gmail.com>
From: Jan Koum <jan@whatsapp.com>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-fs@freebsd.org, Chris Peiffer <chris@whatsapp.com>
Subject: Re: very strange IO issue with FreeBSD 8 and SSD
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 May 2011 05:49:34 -0000

On Mon, May 2, 2011 at 9:17 PM, Jeremy Chadwick <freebsd@jdc.parodius.com> wrote:

>
> To emulate "iostat 1", you will need to run this from inside of a while
> loop via the shell.  E.g. in sh or bash:
>
> while true; do gstat -b; sleep 1; done
>
>
sure:

$ sudo gstat -b | head -2 ; while true; do sudo gstat -b | grep 'a$'; sleep
1; echo; done
dT: 1.009s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
  258     56     16     42    0.2     40    312    2.2    1.0  ad4s1a
  288     76     20     81    0.2     57    387    4.0    1.2  ad5a
  255    208     28     76    0.4    180   1977   12.1    3.1  ad6a
  276     83     26    139    0.5     58    499    6.2    3.1  ad7a

    0     17     16     40    0.2      1      4    0.2    0.4  ad4s1a
    0     30     28     95    5.4      2     20    0.2   15.1  ad5a
    0   2943     30    139   17.9   2913  46257  261.6   40.5  ad6a
    0     24     23     82    0.2      1      4    1.6    0.6  ad7a

    0    791     30    137    0.5    762   6897   24.2   16.1  ad4s1a
    0    858     18     68    0.2    840   8261   35.7   16.7  ad5a
    0   1308     18     46    1.7   1290  13023   25.5   22.1  ad6a
    0    791     21    113    1.5    771   7320   19.8   21.3  ad7a

    0   3152     26     77   18.1   3126  46089  236.0   44.0  ad4s1a
    0    385     30    109   10.6    355   2420   11.4   28.1  ad5a
    0   1263     25    107   11.5   1239   7172   37.3   27.8  ad6a
  696    761     32    159   12.2    730   4510   22.5   31.1  ad7a

    0    456     26     76    0.4    430   1892   19.0    9.4  ad4s1a
    0    616     14     36    0.2    602   4971   20.3    8.6  ad5a
    0    811     14     46    0.3    797   6186   27.0   10.4  ad6a
    0    207     19     58    2.1    188   2982   25.2   10.3  ad7a

  313    467     20     76    0.2    447   3834   19.2    4.6  ad4s1a
   10     33     17     96    0.2     16    123   82.7    8.8  ad5a
    3     32     16     62    0.2     16     98    0.3    0.6  ad6a
    1     40     20     52    0.2     20    223    0.3    0.7  ad7a

  151   1624     18     77   51.6   1606  10039  106.3   69.1  ad4s1a
   25    232      8     22   95.1    224   3565   94.4   64.5  ad5a
    0    868     15     48    0.2    854   7438   20.7   17.7  ad6a
    0    821     11     73    1.2    810   8846   26.3   17.1  ad7a


> I believe your concern point that started the thread was that
> 4MBytes/sec was considered bad performance.


sorry, not quite...  i am not judging "performance" - what i am trying to
get to the bottom of is why in the world 500KB of file updates
(write/append) per second would generate so much IO


> There are indications from
> your iostat output that occasionally the writes are buffered and come in
> "in a burst" at 10-11MByte/sec, but your overall average is around
> 4-5MByte/sec.
>
>

we see higher averages, but OK -- don't you think 4-5MB/sec is still way too
high for the little IO the application is doing?


(dd doesn't really reproduce the real life usage of a filesystem with multiple
directories and threads using the underlying fs)


> I can safely say the conversation is going to immediately turn to "how
> does your application work?", including people asking for full source
> code and so on.


it is a very very very simple app built on top of erlang file module:
http://www.erlang.org/doc/man/file.html


> Unless I misunderstand, that's effectively what you're
> asking: "why does our application perform so badly on these SSDs?"
>
>
not really.  what i am asking is: why is there so much IO overhead?  where
is it coming from?

From owner-freebsd-fs@FreeBSD.ORG  Tue May  3 06:22:06 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A29D6106566C;
	Tue,  3 May 2011 06:22:06 +0000 (UTC)
	(envelope-from linimon@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 79AEA8FC13;
	Tue,  3 May 2011 06:22:06 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p436M6GP093746;
	Tue, 3 May 2011 06:22:06 GMT
	(envelope-from linimon@freefall.freebsd.org)
Received: (from linimon@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p436M66m093742;
	Tue, 3 May 2011 06:22:06 GMT (envelope-from linimon)
Date: Tue, 3 May 2011 06:22:06 GMT
Message-Id: <201105030622.p436M66m093742@freefall.freebsd.org>
To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org
From: linimon@FreeBSD.org
Cc: 
Subject: Re: kern/156781: [zfs] zfs is losing the snapshot directory,
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 May 2011 06:22:06 -0000

Old Synopsis: zfs is loosing the snapshot directory,
New Synopsis: [zfs] zfs is losing the snapshot directory,

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Tue May 3 06:21:27 UTC 2011
Responsible-Changed-Why: 
reclassify.

http://www.freebsd.org/cgi/query-pr.cgi?pr=156781

From owner-freebsd-fs@FreeBSD.ORG  Tue May  3 08:30:56 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 56A621065674
	for <fs@FreeBSD.org>; Tue,  3 May 2011 08:30:56 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail03.syd.optusnet.com.au (mail03.syd.optusnet.com.au
	[211.29.132.184])
	by mx1.freebsd.org (Postfix) with ESMTP id E77168FC08
	for <fs@FreeBSD.org>; Tue,  3 May 2011 08:30:55 +0000 (UTC)
Received: from c122-106-155-58.carlnfd1.nsw.optusnet.com.au
	(c122-106-155-58.carlnfd1.nsw.optusnet.com.au [122.106.155.58])
	by mail03.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	p438UpOZ008619
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Tue, 3 May 2011 18:30:52 +1000
Date: Tue, 3 May 2011 18:30:51 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Rick Macklem <rmacklem@uoguelph.ca>
In-Reply-To: <2119325179.903923.1304380636687.JavaMail.root@erie.cs.uoguelph.ca>
Message-ID: <20110503174200.V1050@besplex.bde.org>
References: <2119325179.903923.1304380636687.JavaMail.root@erie.cs.uoguelph.ca>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: fs@FreeBSD.org
Subject: Re: newnfs client and statfs
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 May 2011 08:30:56 -0000

On Mon, 2 May 2011, Rick Macklem wrote:

> I have attached a version of the patch that I intend to commit
> unless it doesn't work for Kostik's test case. Kostik, could
> you please test this one.
>
> Yes, Bruce, I realize you won't like it, but I
> have put some comments in it
> to try and clarify why it is coded the way it is.
> (The arithmetic seems to work the way I would expect it to for
> i386, which is the only arch I have for testing.)

Sigh.

% --- fs/nfsclient/nfs_clport.c.sav	2011-04-30 20:16:39.000000000 -0400
% +++ fs/nfsclient/nfs_clport.c	2011-05-02 19:32:31.000000000 -0400
% @@ -838,21 +838,33 @@ void
%  nfscl_loadsbinfo(struct nfsmount *nmp, struct nfsstatfs *sfp, void *statfs)
%  {
%  	struct statfs *sbp = (struct statfs *)statfs;
% -	nfsquad_t tquad;
% 
%  	if (nmp->nm_flag & (NFSMNT_NFSV3 | NFSMNT_NFSV4)) {
%  		sbp->f_bsize = NFS_FABLKSIZE;
% -		tquad.qval = sfp->sf_tbytes;
% -		sbp->f_blocks = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
% -		tquad.qval = sfp->sf_fbytes;
% -		sbp->f_bfree = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
% -		tquad.qval = sfp->sf_abytes;
% -		sbp->f_bavail = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
% -		tquad.qval = sfp->sf_tfiles;
% -		sbp->f_files = (tquad.lval[0] & 0x7fffffff);
% -		tquad.qval = sfp->sf_ffiles;
% -		sbp->f_ffree = (tquad.lval[0] & 0x7fffffff);
% +		sbp->f_blocks = sfp->sf_tbytes / NFS_FABLKSIZE;
% +		sbp->f_bfree = sfp->sf_fbytes / NFS_FABLKSIZE;
% +		/*
% +		 * Although sf_abytes is uint64_t and f_bavail is int64_t,
% +		 * the value after dividing by NFS_FABLKSIZE is small
% +		 * enough that it will fit in 63bits, so it is ok to
% +		 * assign it to f_bavail without fear that it will become
% +		 * negative.
% +		 */
% +		sbp->f_bavail = sfp->sf_abytes / NFS_FABLKSIZE;
% +		sbp->f_files = sfp->sf_tfiles;
% +		/* Since f_ffree is int64_t, clip it to 63bits. */
% +		if (sfp->sf_ffiles > (uint64_t)INT64_MAX)

This cast has no effect.  INT64_MAX has type int64_t.  sf_ffiles has type
uint64_t.  The usual arithmetic conversions convert both operands to a
common type, which here is uint64_t.  Thus INT64_MAX is converted
automatically to the correct type.

% +			sbp->f_ffree = INT64_MAX;
% +		else
% +			sbp->f_ffree = sfp->sf_ffiles;
%  	} else if ((nmp->nm_flag & NFSMNT_NFSV4) == 0) {
% +		/*
% +		 * The type casts to (int32_t) ensure that this code is
% +		 * compatible with the old NFS client, in that it will
% +		 * sign extend a value with bit31 set. This may or may
% +		 * not be correct for NFSv2, but since it is a legacy
% +		 * environment, I'd rather retain backwards compatibility.
% +		 */
%  		sbp->f_bsize = (int32_t)sfp->sf_bsize;
%  		sbp->f_blocks = (int32_t)sfp->sf_blocks;
%  		sbp->f_bfree = (int32_t)sfp->sf_bfree;

It won't give a negative value, but will propagate bit31 into the upper
bits of an unsigned value.  For example, sfp->sf_blocks = 0x80000000
becomes sbp->f_blocks = 0xFFFFFFFF80000000, which is massively different.
Again, omitting the cast gives the correct result if the wire insists on
its values being unsigned.
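
To illustrate (a sketch with made-up function names, assuming a 2's
complement machine where converting 0x80000000 to int32_t yields
INT32_MIN):

```c
#include <stdint.h>

/* Conversion as in the patch: through int32_t, then to unsigned 64 bits. */
static uint64_t
widen_with_cast(uint32_t wire)
{
	/* Bit 31 is copied into bits 32..63 before the value becomes unsigned. */
	return ((uint64_t)(int32_t)wire);
}

/* Conversion without the cast: plain zero extension. */
static uint64_t
widen_without_cast(uint32_t wire)
{
	return (wire);
}
```

widen_with_cast(0x80000000) gives 0xFFFFFFFF80000000, while
widen_without_cast(0x80000000) gives 0x80000000.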

The result is only backwards compatible with relatively recent FreeBSD
nfs clients.  All FreeBSD clients are completely broken if bit31 is
set, and compatibility with this brokenness is not useful (but as I
pointed out in another reply, we would never have seen the broken case
when the old clients weren't old, since it takes a server file system
size of about 32TB for bit 31 to be set).  The details of the brokenness
vary:

Net/2, FreeBSD-1, 4.4BSD-Lite, FreeBSD-[2-4]:
    f_blocks was plain long:
       if long is 32 bits, then sfp->sf_blocks = 0x80000000 becomes
         sbp->f_blocks = -0x7fffffff - 1 (LONG_MIN)
       if long is 64 bits, then sfp->sf_blocks = 0x80000000 becomes
         sbp->f_blocks = -0x80000000L (INT32_MIN, same as 32-bit LONG_MIN)

FreeBSD-current after 2003/11/12, FreeBSD-[5-9]:
    f_blocks is now uint64_t:
      changing it (and others) from a signed type to an unsigned type mainly
      gave lots of sign extension bugs, including here.  The bugs remain
      mostly unfixed.
        sfp->sf_blocks = 0x80000000 becomes
          sbp->f_blocks = 0xFFFFFFFF80000000 ((uint64_t)INT32_MIN) on all arches.

Neither of the garbage values INT32_MIN, ((uint64_t)INT32_MIN) gives useful
behaviour.  The former is negative, though the wire value cannot be negative
(not sure about this for v2).  Applications that are naive enough to believe
this value should assume that the file system has a negative size and
never try to write anything.  The latter is enormous and positive.  If the
wire count really is 0x80000000, then that is already very large, so
believing that the value is 0xFFFFFFFF80000000 should make little difference.

The bugs are a little different for signed fields like f_bavail.  Now there
are no sign extension bugs or version-dependent misbehaviours.  There are
just overflow bugs in the bogus casts.  (int32_t)0x80000000 overflows to
INT32_MIN (only on 2's complement machines, but no others are supported),
and assignment to sbp->f_bavail doesn't change this garbage value.  Now
the bugs are even further off, since it takes about a 400 TB ffs server file
system to reach them.  (400 TB with 8% minfree gives a 32TB reserve for
root.  After using all 32TB of this reserve, there would be -32TB available
for non-root.  -32TB is INT32_MIN in 16K-blocks.)
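
The arithmetic above can be checked with a small sketch.  It assumes that
TB means 2^40 bytes and that the block size is 16K, as in the
parenthetical; the helper names are made up for illustration.

```c
#include <stdint.h>

#define BLOCK_SIZE	16384LL		/* assumed 16K ffs block size */
#define TIB		(1LL << 40)	/* assumed meaning of "TB" here */

/* Reserve for root, in bytes, for a file system of the given size. */
static int64_t
root_reserve_bytes(int64_t fs_bytes, int minfree_percent)
{
	return (fs_bytes / 100 * minfree_percent);
}

/* Convert a (possibly negative) byte count into 16K blocks. */
static int64_t
bytes_to_blocks(int64_t bytes)
{
	return (bytes / BLOCK_SIZE);
}
```

root_reserve_bytes(400 * TIB, 8) is 32 * TIB, bytes_to_blocks(-32 * TIB)
is INT32_MIN, and bytes_to_blocks(32 * TIB) is 0x80000000, the first
unsigned block count with bit 31 set.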

Bruce

From owner-freebsd-fs@FreeBSD.ORG  Tue May  3 08:34:55 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BC5641065674
	for <freebsd-fs@freebsd.org>; Tue,  3 May 2011 08:34:55 +0000 (UTC)
	(envelope-from kraduk@gmail.com)
Received: from mail-ww0-f50.google.com (mail-ww0-f50.google.com [74.125.82.50])
	by mx1.freebsd.org (Postfix) with ESMTP id 0A6898FC26
	for <freebsd-fs@freebsd.org>; Tue,  3 May 2011 08:34:54 +0000 (UTC)
Received: by wwc33 with SMTP id 33so6503618wwc.31
	for <multiple recipients>; Tue, 03 May 2011 01:34:53 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:in-reply-to:references:date
	:message-id:subject:from:to:cc:content-type;
	bh=WdgJSuhSTDqLei4B/39H4xwb9Wl7nLruSYT8vuzO0Is=;
	b=KjCRLTVIF+4c5L9vdgl6eUxpX2Q2T7C8ln2fCBbDlWUO/tbzHlk7BtFGUM0RJhtuZk
	mb2W/BbqNRy7u7f0lQESPdKzIuv6hJgVjTaVuYnzK/3GFl6yw0OQQYPtrvNTQLFdebOM
	Ofzdt23Ig0oYHPYHSaAUkypBYuvUJZugUD1iM=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	b=vInYMIjqEhle9IYzx5GRHQDJulkmor+faFpXo/rwaCn4G8EfYiTAobJWT90VBzusrZ
	PpjFakkK54hBKsFe+ACDDFDnvTv0imxnvQ45Nu00FrVtm/l63rKD9MZdwYIvwiu9Wm4K
	O85wkEKJ//Eu/wUZ7jJM328paZ4iCr+f2Ce0Q=
MIME-Version: 1.0
Received: by 10.216.143.74 with SMTP id k52mr8655756wej.0.1304411693679; Tue,
	03 May 2011 01:34:53 -0700 (PDT)
Received: by 10.216.15.73 with HTTP; Tue, 3 May 2011 01:34:53 -0700 (PDT)
In-Reply-To: <BANLkTinJmdsjoRfj-4VBOSo2frj9b85q_g@mail.gmail.com>
References: <op.vn2iid1qk84lxj@arrow> <20110501133627.00006616@unknown>
	<BANLkTinJmdsjoRfj-4VBOSo2frj9b85q_g@mail.gmail.com>
Date: Tue, 3 May 2011 09:34:53 +0100
Message-ID: <BANLkTikywHSjwo=so6c7=Nkh7c8QcdLZNw@mail.gmail.com>
From: krad <kraduk@gmail.com>
To: ambrosehuang ambrose <ambrosehua@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-fs@freebsd.org, Alexander Leidinger <Alexander@leidinger.net>,
	dfr@freebsd.org, Emil Smolenski <am@raisa.eu.org>
Subject: Re: [ZFS] Booting from zpool created on 4k-sector drive
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 May 2011 08:34:55 -0000

On 2 May 2011 01:47, ambrosehuang ambrose <ambrosehua@gmail.com> wrote:

> Here is my trick:
>       1 Download the ZFS V28 patch for 8-stable,
>       2 patch the 8-stable ,
>       3 make buildkernel,
>       4 then you will get gptzfsboot, zfsloader, pmbr
>       5 install pmbr according to wiki/GPTboot
>       6 replace your old gptzfsboot, zfsloader with new ones;
>       then you can work around this. It works for me( 3 WD10ears +
> ZFS V15 + 8-stable)
>
> 2011/5/1 Alexander Leidinger <Alexander@leidinger.net>:
> > On Tue, 21 Dec 2010 15:29:01 +0100 "Emil Smolenski" <am@raisa.eu.org>
> > wrote:
> >
> >> Hello,
> >>
> >> There is a hack to force zpool creation with minimum sector size
> >> equal to 4k:
> >>
> >> # gnop create -S 4096 ${DEV0}
> >> # zpool create tank ${DEV0}.nop
> >> # zpool export tank
> >> # gnop destroy ${DEV0}.nop
> >> # zpool import tank
> >>
> >> Zpool created this way is much faster on problematic 4k sector
> >> drives which lies about its sector size (like WD EARS). This hack
> >> works perfectly fine when system is running. Gnop layer is created
> >> only for "zpool create" command -- ZFS stores information about
> >> sector size in its metadata. After zpool creation one can export the
> >> pool, remove gnop layer and reimport the pool. Difference can be seen
> >> in the output from the zdb command:
> >>
> >> - on 512 sector device (2**9 = 512):
> >> % zdb tank |grep ashift
> >> ashift=9
> >>
> >> - on 4096 sector device (2**12 = 4096):
> >> % zdb tank |grep ashift
> >> ashift=12
> >>
> >> This change is permanent. The only possibility to change the value
> >> of ashift is: zpool destroy/create and restoring pool from backup.
> >>
> >> But there is one problem: I cannot boot from such pool. Error message:
> >>
> >> ZFS: i/o error - all block copies unavailable
> >> ZFS: can't read MOS
> >> ZFS: unexpected object set type 0
> >
> > FYI: I can boot successfully from a ZFS v28 pool which was created like
> > this in a GPT partition (tested with 9-current).
> >
> > Bye,
> > Alexander.
> >
> > --
> > http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
> > http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137
> > _______________________________________________
> > freebsd-fs@freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
> >


Or grab these prebuilt boot blocks and install them:

http://people.freebsd.org/~pjd/zfsboot/

They worked a treat for me with exactly the problem you have.
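
For reference, ashift is just the base-2 logarithm of the sector size, as
the 2**9 and 2**12 annotations in the quoted message suggest.  A small
sketch (the function name is made up here and is not part of ZFS):

```c
#include <stdint.h>

/* Return log2(sector_size), or -1 if the size is not a power of two. */
static int
ashift_for_sector_size(uint64_t sector_size)
{
	int shift = 0;

	if (sector_size == 0 || (sector_size & (sector_size - 1)) != 0)
		return (-1);
	/* Count how many times the size can be halved before reaching 1. */
	while ((sector_size >>= 1) != 0)
		shift++;
	return (shift);
}
```

ashift_for_sector_size(512) gives 9 and ashift_for_sector_size(4096)
gives 12.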

From owner-freebsd-fs@FreeBSD.ORG  Tue May  3 09:18:10 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DA1CF106566C;
	Tue,  3 May 2011 09:18:10 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail07.syd.optusnet.com.au (mail07.syd.optusnet.com.au
	[211.29.132.188])
	by mx1.freebsd.org (Postfix) with ESMTP id 5DEF18FC15;
	Tue,  3 May 2011 09:18:09 +0000 (UTC)
Received: from c122-106-155-58.carlnfd1.nsw.optusnet.com.au
	(c122-106-155-58.carlnfd1.nsw.optusnet.com.au [122.106.155.58])
	by mail07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	p439I404014489
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Tue, 3 May 2011 19:18:06 +1000
Date: Tue, 3 May 2011 19:18:04 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Rick Macklem <rmacklem@uoguelph.ca>
In-Reply-To: <1040257715.898126.1304369225601.JavaMail.root@erie.cs.uoguelph.ca>
Message-ID: <20110503183651.L1224@besplex.bde.org>
References: <1040257715.898126.1304369225601.JavaMail.root@erie.cs.uoguelph.ca>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: rmacklem@FreeBSD.org, fs@FreeBSD.org
Subject: Re: newnfs client and statfs
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 May 2011 09:18:10 -0000

On Mon, 2 May 2011, Rick Macklem wrote:

>>> I'll try and make my Solaris10 box get to -ve frees and then see
>>> what
>>> it puts on the wire. After that, I'll start a discussion on
>>> freebsd-fs@
>>> about how they think a FreeBSD server should behave when f_bavail
>>> and/or
>>> f_ffree are negative.
>>
>> The result on Solaris would be interesting. Does Solaris still support
>> ffs? You said later that you couldn't get it to generate negative
>> values.
>>
> Well, I just did the reverse (ran a FreeBSD FFS disk out of space so
> it reported a -ve free and mounted in on Solaris10). Here are the
> "df" outputs (I used "df -k" on Solaris, since that's a compatible format):

That is almost as good a test.

> FreeBSD-current server (nfsv4-newlap):
> Filesystem  1K-blocks    Used   Avail Capacity  Mounted on
> /dev/ad4s3a   2026030  671492 1192456    36%    /
> devfs               1       1       0   100%    /dev
> /dev/ad4s3e   4697030 4544054 -222786   105%    /sub1
> /dev/ad4s3d   5077038  641462 4029414    14%    /usr
>
> Solaris10 client:
> Filesystem            kbytes    used   avail capacity  Mounted on
> /dev/dsk/c0d0s0      3870110 2790938 1040471    73%    /
> /devices                   0       0       0     0%    /devices
> ctfs                       0       0       0     0%    /system/contract
> proc                       0       0       0     0%    /proc
> mnttab                     0       0       0     0%    /etc/mnttab
> swap                  975736     624  975112     1%    /etc/svc/volatile
> objfs                      0       0       0     0%    /system/object
> /usr/lib/libc/libc_hwcap1.so.1 3870110 2790938 1040471    73%    /lib/libc.so.1
> fd                         0       0       0     0%    /dev/fd
> swap                  975112       0  975112     0%    /tmp
> swap                  975140      28  975112     1%    /var/run
> /dev/dsk/c0d0s7      5608190 4118091 1434018    75%    /export/home
> nfsv4-newlap:/sub1   4697030 4544054 18014398509259198     1%    /mnt
>
> as you can see, Solaris10 doesn't assume it's negative and
> reports lottsa avail.
>
> I don't have a Linux client handy, so I can't do the same test
> with Linux, rick
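
The Solaris figure is what one would get if the negative avail went onto
the wire as an unsigned 64-bit byte count and the client divided it by
1024 without treating it as signed.  The following is only a sketch of
that hypothesis with made-up function names, not code from either system:

```c
#include <stdint.h>

/* Hypothetical server side: avail in kbytes goes out as unsigned bytes. */
static uint64_t
avail_on_wire(int64_t avail_kbytes)
{
	return ((uint64_t)(avail_kbytes * 1024));
}

/* Hypothetical client side: treat the wire value as unsigned, report kbytes. */
static uint64_t
client_avail_kbytes(uint64_t wire_bytes)
{
	return (wire_bytes / 1024);
}
```

client_avail_kbytes(avail_on_wire(-222786)) is 18014398509259198, which
is 2^54 - 222786 and matches the avail column for /mnt above.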

I looked at linux-2.6.10 code.  It doesn't do anything good for signed
counts, and declares f_bavail with a bad mixture of arch-dependent types
-- int, s32, u32, __u32, long, u64, __u64 (but no s64 :-).  It does 1
nearby thing better: instead of a fixed blocksize of NFS_FABLKSIZE = 512
for nfs, the blocksize is a parameter, and in scaling by this it is
careful to round up.
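
Rounding up when scaling a byte count to blocks can be sketched as
follows.  This is not the Linux code, just an illustration with a made-up
function name; it avoids the usual (bytes + bsize - 1) form so that the
addition cannot overflow.

```c
#include <stdint.h>

/* Scale a byte count to blocks of bsize bytes, rounding up. */
static uint64_t
bytes_to_blocks_roundup(uint64_t bytes, uint32_t bsize)
{
	if (bsize == 0)
		return (0);	/* arbitrary choice for an invalid block size */
	/* Add one block if there is a partial block left over. */
	return (bytes / bsize + (bytes % bsize != 0));
}
```

bytes_to_blocks_roundup(1025, 512) is 3, where truncating division would
give 2.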

NetBSD is best.  Its statvfs at least has full support for handling this
problem.  From a 2004 version of NetBSD statvfs.h:

% struct statvfs {
% 	unsigned long	f_flag;		/* copy of mount exported flags */
% 	unsigned long	f_bsize;	/* file system block size */
% 	unsigned long	f_frsize;	/* fundamental file system block size */
% 	unsigned long	f_iosize;	/* optimal file system block size */
% 
% 	fsblkcnt_t	f_blocks;	/* number of blocks in file system, */
% 					/*   (in units of f_frsize) */
% 	fsblkcnt_t	f_bfree;	/* free blocks avail in file system */
% 	fsblkcnt_t	f_bavail;	/* free blocks avail to non-root */
% 	fsblkcnt_t	f_bresvd;	/* blocks reserved for root */

statvfs is specified by POSIX, and I previously mentioned that POSIX is
quite broken in this area.  One of the bugs is that all the POSIX block
count types like fsblkcnt_t in the above are specified to be unsigned.
Thus negative block counts cannot be supported directly using these types,
even if the OS has negative block counts.  In the above, NetBSD works
around this by having an extension giving a nonnegative block count for
the blocks reserved for root.  statfs should have used this instead of
a hack involving negative counts, but presumably didn't to avoid changing
the ABI.  Even NetBSD doesn't have this extension for statfs, at least
in 2004.  statfs(2) was apparently deprecated in NetBSD before 2004, with
newer features only going into statvfs(2).

% 
% 	fsfilcnt_t	f_files;	/* total file nodes in file system */
% 	fsfilcnt_t	f_ffree;	/* free file nodes in file system */
% 	fsfilcnt_t	f_favail;	/* free file nodes avail to non-root */
% 	fsfilcnt_t	f_fresvd;	/* file nodes reserved for root */

Similarly.

% 
% 	uint64_t  	f_syncreads;	/* count of sync reads since mount */
% 	uint64_t  	f_syncwrites;	/* count of sync writes since mount */
% 
% 	uint64_t  	f_asyncreads;	/* count of async reads since mount */
% 	uint64_t  	f_asyncwrites;	/* count of async writes since mount */
% 
% 	fsid_t		f_fsidx;	/* NetBSD compatible fsid */
% 	unsigned long	f_fsid;		/* Posix compatible fsid */
% 	unsigned long	f_namemax;	/* maximum filename length */
% 	uid_t		f_owner;	/* user that mounted the file system */
% 
% 	uint32_t	f_spare[4];	/* spare space */
% 
% 	char	f_fstypename[_VFS_NAMELEN]; /* fs type name */
% 	char	f_mntonname[_VFS_MNAMELEN];  /* directory on which mounted */
% 	char	f_mntfromname[_VFS_MNAMELEN];  /* mounted file system */
% 
% };

As I said before, NetBSD's nfs tries to make this work for nfs, but I
couldn't see how this worked in NetBSD or in anything I could think of,
since the extension is not in the nfs protocol.  Now I think it does work,
but I still can't see how.  Details: NetBSD puts f_bavail on the wire
without clamping it (it just scales it).  Now I think f_bavail is never
negative in NetBSD, so this scaling doesn't involve the usual sign
extension and overflow bugs, or abuse of the top bit.  The client zaps
negative values for v3 f_bavail but not for other things, and initializes
f_bresvd.  From a 2005 version of nfs_vfsops.c:

% 	if (v3) {
% 		sbp->f_frsize = sbp->f_bsize = NFS_FABLKSIZE;
% 		tquad = fxdr_hyper(&sfp->sf_tbytes);
% 		sbp->f_blocks = ((quad_t)tquad / (quad_t)NFS_FABLKSIZE);
% 		tquad = fxdr_hyper(&sfp->sf_fbytes);
% 		sbp->f_bfree = ((quad_t)tquad / (quad_t)NFS_FABLKSIZE);
% 		tquad = fxdr_hyper(&sfp->sf_abytes);
% 		tquad = ((quad_t)tquad / (quad_t)NFS_FABLKSIZE);
% 		sbp->f_bresvd = sbp->f_bfree - tquad;

I still can't see how this initialization works.  f_bresvd has to end
up as nonzero if root has a reserve, and drop to zero as the reserve
is used up.  sf_fbytes - sf_abytes must give this reserve.
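
A sketch of that derivation, modelled loosely on the quoted client code
but with made-up names.  It assumes a 2's complement machine and that a
"negative" sf_abytes arrives as a large unsigned value.

```c
#include <stdint.h>

#define FABLKSIZE	512	/* same value as NFS_FABLKSIZE in this thread */

struct blk_counts {
	int64_t	bfree;		/* free blocks, including root's reserve */
	int64_t	bavail;		/* free blocks available to non-root */
	int64_t	bresvd;		/* free minus avail */
};

static struct blk_counts
split_free_bytes(uint64_t fbytes, uint64_t abytes)
{
	struct blk_counts c;

	c.bfree = (int64_t)fbytes / FABLKSIZE;
	c.bavail = (int64_t)abytes / FABLKSIZE;
	/* The reserve is computed before any clamping, as in the quoted code. */
	c.bresvd = c.bfree - c.bavail;
	/* Clamp like the COMPAT_20 case for servers that send negative values. */
	if (c.bavail < 0)
		c.bavail = 0;
	return (c);
}
```

Taking the /sub1 numbers from Rick's df output (free = 4697030 - 4544054 =
152976K, assuming df computes used as blocks minus free, and avail =
-222786K), this gives bfree = 305952, bresvd = 751524 and bavail = 0 in
512-byte blocks, so the computed reserve is 375762K.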

% 		sbp->f_bavail = tquad;
% #ifdef COMPAT_20
% 		/* Handle older NFS servers returning negative values */
% 		if ((quad_t)sbp->f_bavail < 0)
% 			sbp->f_bavail = 0;
% #endif

NetBSD's own server puts f_bavail on the wire unchanged except for scaling,
so it is now clear that f_bavail is never negative in NetBSD.

% 		tquad = fxdr_hyper(&sfp->sf_tfiles);
% 		sbp->f_files = tquad;
% 		tquad = fxdr_hyper(&sfp->sf_ffiles);
% 		sbp->f_ffree = tquad;
% 		sbp->f_favail = tquad;

"Negative" values for this are not zapped.

% 		sbp->f_fresvd = 0;

This reserve is not really supported.  Supporting it is impossible since
there is not as much redundancy in the wire values for the file counts
as for the block counts.

% 		sbp->f_namemax = MAXNAMLEN;
% 	} else {
% 		sbp->f_bsize = NFS_FABLKSIZE;
% 		sbp->f_frsize = fxdr_unsigned(int32_t, sfp->sf_bsize);
% 		sbp->f_blocks = fxdr_unsigned(int32_t, sfp->sf_blocks);
% 		sbp->f_bfree = fxdr_unsigned(int32_t, sfp->sf_bfree);
% 		sbp->f_bavail = fxdr_unsigned(int32_t, sfp->sf_bavail);

Still has old bugs.

% 		sbp->f_fresvd = 0;
% 		sbp->f_files = 0;
% 		sbp->f_ffree = 0;
% 		sbp->f_favail = 0;
% 		sbp->f_fresvd = 0;
% 		sbp->f_namemax = MAXNAMLEN;
% 	}

Next steps: someone should look at why there are 3 nfsv3 protocol
fields for the block counts when only 2 are strictly needed.

Bruce

From owner-freebsd-fs@FreeBSD.ORG  Tue May  3 11:48:44 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5180F106566C;
	Tue,  3 May 2011 11:48:44 +0000 (UTC)
	(envelope-from alexander@leidinger.net)
Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de
	[217.11.53.44])
	by mx1.freebsd.org (Postfix) with ESMTP id 01CAB8FC21;
	Tue,  3 May 2011 11:48:43 +0000 (UTC)
Received: from outgoing.leidinger.net (p5B155A42.dip.t-dialin.net
	[91.21.90.66])
	by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id D183F844017;
	Tue,  3 May 2011 13:48:29 +0200 (CEST)
Received: from webmail.leidinger.net (webmail.Leidinger.net
	[IPv6:fd73:10c7:2053:1::2:102])
	by outgoing.leidinger.net (Postfix) with ESMTP id 15D5311C5;
	Tue,  3 May 2011 13:48:27 +0200 (CEST)
Received: (from www@localhost)
	by webmail.leidinger.net (8.14.4/8.13.8/Submit) id p43BmQpR006371;
	Tue, 3 May 2011 13:48:26 +0200 (CEST)
	(envelope-from Alexander@Leidinger.net)
Received: from pslux.ec.europa.eu (pslux.ec.europa.eu [158.169.9.14]) by
	webmail.leidinger.net (Horde Framework) with HTTP; Tue, 03 May 2011
	13:48:26 +0200
Message-ID: <20110503134826.712070yt2urhxp8g@webmail.leidinger.net>
Date: Tue, 03 May 2011 13:48:26 +0200
From: Alexander Leidinger <Alexander@Leidinger.net>
To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
References: <4DBBB20A.5050102@FreeBSD.org>
	<20110430072831.GA65598@icarus.home.lan>
	<20110501000656.00007ea1@unknown>
	<20110501133752.GC3245@garage.freebsd.pl>
In-Reply-To: <20110501133752.GC3245@garage.freebsd.pl>
MIME-Version: 1.0
Content-Type: text/plain;
 charset=UTF-8;
 DelSp="Yes";
 format="flowed"
Content-Disposition: inline
Content-Transfer-Encoding: 7bit
User-Agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.6)
X-EBL-MailScanner-Information: Please contact the ISP for more information
X-EBL-MailScanner-ID: D183F844017.AF0FB
X-EBL-MailScanner: Found to be clean
X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN,
	SpamAssassin (not cached, score=0, required 6, autolearn=disabled)
X-EBL-MailScanner-From: alexander@leidinger.net
X-EBL-MailScanner-Watermark: 1305028111.44812@DGdeTBgXehAN7f2t7b6JVg
X-EBL-Spam-Status: No
Cc: freebsd-fs@FreeBSD.org, Alexander Motin <mav@FreeBSD.org>
Subject: Re: TRIM clustering
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 May 2011 11:48:44 -0000

Quoting Pawel Jakub Dawidek <pjd@FreeBSD.org> (from Sun, 1 May 2011  
15:37:52 +0200):

> On Sun, May 01, 2011 at 12:06:56AM +0200, Alexander Leidinger wrote:
>> On Sat, 30 Apr 2011 00:28:31 -0700 Jeremy Chadwick
>> <freebsd@jdc.parodius.com> wrote:
>>
>> > On Sat, Apr 30, 2011 at 09:54:02AM +0300, Alexander Motin wrote:
>>
>> > Other notes: TRIM needs to be supported on swap as well, and in my
>> > opinion this is just as important as it being in UFS.  I'm not sure
>> > how one would implement that.
>>
>> This brings up the question if a ZFS cache (where the contents do not
>> survive a reboot) is completely TRIMmed before used (and normally
>> trimmed during use)...
>
> It is not trimmed at all.

This does not sound like the optimal solution... is there a way to  
know the first access after boot/attach to a cache device? If yes,  
would it be possible to TRIM the complete provider (except for some  
static data which needs to be there) from this place? This would not  
solve the not-TRIMmed-during-use part, but at least a reboot/reattach  
could provide a sane state.

Bye,
Alexander.

-- 
BOFH excuse #189:

SCSI's too wide

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137

From owner-freebsd-fs@FreeBSD.ORG  Tue May  3 14:07:47 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3B468106564A
	for <freebsd-fs@freebsd.org>; Tue,  3 May 2011 14:07:47 +0000 (UTC)
	(envelope-from ticso@cicely7.cicely.de)
Received: from raven.bwct.de (raven.bwct.de [85.159.14.73])
	by mx1.freebsd.org (Postfix) with ESMTP id 96DAC8FC0A
	for <freebsd-fs@freebsd.org>; Tue,  3 May 2011 14:07:46 +0000 (UTC)
Received: from mail.cicely.de ([10.1.1.37])
	by raven.bwct.de (8.13.4/8.13.4) with ESMTP id p43Db59L039034
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);
	Tue, 3 May 2011 15:37:06 +0200 (CEST)
	(envelope-from ticso@cicely7.cicely.de)
Received: from cicely7.cicely.de (cicely7.cicely.de [10.1.1.9])
	by mail.cicely.de (8.14.4/8.14.4) with ESMTP id p43DapMk069143
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Tue, 3 May 2011 15:36:51 +0200 (CEST)
	(envelope-from ticso@cicely7.cicely.de)
Received: from cicely7.cicely.de (localhost [127.0.0.1])
	by cicely7.cicely.de (8.14.2/8.14.2) with ESMTP id p43DPHG8002183;
	Tue, 3 May 2011 15:25:17 +0200 (CEST)
	(envelope-from ticso@cicely7.cicely.de)
Received: (from ticso@localhost)
	by cicely7.cicely.de (8.14.2/8.14.2/Submit) id p43DPHgm002182;
	Tue, 3 May 2011 15:25:17 +0200 (CEST) (envelope-from ticso)
Date: Tue, 3 May 2011 15:25:17 +0200
From: Bernd Walter <ticso@cicely7.cicely.de>
To: Alexander Leidinger <Alexander@Leidinger.net>
Message-ID: <20110503132517.GF1549@cicely7.cicely.de>
References: <4DBBB20A.5050102@FreeBSD.org>
	<20110430072831.GA65598@icarus.home.lan>
	<20110501000656.00007ea1@unknown>
	<20110501133752.GC3245@garage.freebsd.pl>
	<20110503134826.712070yt2urhxp8g@webmail.leidinger.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20110503134826.712070yt2urhxp8g@webmail.leidinger.net>
X-Operating-System: FreeBSD cicely7.cicely.de 7.0-STABLE i386
User-Agent: Mutt/1.5.11
X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED=-1, BAYES_00=-1.9,
	T_RP_MATCHES_RCVD=-0.01 autolearn=unavailable version=3.3.0
X-Spam-Checker-Version: SpamAssassin 3.3.0 (2010-01-18) on spamd.cicely.de
Cc: freebsd-fs@freebsd.org, Alexander Motin <mav@freebsd.org>,
	Pawel Jakub Dawidek <pjd@freebsd.org>
Subject: Re: TRIM clustering
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: ticso@cicely.de
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 May 2011 14:07:47 -0000

On Tue, May 03, 2011 at 01:48:26PM +0200, Alexander Leidinger wrote:
> Quoting Pawel Jakub Dawidek <pjd@FreeBSD.org> (from Sun, 1 May 2011  
> 15:37:52 +0200):
> 
> >On Sun, May 01, 2011 at 12:06:56AM +0200, Alexander Leidinger wrote:
> >>On Sat, 30 Apr 2011 00:28:31 -0700 Jeremy Chadwick
> >><freebsd@jdc.parodius.com> wrote:
> >>
> >>> On Sat, Apr 30, 2011 at 09:54:02AM +0300, Alexander Motin wrote:
> >>
> >>> Other notes: TRIM needs to be supported on swap as well, and in my
> >>> opinion this is just as important as it being in UFS.  I'm not sure
> >>> how one would implement that.
> >>
> >>This brings up the question if a ZFS cache (where the contents do not
> >>survive a reboot) is completely TRIMmed before used (and normally
> >>trimmed during use)...
> >
> >It is not trimmed at all.
> 
> This does not sound like the optimal solution... is there a way to  
> know the first access after boot/attach to a cache device? If yes,  
> would it be possible to TRIM the complete provider (except for some  
> static data which needs to be there) from this place? This would not  
> solve the not TRIMmed during use part, put at least a reboot/reattach  
> could provide a sane state.

What would be the possible benefit?
I mean it's just until the device is filled, which won't happen that
regularly in environments where cache devices make sense.
More interesting would be to have the cached data persist across reboots
one day instead of TRIMing it.

-- 
B.Walter <bernd@bwct.de> http://www.bwct.de
Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.

From owner-freebsd-fs@FreeBSD.ORG  Tue May  3 14:33:43 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9CD8C106566C;
	Tue,  3 May 2011 14:33:43 +0000 (UTC)
	(envelope-from alexander@leidinger.net)
Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de
	[217.11.53.44])
	by mx1.freebsd.org (Postfix) with ESMTP id 2BDB48FC19;
	Tue,  3 May 2011 14:33:42 +0000 (UTC)
Received: from outgoing.leidinger.net (p5B155A42.dip.t-dialin.net
	[91.21.90.66])
	by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id E7DF3844017;
	Tue,  3 May 2011 16:33:27 +0200 (CEST)
Received: from webmail.leidinger.net (webmail.Leidinger.net
	[IPv6:fd73:10c7:2053:1::2:102])
	by outgoing.leidinger.net (Postfix) with ESMTP id 0179011C6;
	Tue,  3 May 2011 16:33:24 +0200 (CEST)
Received: (from www@localhost)
	by webmail.leidinger.net (8.14.4/8.13.8/Submit) id p43EXOnR046041;
	Tue, 3 May 2011 16:33:24 +0200 (CEST)
	(envelope-from Alexander@Leidinger.net)
Received: from pslux.ec.europa.eu (pslux.ec.europa.eu [158.169.9.14]) by
	webmail.leidinger.net (Horde Framework) with HTTP; Tue, 03 May 2011
	16:33:24 +0200
Message-ID: <20110503163324.11285rolq1oyrnlc@webmail.leidinger.net>
Date: Tue, 03 May 2011 16:33:24 +0200
From: Alexander Leidinger <Alexander@Leidinger.net>
To: ticso@cicely.de, Bernd Walter <ticso@cicely7.cicely.de>
References: <4DBBB20A.5050102@FreeBSD.org>
	<20110430072831.GA65598@icarus.home.lan>
	<20110501000656.00007ea1@unknown>
	<20110501133752.GC3245@garage.freebsd.pl>
	<20110503134826.712070yt2urhxp8g@webmail.leidinger.net>
	<20110503132517.GF1549@cicely7.cicely.de>
In-Reply-To: <20110503132517.GF1549@cicely7.cicely.de>
MIME-Version: 1.0
Content-Type: text/plain;
 charset=UTF-8;
 DelSp="Yes";
 format="flowed"
Content-Disposition: inline
Content-Transfer-Encoding: 7bit
User-Agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.6)
X-EBL-MailScanner-Information: Please contact the ISP for more information
X-EBL-MailScanner-ID: E7DF3844017.A04FF
X-EBL-MailScanner: Found to be clean
X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN,
	SpamAssassin (not cached, score=0, required 6, autolearn=disabled)
X-EBL-MailScanner-From: alexander@leidinger.net
X-EBL-MailScanner-Watermark: 1305038008.30478@0FRKaNgB0n/VcxxUkpbs+g
X-EBL-Spam-Status: No
Cc: freebsd-fs@freebsd.org, Alexander Motin <mav@freebsd.org>,
	Pawel Jakub Dawidek <pjd@freebsd.org>
Subject: Re: TRIM clustering
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 May 2011 14:33:43 -0000

Quoting Bernd Walter <ticso@cicely7.cicely.de> (from Tue, 3 May 2011  
15:25:17 +0200):

> On Tue, May 03, 2011 at 01:48:26PM +0200, Alexander Leidinger wrote:
>> Quoting Pawel Jakub Dawidek <pjd@FreeBSD.org> (from Sun, 1 May 2011
>> 15:37:52 +0200):
>>
>> >On Sun, May 01, 2011 at 12:06:56AM +0200, Alexander Leidinger wrote:
>> >>On Sat, 30 Apr 2011 00:28:31 -0700 Jeremy Chadwick
>> >><freebsd@jdc.parodius.com> wrote:
>> >>
>> >>> On Sat, Apr 30, 2011 at 09:54:02AM +0300, Alexander Motin wrote:
>> >>
>> >>> Other notes: TRIM needs to be supported on swap as well, and in my
>> >>> opinion this is just as important as it being in UFS.  I'm not sure
>> >>> how one would implement that.
>> >>
>> >>This brings up the question if a ZFS cache (where the contents do not
>> >>survive a reboot) is completely TRIMmed before used (and normally
>> >>trimmed during use)...
>> >
>> >It is not trimmed at all.
>>
>> This does not sound like the optimal solution... is there a way to
>> know the first access after boot/attach to a cache device? If yes,
>> would it be possible to TRIM the complete provider (except for some
>> static data which needs to be there) from this place? This would not
>> solve the not TRIMmed during use part, put at least a reboot/reattach
>> could provide a sane state.
>
> What would be the possible benefit?
> I mean it's just until the device is filled, which won't happen that
> regular in environments where cache devices make sense.

If a cache is not full, it is not used well (or you do not access that  
much data, but in this case you do not have to worry).

The benefit of the initial TRIM should be a faster cache-fill latency  
(if it matters or not depends upon your use-case/drive-channel-usage).

Regarding the in-use-TRIMming... I agree that it is subject to  
discussion (and the use-case), but at least it looks like a more  
correct solution. If large objects are removed from the cache,  
following cache fills could have lower write latency. Again, if this  
matters or not depends upon your use-case/drive-channel-usage.

> More interesting would be to have the cached data reboot persistent
> one day instead of TRIMing it.

I assume this would be more work than teaching it to TRIM (I am looking for
low-hanging fruit), but in general I agree.

Bye,
Alexander.

-- 
Fools rush in -- and get the best seats.

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137

From owner-freebsd-fs@FreeBSD.ORG  Tue May  3 22:05:50 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 66D971065672
	for <freebsd-fs@freebsd.org>; Tue,  3 May 2011 22:05:50 +0000 (UTC)
	(envelope-from wmn@siberianet.ru)
Received: from mail.siberianet.ru (mail.siberianet.ru [89.105.136.7])
	by mx1.freebsd.org (Postfix) with ESMTP id C0DF08FC15
	for <freebsd-fs@freebsd.org>; Tue,  3 May 2011 22:05:49 +0000 (UTC)
Received: from wmn.localnet (wmn.siberianet.ru [89.105.137.12])
	by mail.siberianet.ru (Postfix) with ESMTPA id 612695028AE
	for <freebsd-fs@freebsd.org>; Wed,  4 May 2011 05:47:58 +0800 (KRAST)
From: Sergey Lobanov <wmn@siberianet.ru>
Organization: ISP "SiberiaNet"
Date: Wed, 4 May 2011 05:47:52 +0800
User-Agent: KMail/1.13.7 (Linux/2.6.38-ARCH; KDE/4.6.2; i686; ; )
MIME-Version: 1.0
X-Length: 2409
X-UID: 18
To: freebsd-fs@freebsd.org
Content-Type: Text/Plain;
  charset="us-ascii"
Content-Transfer-Encoding: 7bit
Message-Id: <201105040547.52216.wmn@siberianet.ru>
Subject: fsck_ufs only in preen mode terminates with non-zero exit status
	trying to check absent device
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 May 2011 22:05:50 -0000

Hello,

I am trying to work around a problem in a setup with md(4) file-backed images
mounted in jails(8). I could not find a way to automatically check the file
systems on md images during system boot or jail start. That is, after a hard
system reset (for example, because of power loss) the file systems on md
file-backed images are all dirty and are not auto-repaired out of the box.
Maybe I've missed something; feel free to point me to the documentation
describing this case.

Here is the workaround I am trying to build:
1) The md images are all listed in the jail fstabs so that the system can boot
normally (if the images are in the host fstab, the system stops when it tries
to check devices that are obviously absent at boot time).
2) I use ezjail, so I added a hack to its rc-NG script which executes an
external script to check the file systems on the corresponding md images before
jail start; the ezjail script relies on the exit status of this external
script, so the jail can be skipped if the check has failed.
3) The script that checks the md images gets the jail name as a parameter,
greps the /dev/md* rows from the corresponding fstab file and tries to fsck in
preen mode first and then in normal mode if the first attempt fails. And here
is the problem with fsck: in preen mode it exits with non-zero status if the
device is not present, but if we then launch it in normal mode for the same
device, it prints errors and terminates with status 0.

Example script (test-fsck-ufs.sh):
--------------------
#!/bin/sh

rc_info="YES"
. /etc/rc.subr

/sbin/fsck_ufs -p /dev/md-non-existent
if [ $? -ne 0 ]; then
  warn "Could not check in preen mode, trying normal..."
  /sbin/fsck_ufs -y /dev/md-non-existent || \
    err $? "Could not check in normal mode, XXX IMAGE FILE IS CORRUPT XXX"
else
  info "Consistent"
fi
--------------------

Result of running the above script on 8.2-stable r220968 and 7.3-stable
r215651:
Can't stat /dev/md-non-existent: No such file or directory
./test-fsck-ufs.sh: WARNING: Could not check in preen mode, trying normal...
Can't stat /dev/md-non-existent: No such file or directory
Can't stat /dev/md-non-existent: No such file or directory

This is incorrect from my point of view; fsck_ffs(8) clearly states at the
very end:
"EXIT STATUS
     The fsck_ffs utility exits 0 on success, and >0 if an error occurs."

I can definitely hack fsck_ffs so that it returns an error in such conditions,
something like this (it fixes my case but was not checked in normal operation;
the patch is for releng8):
---patch start---
--- sbin/fsck_ffs/main.c.orig   2011-05-04 04:11:18.000000000 +0800
+++ sbin/fsck_ffs/main.c        2011-05-04 04:29:23.000000000 +0800
@@ -70,6 +70,7 @@
 static int checkfilesys(char *filesys);
 static int chkdoreload(struct statfs *mntp);
 static struct statfs *getmntpt(const char *);
+char fails = 0;
 
 int
 main(int argc, char *argv[])
@@ -179,6 +180,8 @@
 
        if (returntosingle)
                ret = 2;
+       else
+               if (fails) ret = EEXIT;
        exit(ret);
 }
 
@@ -373,6 +376,7 @@
        case 0:
                if (preen)
                        pfatal("CAN'T CHECK FILE SYSTEM.");
+               fails = 1;
                return (0);
        case -1:
        clean:
---patch end---
but maybe there is some other, saner way.

Or maybe I've just missed something, there is a strong reason for this
behaviour, and it is actually a feature :}
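
The idea behind the patch can be reduced to a small standalone sketch. This is
not the fsck_ffs source: check_one() and check_all() are hypothetical stand-ins
for checkfilesys() and the loop in main(), and the EEXIT value of 8 is an
assumption about fsck_ffs's standard error exit. It only shows how a recorded
failure can be turned into a non-zero exit status.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#define EEXIT 8		/* assumed to match fsck_ffs's standard error exit */

static int fails;	/* set when a file system could not be examined */

/*
 * Hypothetical stand-in for checkfilesys(): like the unpatched code it
 * returns 0 when the device cannot be examined, but it records the failure.
 */
int
check_one(const char *dev)
{
	struct stat sb;

	if (stat(dev, &sb) < 0) {
		fprintf(stderr, "Can't stat %s: %s\n", dev, strerror(errno));
		fails = 1;
		return (0);
	}
	/* A real check would run here; the sketch treats it as clean. */
	return (0);
}

/* Check every device and turn a recorded failure into a non-zero status. */
int
check_all(int ndev, const char *const devs[])
{
	int i, r, ret = 0;

	assert(ndev >= 0);
	fails = 0;
	for (i = 0; i < ndev; i++) {
		r = check_one(devs[i]);
		if (r > ret)
			ret = r;
	}
	if (ret == 0 && fails)
		ret = EEXIT;
	return (ret);
}
```

With this shape, a caller such as the ezjail hook would see a non-zero status
for an absent device in both preen and normal mode.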


I am subscribed to the list so there is no need to add me to CC.

-- 
ISP SiberiaNet
System and Network Administrator

From owner-freebsd-fs@FreeBSD.ORG  Tue May  3 23:23:56 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7780F106564A
	for <freebsd-fs@freebsd.org>; Tue,  3 May 2011 23:23:56 +0000 (UTC)
	(envelope-from roberto@keltia.freenix.fr)
Received: from keltia.net (unknown [IPv6:2a01:240:fe5c::41])
	by mx1.freebsd.org (Postfix) with ESMTP id 2DA798FC13
	for <freebsd-fs@freebsd.org>; Tue,  3 May 2011 23:23:56 +0000 (UTC)
Received: from lonrach.keltia.net (lonrach.keltia.net [193.56.58.71])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested) (Authenticated sender: roberto)
	by keltia.net (Postfix/TLS) with ESMTPSA id 09FB5E15B
	for <freebsd-fs@freebsd.org>; Wed,  4 May 2011 01:23:54 +0200 (CEST)
Date: Wed, 4 May 2011 01:23:52 +0200
From: Ollivier Robert <roberto@keltia.freenix.fr>
To: freebsd-fs@freebsd.org
Message-ID: <20110503232352.GB29092@lonrach.keltia.net>
References: <4DB8EF02.8060406@bk.ru> <ipf6i6$54v$1@dough.gmane.org>
	<20110430001524.GA58845@icarus.home.lan>
	<4DBC2E46.9060404@userid.org> <4DBCA4AE.3090506@FreeBSD.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4DBCA4AE.3090506@FreeBSD.org>
X-Operating-System: MacOS X / MBP 4,1 - FreeBSD 8.0 / T3500-E5520 Nehalem
User-Agent: Mutt/1.5.20 (2009-06-14)
Subject: Re: ZFS v28 for 8.2-STABLE
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 May 2011 23:23:56 -0000

According to Martin Matuska:
> I have updated patch to reflect latest changes (grab latest one):
> http://people.freebsd.org/~mm/patches/zfs/v28/

My 8.2-STABLE machine (r221058) is running with the 20110317 patch applied. I put back my partition on the 3rd drive as a cache (it was taking a full CPU in v15 due to an overflow bug) and it has been working fine for a few days now, doing www/uucp/dns/dnssec/ssh and sending away dozens of spammers.

It handles all of these fine; I even enabled deduplication on some filesets:
643 [1:19] root@centre:munin/plugins# zpool list

NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
data   284G   210G  73.8G    74%  1.00x  ONLINE  -  1x 320 GB
tank   294G  69.2G   225G    23%  1.06x  ONLINE  -  2x 320 GB mirrored

> As to your setup, have you tried using a partition as a log device?

cache yes, log no.

-- 
Ollivier ROBERT -=- FreeBSD: The Power to Serve! -=- roberto@keltia.freenix.fr
In memoriam to Ondine : http://ondine.keltia.net/

From owner-freebsd-fs@FreeBSD.ORG  Wed May  4 00:15:43 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D60AD1065670
	for <fs@freebsd.org>; Wed,  4 May 2011 00:15:42 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca
	[131.104.91.44])
	by mx1.freebsd.org (Postfix) with ESMTP id 7D8D08FC14
	for <fs@freebsd.org>; Wed,  4 May 2011 00:15:42 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApwEAH6ZwE2DaFvO/2dsb2JhbACEUaJGiHKreJEdgSqDV4EBBI8Yjk4
X-IronPort-AV: E=Sophos;i="4.64,312,1301889600"; d="scan'208";a="120327109"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 03 May 2011 20:15:41 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 8AABCB3F24;
	Tue,  3 May 2011 20:15:41 -0400 (EDT)
Date: Tue, 3 May 2011 20:15:41 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Bruce Evans <brde@optusnet.com.au>
Message-ID: <2143699515.968680.1304468141505.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <20110503174200.V1050@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.203]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692)
Cc: fs@FreeBSD.org
Subject: Re: newnfs client and statfs
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 04 May 2011 00:15:43 -0000

> On Mon, 2 May 2011, Rick Macklem wrote:
> 
> > I have attached a version of the patch that I intend to commit
> > unless it doesn't work for Kostik's test case. Kostik, could
> > you please test this one.
> >
> > Yes, Bruce, I realize you won't like it, but I
> > have put some comments in it
> > to try and clarify why it is coded the way it is.
> > (The arithmetic seems to work the way I would expect it to for
> > i386, which is the only arch I have for testing.)
> 
> Sigh.
> 
> % --- fs/nfsclient/nfs_clport.c.sav 2011-04-30 20:16:39.000000000 -0400
> % +++ fs/nfsclient/nfs_clport.c 2011-05-02 19:32:31.000000000 -0400
> % @@ -838,21 +838,33 @@ void
> % nfscl_loadsbinfo(struct nfsmount *nmp, struct nfsstatfs *sfp, void *statfs)
> % {
> % struct statfs *sbp = (struct statfs *)statfs;
> % - nfsquad_t tquad;
> %
> % if (nmp->nm_flag & (NFSMNT_NFSV3 | NFSMNT_NFSV4)) {
> % sbp->f_bsize = NFS_FABLKSIZE;
> % - tquad.qval = sfp->sf_tbytes;
> % - sbp->f_blocks = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
> % - tquad.qval = sfp->sf_fbytes;
> % - sbp->f_bfree = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
> % - tquad.qval = sfp->sf_abytes;
> % - sbp->f_bavail = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
> % - tquad.qval = sfp->sf_tfiles;
> % - sbp->f_files = (tquad.lval[0] & 0x7fffffff);
> % - tquad.qval = sfp->sf_ffiles;
> % - sbp->f_ffree = (tquad.lval[0] & 0x7fffffff);
> % + sbp->f_blocks = sfp->sf_tbytes / NFS_FABLKSIZE;
> % + sbp->f_bfree = sfp->sf_fbytes / NFS_FABLKSIZE;
> % + /*
> % + * Although sf_abytes is uint64_t and f_bavail is int64_t,
> % + * the value after dividing by NFS_FABLKSIZE is small
> % + * enough that it will fit in 63bits, so it is ok to
> % + * assign it to f_bavail without fear that it will become
> % + * negative.
> % + */
> % + sbp->f_bavail = sfp->sf_abytes / NFS_FABLKSIZE;
> % + sbp->f_files = sfp->sf_tfiles;
> % + /* Since f_ffree is int64_t, clip it to 63bits. */
> % + if (sfp->sf_ffiles > (uint64_t)INT64_MAX)
> 
> This cast has no effect. INT64_MAX has type int64_t. sf_ffiles has type
> uint64_t. The default binary promotions cause both types to be promoted
> to the minimally larger common type. This type is uint64_t. Thus
> INT64_MAX is converted automatically to the correct type.
> 
Yea, I didn't think the cast mattered, and it didn't affect the outcome
for my little userland test program, so I'll take it out. (I was trying
to "play it safe", but if you say it doesn't matter, I believe you.)
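
For reference, here is a minimal userland sketch of the clipping without the
cast. It is not the kernel code: clip_to_int64() is a hypothetical helper that
stands in for the f_ffree assignment, assuming a uint64_t wire value and an
int64_t destination as in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clip an unsigned 64-bit wire count to int64_t. */
int64_t
clip_to_int64(uint64_t wire)
{
	/*
	 * No cast on INT64_MAX: the usual arithmetic conversions convert
	 * it to uint64_t before the comparison is made.
	 */
	if (wire > INT64_MAX)
		return (INT64_MAX);
	return ((int64_t)wire);
}

/* Small usage example. */
void
clip_examples(void)
{
	assert(clip_to_int64(12345) == 12345);
	assert(clip_to_int64(UINT64_MAX) == INT64_MAX);
}
```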

> % + sbp->f_ffree = INT64_MAX;
> % + else
> % + sbp->f_ffree = sfp->sf_ffiles;
> % } else if ((nmp->nm_flag & NFSMNT_NFSV4) == 0) {
> % + /*
> % + * The type casts to (int32_t) ensure that this code is
> % + * compatible with the old NFS client, in that it will
> % + * sign extend a value with bit31 set. This may or may
> % + * not be correct for NFSv2, but since it is a legacy
> % + * environment, I'd rather retain backwards compatibility.
> % + */
> % sbp->f_bsize = (int32_t)sfp->sf_bsize;
> % sbp->f_blocks = (int32_t)sfp->sf_blocks;
> % sbp->f_bfree = (int32_t)sfp->sf_bfree;
> 
> It won't sign extend, but will propagate bit31 as an unsigned bit. For
> example, sfp->sf_blocks = 0x80000000 becomes sbp->f_blocks =
> 0xFFFFFFFF80000000, which is massively different. Again, omitting the
> cast gives the correct result if the wire insists on its values being
> unsigned.
> 
Ok, I'll change the comment.
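
A small userland sketch of the difference, assuming a 32-bit wire value being
stored into a uint64_t field as in the NFSv2 branch. The function names are
made up for the illustration, and the cast result relies on a 2's complement
machine.

```c
#include <assert.h>
#include <stdint.h>

/* With the cast: bit 31 is treated as a sign bit and propagated upwards. */
uint64_t
widen_with_cast(uint32_t wire)
{
	return ((uint64_t)(int32_t)wire);
}

/* Without the cast: the value is simply zero-extended. */
uint64_t
widen_plain(uint32_t wire)
{
	return (wire);
}

/* Small usage example. */
void
widen_examples(void)
{
	assert(widen_with_cast(0x80000000U) == 0xFFFFFFFF80000000ULL);
	assert(widen_plain(0x80000000U) == 0x0000000080000000ULL);
	assert(widen_with_cast(0x7FFFFFFFU) == widen_plain(0x7FFFFFFFU));
}
```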

> The result is only backwards compatible with relatively recent FreeBSD
> nfs clients. All FreeBSD clients are completely broken if bit31 is
> set, and compatibility with this brokenness is not useful (but as I
> pointed out in another reply, we would never have seen the broken case
> when the old clients weren't old, since it takes a server file system
> size of about 32TB for bit 31 to be set).
Well, the last legitimate use of the FreeBSD NFSv2 client was a diskless
root fs stored on a non-FreeBSD NFS server (because pxeboot didn't know the
correct file handle size). Since this is now fixed, there really isn't
any use for the NFSv2 client, as far as I know. Given that and the fact
that no one is complaining about it being broken, I feel it should just
be left alone. (Or remain "bug compatible" with the regular NFS client,
if you prefer.)

I'm afraid I have other things to work on and just don't see changing
NFSv2 (a 1985 protocol superseded by NFSv3 in 1994) as a priority, rick.
 
> The details of the brokenness vary:
> 
> Net/2, FreeBSD-1, 4.4BSD-Lite, FreeBSD-[2-4]:
> f_blocks was plain long:
> if long is 32 bits, then sfp->sf_blocks = 0x80000000 becomes
> sbp->f_blocks = -0x7fffffff - 1 (LONG_MIN)
> if long is 64 bits, then sfp->sf_blocks = 0x80000000 becomes
> sbp->f_blocks = -0x80000000L (INT32_MIN (same as 32-bit LONG_MIN)
> 
> FreeBSD-current after 2003/11/12, FreeBSD-[5-9]:
> f_blocks is now uint64_t:
> changing it (and others) from a signed type to an unsigned type mainly
> gave lots of sign extension bugs, including here. The bugs remain
> mostly unfixed.
> sfp->sf_blocks = 0x80000000 becomes
> sbp->f_blocks = 0xFFFFFFFF80000000 ((uint64_t)INT32_MIN) on all
> arches.
> 
> Neither of the garbage values INT32_MIN, ((uint64_t)INT_MIN) gives useful
> behaviour. The former is negative, though the wire value cannot be negative
> (not sure about this for v2). Applications that are naive enough to believe
> this value should assume that the file system has a negative size and
> never try to write anything. The latter is enormous and positive. If the
> wire count really is 0x80000000, then that is already very large, so
> believing that the value is 0xFFFFFFFF80000000 should make little
> difference.
> 
> The bugs are a little different for signed fields like f_bavail. Now there
> are no sign extension bugs or version-dependent misbehaviours. There are
> just overflow bugs in the bogus casts. (int32_t)0x80000000 overflows to
> INT32_MIN (only on 2's complement machines, but no others are supported),
> and assignment to sbp->f_bavail doesn't change this garbage value. Now
> the bugs are even further off, since it takes about a 400 TB ffs server
> file system to reach them. (400 TB with 8% minfree gives a 32TB reserve for
> root. After using all 32TB of this reserve, there would be -32TB available
> for non-root. -32TB is INT32_MIN in 16K-blocks.)
> 
> Bruce

From owner-freebsd-fs@FreeBSD.ORG  Wed May  4 00:27:54 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1A088106564A;
	Wed,  4 May 2011 00:27:54 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca
	[131.104.91.36])
	by mx1.freebsd.org (Postfix) with ESMTP id A908F8FC13;
	Wed,  4 May 2011 00:27:53 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApwEAP+cwE2DaFvO/2dsb2JhbACEUaJGiHKrX5EcgSqBX4F4gQEEjxiOTg
X-IronPort-AV: E=Sophos;i="4.64,312,1301889600"; d="scan'208";a="119503788"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-annu-pri.mail.uoguelph.ca with ESMTP; 03 May 2011 20:27:53 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id C4F95B3F34;
	Tue,  3 May 2011 20:27:52 -0400 (EDT)
Date: Tue, 3 May 2011 20:27:52 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Bruce Evans <brde@optusnet.com.au>
Message-ID: <308871799.968962.1304468872744.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <20110503183651.L1224@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.203]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692)
Cc: rmacklem@FreeBSD.org, fs@FreeBSD.org
Subject: Re: newnfs client and statfs
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 04 May 2011 00:27:54 -0000

> On Mon, 2 May 2011, Rick Macklem wrote:
> 
> >>> I'll try and make my Solaris10 box get to -ve frees and then see what
> >>> it puts on the wire. After that, I'll start a discussion on freebsd-fs@
> >>> about how they think a FreeBSD server should behave when f_bavail
> >>> and/or f_ffree are negative.
> >>
> >> The result on Solaris would be interesting. Does Solaris still support
> >> ffs? You said later that you couldn't get it to generate negative
> >> values.
> >>
> > Well, I just did the reverse (ran a FreeBSD FFS disk out of space so
> > it reported a -ve free and mounted in on Solaris10). Here are the
> > "df" outputs (I used "df -k" on Solaris, since that's a compatible
> > format):
> 
> That is almost as good a test.
> 
> > FreeBSD-current server (nfsv4-newlap):
> > Filesystem 1K-blocks Used Avail Capacity Mounted on
> > /dev/ad4s3a 2026030 671492 1192456 36% /
> > devfs 1 1 0 100% /dev
> > /dev/ad4s3e 4697030 4544054 -222786 105% /sub1
> > /dev/ad4s3d 5077038 641462 4029414 14% /usr
> >
> > Solaris10 client:
> > Filesystem kbytes used avail capacity Mounted on
> > /dev/dsk/c0d0s0 3870110 2790938 1040471 73% /
> > /devices 0 0 0 0% /devices
> > ctfs 0 0 0 0% /system/contract
> > proc 0 0 0 0% /proc
> > mnttab 0 0 0 0% /etc/mnttab
> > swap 975736 624 975112 1% /etc/svc/volatile
> > objfs 0 0 0 0% /system/object
> > /usr/lib/libc/libc_hwcap1.so.1 3870110 2790938 1040471 73% /lib/libc.so.1
> > fd 0 0 0 0% /dev/fd
> > swap 975112 0 975112 0% /tmp
> > swap 975140 28 975112 1% /var/run
> > /dev/dsk/c0d0s7 5608190 4118091 1434018 75% /export/home
> > nfsv4-newlap:/sub1 4697030 4544054 18014398509259198 1% /mnt
> >
> > as you can see, Solaris10 doesn't assume it's negative and
> > reports lottsa avail.
> >
> > I don't have a Linux client handy, so I can't do the same test
> > with Linux, rick
> 
> I looked at linux-2.6.10 code. It doesn't do anything good for signed
> counts, and declares f_bavail with a bad mixture of arch-dependent types
> -- int, s32, u32, __u32, long, u64, __u64 (but no s64 :-). It does 1
> nearby thing better: instead of a fixed blocksize of NFS_FABLKSIZE = 512
> for nfs, the blocksize is a parameter, and in scaling by this it is
> careful to round up.
> 
> NetBSD is best. Its statvfs at least has full support for handling this
> problem. From a 2004 version of NetBSD statvfs.h:
> 
> % struct statvfs {
> % unsigned long f_flag; /* copy of mount exported flags */
> % unsigned long f_bsize; /* file system block size */
> % unsigned long f_frsize; /* fundamental file system block size */
> % unsigned long f_iosize; /* optimal file system block size */
> %
> % fsblkcnt_t f_blocks; /* number of blocks in file system, */
> % /* (in units of f_frsize) */
> % fsblkcnt_t f_bfree; /* free blocks avail in file system */
> % fsblkcnt_t f_bavail; /* free blocks avail to non-root */
> % fsblkcnt_t f_bresvd; /* blocks reserved for root */
> 
> statvfs is specified by POSIX, and I previously mentioned that POSIX is
> quite broken in this area. One of the bugs is that all the POSIX block
> count types like fsblkcnt_t in the above are specified to be unsigned.
> Thus negative block counts cannot be supported directly using these types,
> even if the OS has negative block counts. In the above, NetBSD works
> around this by having an extension giving a nonnegative block count for
> the blocks reserved for root. statfs should have used this instead of
> a hack involving negative counts, but presumably didn't to avoid changing
> the ABI. Even NetBSD doesn't have this extension for statfs, at least
> in 2004. statfs(2) was apparently deprecated in NetBSD before 2004, with
> newer features only going into statvfs(2).
> 
> %
> % fsfilcnt_t f_files; /* total file nodes in file system */
> % fsfilcnt_t f_ffree; /* free file nodes in file system */
> % fsfilcnt_t f_favail; /* free file nodes avail to non-root */
> % fsfilcnt_t f_fresvd; /* file nodes reserved for root */
> 
> Similarly.
> 
> %
> % uint64_t f_syncreads; /* count of sync reads since mount */
> % uint64_t f_syncwrites; /* count of sync writes since mount */
> %
> % uint64_t f_asyncreads; /* count of async reads since mount */
> % uint64_t f_asyncwrites; /* count of async writes since mount */
> %
> % fsid_t f_fsidx; /* NetBSD compatible fsid */
> % unsigned long f_fsid; /* Posix compatible fsid */
> % unsigned long f_namemax; /* maximum filename length */
> % uid_t f_owner; /* user that mounted the file system */
> %
> % uint32_t f_spare[4]; /* spare space */
> %
> % char f_fstypename[_VFS_NAMELEN]; /* fs type name */
> % char f_mntonname[_VFS_MNAMELEN]; /* directory on which mounted */
> % char f_mntfromname[_VFS_MNAMELEN]; /* mounted file system */
> %
> % };
> 
> As I said before, NetBSD's nfs tries to make this work for nfs, but I
> couldn't see how this worked in NetBSD or anything I could think of, since
> the extension is not in the nfs protocol. Now I think it does work, but
> still can't see how. Details: NetBSD puts f_bavail on the wire without
> clamping it (it just scales it). Now I think f_bavail is never negative
> in NetBSD, so this scaling doesn't involve the usual sign extension
> and overflow bugs, or abuse of the top bit. The client zaps negative
> values for v3 f_bavail but not for other things, and initializes
> f_bresvd:
> from a 2005 version of nfs_vfsops.c:
> 
> % if (v3) {
> % sbp->f_frsize = sbp->f_bsize = NFS_FABLKSIZE;
> % tquad = fxdr_hyper(&sfp->sf_tbytes);
> % sbp->f_blocks = ((quad_t)tquad / (quad_t)NFS_FABLKSIZE);
> % tquad = fxdr_hyper(&sfp->sf_fbytes);
> % sbp->f_bfree = ((quad_t)tquad / (quad_t)NFS_FABLKSIZE);
> % tquad = fxdr_hyper(&sfp->sf_abytes);
> % tquad = ((quad_t)tquad / (quad_t)NFS_FABLKSIZE);
> % sbp->f_bresvd = sbp->f_bfree - tquad;
> 
> I still can't see how this initialization works. f_bresvd has to end
> up as nonzero if root has a reserve, and drop to zero as the reserve
> is used up. sf_fbytes - sf_abytes must give this reserve.
> 
> % sbp->f_bavail = tquad;
> % #ifdef COMPAT_20
> % /* Handle older NFS servers returning negative values */
> % if ((quad_t)sbp->f_bavail < 0)
> % sbp->f_bavail = 0;
> % #endif
> 
> NetBSD's own server puts f_bavail on the wire unchanged except for scaling,
> so it is now clear that f_bavail is never negative in NetBSD.
> 
> % tquad = fxdr_hyper(&sfp->sf_tfiles);
> % sbp->f_files = tquad;
> % tquad = fxdr_hyper(&sfp->sf_ffiles);
> % sbp->f_ffree = tquad;
> % sbp->f_favail = tquad;
> 
> "Negative" values for this are not zapped.
> 
> % sbp->f_fresvd = 0;
> 
> This reserve is not really supported. Supporting it is impossible since
> there is not as much redundancy in the wire values for the file counts
> as for the block counts.
> 
> % sbp->f_namemax = MAXNAMLEN;
> % } else {
> % sbp->f_bsize = NFS_FABLKSIZE;
> % sbp->f_frsize = fxdr_unsigned(int32_t, sfp->sf_bsize);
> % sbp->f_blocks = fxdr_unsigned(int32_t, sfp->sf_blocks);
> % sbp->f_bfree = fxdr_unsigned(int32_t, sfp->sf_bfree);
> % sbp->f_bavail = fxdr_unsigned(int32_t, sfp->sf_bavail);
> 
> Still has old bugs.
> 
> % sbp->f_fresvd = 0;
> % sbp->f_files = 0;
> % sbp->f_ffree = 0;
> % sbp->f_favail = 0;
> % sbp->f_fresvd = 0;
> % sbp->f_namemax = MAXNAMLEN;
> % }
> 
> Next steps: someone should look at why there are 3 nfsv3 protocol
> fields for the block counts when only 2 are strictly needed.
> 
> Bruce
Here is the RFC's definition of the 3 fields:
      tbytes
         The total size, in bytes, of the file system.

      fbytes
         The amount of free space, in bytes, in the file
         system.

      abytes
         The amount of free space, in bytes, available to the
         user identified by the authentication information in
         the RPC.  (This reflects space that is reserved by the
         file system; it does not reflect any quota system
         implemented by the server.)

I suspect that most systems running FFS (mis)use abytes to represent
the non-root value, even when "root" does the RPC. If they didn't
do that, then abytes would be different when root did statfs and that
would be confusing to a typical client.

Since you don't know if the server's file system is one like FFS that
has a "minfree" (and you don't know what "minfree" is), you can't
reliably calculate a negative f_bavail from the above, from what I
can see.
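
As a side note, the Solaris10 "avail" figure quoted above can be reproduced
with a few lines of C. This is only a sketch under an assumption: that the
FreeBSD server put the negative available space on the wire as a 64-bit two's
complement bit pattern in abytes, and that the client treated it as unsigned.
The function names are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumption: a negative "available" byte count is stored in the unsigned
 * 64-bit abytes field as a two's complement bit pattern.
 */
uint64_t
abytes_on_wire(int64_t avail_kb)
{
	return ((uint64_t)(avail_kb * 1024));
}

/* What a client that treats abytes as unsigned reports, in kbytes. */
uint64_t
client_avail_kb(uint64_t abytes)
{
	return (abytes / 1024);
}

/* The /sub1 and / values from the df output quoted earlier. */
void
solaris_example(void)
{
	assert(client_avail_kb(abytes_on_wire(-222786)) ==
	    18014398509259198ULL);
	assert(client_avail_kb(abytes_on_wire(1192456)) == 1192456ULL);
}
```

The result is 2^54 - 222786 kbytes, which matches the value Solaris printed
for /mnt.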

rick

From owner-freebsd-fs@FreeBSD.ORG  Wed May  4 02:51:21 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 39A8F106566B;
	Wed,  4 May 2011 02:51:21 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail06.syd.optusnet.com.au (mail06.syd.optusnet.com.au
	[211.29.132.187])
	by mx1.freebsd.org (Postfix) with ESMTP id B4F898FC18;
	Wed,  4 May 2011 02:51:20 +0000 (UTC)
Received: from c122-106-155-58.carlnfd1.nsw.optusnet.com.au
	(c122-106-155-58.carlnfd1.nsw.optusnet.com.au [122.106.155.58])
	by mail06.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	p442pGqb017742
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Wed, 4 May 2011 12:51:17 +1000
Date: Wed, 4 May 2011 12:51:16 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Rick Macklem <rmacklem@uoguelph.ca>
In-Reply-To: <308871799.968962.1304468872744.JavaMail.root@erie.cs.uoguelph.ca>
Message-ID: <20110504120552.P956@besplex.bde.org>
References: <308871799.968962.1304468872744.JavaMail.root@erie.cs.uoguelph.ca>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: rmacklem@freebsd.org, fs@freebsd.org
Subject: Re: newnfs client and statfs
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 04 May 2011 02:51:21 -0000

On Tue, 3 May 2011, Rick Macklem wrote:

[attributions lost]
>> ...
>> As I said before, NetBSD's nfs tries to make this work for nfs, but I
>> couldn't see how this worked in NetBSD or anything I could think of, since the
>> extension is not in the nfs protocol. Now I think it does work, but
>> still can't see how. Details: NetBSD puts f_bavail on the wire without

Nah, it cannot work.

>> clamping it (it just scales it). Now I think f_bavail is never negative
>> in NetBSD, so this scaling doesn't involve the usual sign extension
>> and overflow bugs, or abuse of the top bit. The client zaps negative
>> values for v3 f_bavail but not for other things, and initializes
>> f_bresvd:
>> from a 2005 version of nfs_vfsops.c:
>>
>> % if (v3) {
>> % sbp->f_frsize = sbp->f_bsize = NFS_FABLKSIZE;
>> % tquad = fxdr_hyper(&sfp->sf_tbytes);
>> % sbp->f_blocks = ((quad_t)tquad / (quad_t)NFS_FABLKSIZE);
>> % tquad = fxdr_hyper(&sfp->sf_fbytes);
>> % sbp->f_bfree = ((quad_t)tquad / (quad_t)NFS_FABLKSIZE);
>> % tquad = fxdr_hyper(&sfp->sf_abytes);
>> % tquad = ((quad_t)tquad / (quad_t)NFS_FABLKSIZE);
>> % sbp->f_bresvd = sbp->f_bfree - tquad;

Hmm, the tabs are more mangled than usual.

>> I still can't see how this initialization works. f_bresvd has to end
>> up as nonzero if root has a reserve, and drop to zero as the reserve
>> is used up. sf_fbytes - sf_abytes must give this reserve.
>>
>> % sbp->f_bavail = tquad;
>> % #ifdef COMPAT_20
>> % /* Handle older NFS servers returning negative values */
>> % if ((quad_t)sbp->f_bavail < 0)
>> % sbp->f_bavail = 0;
>> % #endif
>>
>> NetBSD's own server puts f_bavail on the wire unchanged except for
>> scaling,
>> so it is now clear that f_bavail is never negative in NetBSD.
>>
>> % tquad = fxdr_hyper(&sfp->sf_tfiles);
>> % sbp->f_files = tquad;
>> % tquad = fxdr_hyper(&sfp->sf_ffiles);
>> % sbp->f_ffree = tquad;
>> % sbp->f_favail = tquad;
>>
>> "Negative" values for this are not zapped.
>>
>> % sbp->f_fresvd = 0;
>>
>> This reserve is not really supported. Supporting it is impossible since
>> there is not as much redundancy in the wire values for the file counts
>> as for the block counts.
>>
>> % sbp->f_namemax = MAXNAMLEN;
>> % } else {
>> % sbp->f_bsize = NFS_FABLKSIZE;
>> % sbp->f_frsize = fxdr_unsigned(int32_t, sfp->sf_bsize);
>> % sbp->f_blocks = fxdr_unsigned(int32_t, sfp->sf_blocks);
>> % sbp->f_bfree = fxdr_unsigned(int32_t, sfp->sf_bfree);
>> % sbp->f_bavail = fxdr_unsigned(int32_t, sfp->sf_bavail);
>>
>> Still has old bugs.
>>
>> % sbp->f_fresvd = 0;
>> % sbp->f_files = 0;
>> % sbp->f_ffree = 0;
>> % sbp->f_favail = 0;
>> % sbp->f_fresvd = 0;
>> % sbp->f_namemax = MAXNAMLEN;
>> % }
>>
>> Next steps: someone should look at why there are 3 nfsv3 protocol
>> fields for the block counts when only 2 are strictly needed.

> Here is the RFC's definition of the 3 fields:
>      tbytes
>         The total size, in bytes, of the file system.
>
>      fbytes
>         The amount of free space, in bytes, in the file
>         system.
>
>      abytes
>         The amount of free space, in bytes, available to the
>         user identified by the authentication information in
>         the RPC.  (This reflects space that is reserved by the
>         file system; it does not reflect any quota system
>         implemented by the server.)

So nfs does support a specially restricted amount of free space available
to a mere user, but it doesn't support this amount being negative.  BSD
uses a negative amount for this to indicate how far away from having any
space to use the user is.

> I suspect that most systems running FFS (mis)use abytes to represent
> the non-root value, even when "root" does the RPC. If they didn't
> do that, then abytes would be different when root did statfs and that
> would be confusing to a typical client.
>
> Since you don't know if the server's file system is one like FFS that
> has a "minfree" (and you don't know what "minfree" is), you can't
> reliably calculate a negative f_bavail from the above, from what I
> can see.

Yes, it's very annoying that only 3 numbers are available, and 3 numbers
are supplied, and the number corresponding to minfree can be recovered
from the 3 numbers supplied, but only when abytes > 0.  (fbytes - abytes)
gives the amount of free space _not_ available to the user and therefore
the amount of free space reserved.  Under the condition abytes > 0,
for file systems like ffs, none of the original reservation (according
to minfree) is used, so (fbytes - abytes) also gives the size of the
original reservation.  But when the reservation starts being used,
abytes is clamped to 0 on broken systems, so the linear relations
between the 3 numbers and the alternative more useful 3 numbers (tbytes,
fbytes, origreservedbytes) are broken, so there is no way to recover
the original reservation, or equivalently, the amount of the reservation
that is used.  NetBSD uses:

 	tquad = fxdr_hyper(&sfp->sf_abytes);
 	tquad = ((quad_t)tquad / (quad_t)NFS_FABLKSIZE);
 	sbp->f_bresvd = sbp->f_bfree - tquad;

This is (fbytes - abytes) in blocks, so it only works when abytes > 0.
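
A minimal sketch of why this subtraction breaks down once the server clamps
abytes, assuming a hypothetical file system whose original reserve is 80
blocks of 512 bytes. The helper name bresvd_from_wire and the sample numbers
are illustrative only and are not taken from NetBSD:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: derive the reserve (in blocks) from the NFSv3
 * FSSTAT byte counts the same way as the snippet above, i.e. as
 * (fbytes - abytes) scaled to blocks.
 */
static int64_t
bresvd_from_wire(uint64_t fbytes, uint64_t abytes, uint64_t blksize)
{
    assert(blksize > 0);
    return (int64_t)(fbytes / blksize) - (int64_t)(abytes / blksize);
}

/* Worked example with an assumed original reserve of 80 blocks. */
void
bresvd_example(void)
{
    /* 200 blocks free, 120 available: the reserve is untouched. */
    assert(bresvd_from_wire(200 * 512, 120 * 512, 512) == 80);
    /* 50 blocks free, true avail is -30 but the wire value is clamped to 0. */
    assert(bresvd_from_wire(50 * 512, 0, 512) == 50);
}
```

In the second case the result is 50, which is neither the original reserve
(80) nor the part of it that has been used (30).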

Repeating part of the above:
> I suspect that most systems running FFS (mis)use abytes to represent
> the non-root value, even when "root" does the RPC. If they didn't
> do that, then abytes would be different when root did statfs and that
> would be confusing to a typical client.

Negative abytes is even more useful for root (or for any user privileged
enough to use the reserve).  It tells how much of the reserve is used.
Users that can eat the reserve should try not to, and when they have
they should try to release space to get back to the full reserve.
Without negative abytes, there is no API in statfs(2) to tell how much
has been eaten.  NetBSD's f_bresvd in statvfs(2) might be able to tell,
but it is unclear if it is supposed to give the original reserve or
the current reserve, and it is already hard enough to decode the 3
numbers into 3 useful ones.
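
As a rough illustration of what a signed value buys, assuming the BSD
convention that f_bavail is a signed block count that simply goes below zero
once the reserve is being consumed. The function names are hypothetical and
the numbers reuse an assumed 80-block reserve:

```c
#include <assert.h>
#include <stdint.h>

/* Blocks of the reserve already consumed; 0 while the reserve is intact. */
static int64_t
reserve_used(int64_t bavail)
{
    return bavail < 0 ? -bavail : 0;
}

/* Original reserve; holds for either sign as long as bavail is not clamped. */
static int64_t
reserve_orig(int64_t bfree, int64_t bavail)
{
    assert(bfree >= 0);
    return bfree - bavail;
}

/* Worked example: the same 80-block reserve before and after it is eaten into. */
void
reserve_example(void)
{
    assert(reserve_used(120) == 0);
    assert(reserve_used(-30) == 30);
    assert(reserve_orig(200, 120) == 80);
    assert(reserve_orig(50, -30) == 80);
}
```

With an unclamped signed value, both the original reserve and the amount
eaten can be recovered from f_bfree and f_bavail alone.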

Bruce

From owner-freebsd-fs@FreeBSD.ORG  Wed May  4 03:58:18 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id F045E1065670;
	Wed,  4 May 2011 03:58:18 +0000 (UTC)
	(envelope-from linimon@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id C83598FC19;
	Wed,  4 May 2011 03:58:18 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p443wIxI001631;
	Wed, 4 May 2011 03:58:18 GMT
	(envelope-from linimon@freefall.freebsd.org)
Received: (from linimon@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p443wIhD001627;
	Wed, 4 May 2011 03:58:18 GMT (envelope-from linimon)
Date: Wed, 4 May 2011 03:58:18 GMT
Message-Id: <201105040358.p443wIhD001627@freefall.freebsd.org>
To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org
From: linimon@FreeBSD.org
Cc: 
Subject: Re: kern/156797: [zfs] [panic] Double panic with FreeBSD 9-CURRENT
	and ZFS
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 04 May 2011 03:58:19 -0000

Old Synopsis: Double panic with FreeBSD 9-CURRENT and ZFS
New Synopsis: [zfs] [panic] Double panic with FreeBSD 9-CURRENT and ZFS

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Wed May 4 03:58:07 UTC 2011
Responsible-Changed-Why: 
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=156797

From owner-freebsd-fs@FreeBSD.ORG  Wed May  4 08:19:34 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7D2301065670
	for <fs@freebsd.org>; Wed,  4 May 2011 08:19:34 +0000 (UTC)
	(envelope-from kostikbel@gmail.com)
Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200])
	by mx1.freebsd.org (Postfix) with ESMTP id 132D28FC08
	for <fs@freebsd.org>; Wed,  4 May 2011 08:19:33 +0000 (UTC)
Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua
	[10.1.1.148])
	by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id p448IJ2t070852
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Wed, 4 May 2011 11:18:19 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1])
	by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id
	p448IJRA087944; Wed, 4 May 2011 11:18:19 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
Received: (from kostik@localhost)
	by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id p448IJSP087943; 
	Wed, 4 May 2011 11:18:19 +0300 (EEST)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to
	kostikbel@gmail.com using -f
Date: Wed, 4 May 2011 11:18:19 +0300
From: Kostik Belousov <kostikbel@gmail.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Message-ID: <20110504081819.GM48734@deviant.kiev.zoral.com.ua>
References: <20110503174200.V1050@besplex.bde.org>
	<2143699515.968680.1304468141505.JavaMail.root@erie.cs.uoguelph.ca>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="ttU9AGiyjxxpUEJf"
Content-Disposition: inline
In-Reply-To: <2143699515.968680.1304468141505.JavaMail.root@erie.cs.uoguelph.ca>
User-Agent: Mutt/1.4.2.3i
X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua
X-Virus-Status: Clean
X-Spam-Status: No, score=-3.4 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00,
	DNS_FROM_OPENWHOIS autolearn=no version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	skuns.kiev.zoral.com.ua
Cc: fs@freebsd.org
Subject: Re: newnfs client and statfs
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 04 May 2011 08:19:34 -0000


--ttU9AGiyjxxpUEJf
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, May 03, 2011 at 08:15:41PM -0400, Rick Macklem wrote:
> > On Mon, 2 May 2011, Rick Macklem wrote:
> >
> > > I have attached a version of the patch that I intend to commit
> > > unless it doesn't work for Kostik's test case. Kostik, could
> > > you please test this one.
> > >
> > > Yes, Bruce, I realize you won't like it, but I
> > > have put some comments in it
> > > to try and clarify why it is coded the way it is.
> > > (The arithmetic seems to work the way I would expect it to for
> > > i386, which is the only arch I have for testing.)
> >
> > Sigh.
> >
> > % --- fs/nfsclient/nfs_clport.c.sav 2011-04-30 20:16:39.000000000
> > -0400
> > % +++ fs/nfsclient/nfs_clport.c 2011-05-02 19:32:31.000000000 -0400
> > % @@ -838,21 +838,33 @@ void
> > % nfscl_loadsbinfo(struct nfsmount *nmp, struct nfsstatfs *sfp, void
> > *statfs)
> > % {
> > % struct statfs *sbp = (struct statfs *)statfs;
> > % - nfsquad_t tquad;
> > %
> > % if (nmp->nm_flag & (NFSMNT_NFSV3 | NFSMNT_NFSV4)) {
> > % sbp->f_bsize = NFS_FABLKSIZE;
> > % - tquad.qval = sfp->sf_tbytes;
> > % - sbp->f_blocks = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
> > % - tquad.qval = sfp->sf_fbytes;
> > % - sbp->f_bfree = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
> > % - tquad.qval = sfp->sf_abytes;
> > % - sbp->f_bavail = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
> > % - tquad.qval = sfp->sf_tfiles;
> > % - sbp->f_files = (tquad.lval[0] & 0x7fffffff);
> > % - tquad.qval = sfp->sf_ffiles;
> > % - sbp->f_ffree = (tquad.lval[0] & 0x7fffffff);
> > % + sbp->f_blocks = sfp->sf_tbytes / NFS_FABLKSIZE;
> > % + sbp->f_bfree = sfp->sf_fbytes / NFS_FABLKSIZE;
> > % + /*
> > % + * Although sf_abytes is uint64_t and f_bavail is int64_t,
> > % + * the value after dividing by NFS_FABLKSIZE is small
> > % + * enough that it will fit in 63bits, so it is ok to
> > % + * assign it to f_bavail without fear that it will become
> > % + * negative.
> > % + */
> > % + sbp->f_bavail = sfp->sf_abytes / NFS_FABLKSIZE;
> > % + sbp->f_files = sfp->sf_tfiles;
> > % + /* Since f_ffree is int64_t, clip it to 63bits. */
> > % + if (sfp->sf_ffiles > (uint64_t)INT64_MAX)
> >
> > This cast has no effect. INT64_MAX has type int64_t. sf_ffiles has
> > uint64_t. The default binary promotions cause both types to be
> > promoted
> > to the minimally larger common type. This type is uint64_t. Thus
> > INT64_MAX is converted automatically to the correct type.
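
A small sketch of the point about the cast, assuming only standard C integer
conversions. The function names are hypothetical and not part of the patch;
they show the clipping test written with and without the explicit cast:

```c
#include <assert.h>
#include <stdint.h>

/* Clip to the int64_t range with the explicit cast, as in the patch. */
static int64_t
clip_with_cast(uint64_t ffiles)
{
    if (ffiles > (uint64_t)INT64_MAX)
        return INT64_MAX;
    return (int64_t)ffiles;
}

/* Same test without the cast; INT64_MAX is converted to uint64_t implicitly. */
static int64_t
clip_without_cast(uint64_t ffiles)
{
    if (ffiles > INT64_MAX)
        return INT64_MAX;
    return (int64_t)ffiles;
}

/* Both variants agree for any input value. */
void
clip_check(uint64_t v)
{
    assert(clip_with_cast(v) == clip_without_cast(v));
}
```

Calling clip_check() with 0, INT64_MAX, (uint64_t)INT64_MAX + 1 and
UINT64_MAX exercises both sides of the boundary.
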
> >
> Yea, I didn't think the cast mattered and it didn't affect the outcome
> for my little userland test program, so I'll take it out. (I was trying
> to "play it safe", but if you say it doesn't matter, I believe you.)
>
> > % + sbp->f_ffree = INT64_MAX;
> > % + else
> > % + sbp->f_ffree = sfp->sf_ffiles;
> > % } else if ((nmp->nm_flag & NFSMNT_NFSV4) == 0) {
> > % + /*
> > % + * The type casts to (int32_t) ensure that this code is
> > % + * compatible with the old NFS client, in that it will
> > % + * sign extend a value with bit31 set. This may or may
> > % + * not be correct for NFSv2, but since it is a legacy
> > % + * environment, I'd rather retain backwards compatibility.
> > % + */
> > % sbp->f_bsize = (int32_t)sfp->sf_bsize;
> > % sbp->f_blocks = (int32_t)sfp->sf_blocks;
> > % sbp->f_bfree = (int32_t)sfp->sf_bfree;
> >
> > It won't sign extend, but will propagate bit31 as an unsigned bit. For
> > example, sfp->sf_blocks = 0x80000000 becomes sbp->f_blocks =
> > 0xFFFFFFFF80000000, which is massively different. Again, omitting the
> > cast gives the correct result if the wire insists on its values being
> > unsigned.
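
The example value above can be reproduced with a short sketch, assuming the
destination field is a 64-bit unsigned type (as the quoted result implies)
and a two's-complement compiler where converting 0x80000000 to int32_t yields
INT32_MIN. The function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* With the (int32_t) cast: bit 31 is propagated into the upper 32 bits. */
static uint64_t
widen_with_cast(uint32_t wire)
{
    return (uint64_t)(int32_t)wire;
}

/* Without the cast: the wire value is treated as plain unsigned. */
static uint64_t
widen_plain(uint32_t wire)
{
    return wire;
}

/* The two only differ once bit 31 of the wire value is set. */
void
widen_check(void)
{
    assert(widen_with_cast(0x80000000u) == 0xFFFFFFFF80000000ULL);
    assert(widen_plain(0x80000000u) == 0x80000000ULL);
    assert(widen_with_cast(0x7FFFFFFFu) == widen_plain(0x7FFFFFFFu));
}
```
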
> >
> Ok, I'll change the comment.
>
> > The result is only backwards compatible with relatively recent FreeBSD
> > nfs clients. All FreeBSD clients are completely broken if bit31 is
> > set, and compatibility with this brokenness is not useful (but as I
> > pointed out in another reply, we would never have seen the broken case
> > when the old clients weren't old, since it takes a server file system
> > size of about 32TB for bit 31 to be set).
> Well, the last legitimate use of the FreeBSD NFSv2 client was a diskless
> root fs stored on a non-FreeBSD NFS server (because pxeboot didn't know the
> correct file handle size). Since this is now fixed, there really isn't
> any use for the NFSv2 client, as far as I know. Given that and the fact
> that no one is complaining about it being broken, I feel it should just
> be left alone. (Or remain "bug compatible" with the regular NFS client,
> if you prefer.)
>
> I'm afraid I have other things to work on and just don't see changing
> NFSv2 (a 1985 protocol superseded by NFSv3 in 1994) as a priority, rick.

Rick, so is there a final version of the patch to (re-)test?

--ttU9AGiyjxxpUEJf
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iEYEARECAAYFAk3BC8oACgkQC3+MBN1Mb4jluQCfdYUpvlQVu1lY+zV/KsWyr97Q
QCcAnisrVqXE3UdiXE8KiGKoigmYk4zR
=ld0I
-----END PGP SIGNATURE-----

--ttU9AGiyjxxpUEJf--

From owner-freebsd-fs@FreeBSD.ORG  Wed May  4 11:29:39 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DA74C106566B
	for <freebsd-fs@freebsd.org>; Wed,  4 May 2011 11:29:39 +0000 (UTC)
	(envelope-from rs@bytecamp.net)
Received: from mail.bytecamp.net (mail.bytecamp.net [212.204.60.9])
	by mx1.freebsd.org (Postfix) with ESMTP id 6CB7C8FC13
	for <freebsd-fs@freebsd.org>; Wed,  4 May 2011 11:29:39 +0000 (UTC)
Received: (qmail 26889 invoked by uid 89); 4 May 2011 13:02:57 +0200
Received: from stella.bytecamp.net (HELO ?212.204.60.37?)
	(rs%bytecamp.net@212.204.60.37)
	by mail.bytecamp.net with CAMELLIA256-SHA encrypted SMTP;
	4 May 2011 13:02:57 +0200
Message-ID: <4DC13260.4020905@bytecamp.net>
Date: Wed, 04 May 2011 13:02:56 +0200
From: Robert Schulze <rs@bytecamp.net>
Organization: bytecamp GmbH
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US;
	rv:1.9.2.14) Gecko/20110223 Thunderbird/3.1.8
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit
Subject: zfs/zpool upgrade required?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 04 May 2011 11:29:39 -0000

Hello,

when upgrading from 8.0 to 8-STABLE, kernel and userland support new 
versions of ZFS pool and filesystem.

Is it _required_ to upgrade existing pools and filesystems or can that 
be done anytime later?

with kind regards,
Robert Schulze

From owner-freebsd-fs@FreeBSD.ORG  Wed May  4 11:55:43 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0612B106564A
	for <freebsd-fs@freebsd.org>; Wed,  4 May 2011 11:55:43 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta13.emeryville.ca.mail.comcast.net
	(qmta13.emeryville.ca.mail.comcast.net [76.96.27.243])
	by mx1.freebsd.org (Postfix) with ESMTP id E1FE28FC14
	for <freebsd-fs@freebsd.org>; Wed,  4 May 2011 11:55:42 +0000 (UTC)
Received: from omta18.emeryville.ca.mail.comcast.net ([76.96.30.74])
	by qmta13.emeryville.ca.mail.comcast.net with comcast
	id fPqs1g0021bwxycADPviun; Wed, 04 May 2011 11:55:42 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta18.emeryville.ca.mail.comcast.net with comcast
	id fPvg1g00f1t3BNj8ePvhTd; Wed, 04 May 2011 11:55:41 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 82ACE102C36; Wed,  4 May 2011 04:55:40 -0700 (PDT)
Date: Wed, 4 May 2011 04:55:40 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Robert Schulze <rs@bytecamp.net>
Message-ID: <20110504115540.GA88625@icarus.home.lan>
References: <4DC13260.4020905@bytecamp.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4DC13260.4020905@bytecamp.net>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org
Subject: Re: zfs/zpool upgrade required?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 04 May 2011 11:55:43 -0000

On Wed, May 04, 2011 at 01:02:56PM +0200, Robert Schulze wrote:
> when upgrading from 8.0 to 8-STABLE, kernel and userland support new
> versions of ZFS pool and filesystem.
> 
> Is it _required_ to upgrade existing pools and filesystems or can
> that be done anytime later?

- It can be done later, though by not upgrading you lose the ability to
use newer features.  For a list of what those are, refer to the
official OpenSolaris docs.  See menu on left side, near bottom:

http://hub.opensolaris.org/bin/view/Community+Group+zfs/whatis

- Make sure to note that the pool version and the filesystem version are
separate.  Some folks remember to "zpool upgrade" but not "zfs upgrade".

- Remember that upgrading is one-way; you cannot roll back to an older
version without destroying your pools.  If you're worried, do full
backups beforehand.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Wed May  4 15:21:38 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D850A1065672
	for <freebsd-fs@freebsd.org>; Wed,  4 May 2011 15:21:38 +0000 (UTC)
	(envelope-from kraduk@gmail.com)
Received: from mail-wy0-f182.google.com (mail-wy0-f182.google.com
	[74.125.82.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 64DB48FC14
	for <freebsd-fs@freebsd.org>; Wed,  4 May 2011 15:21:38 +0000 (UTC)
Received: by wyf23 with SMTP id 23so1215889wyf.13
	for <freebsd-fs@freebsd.org>; Wed, 04 May 2011 08:21:37 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:in-reply-to:references:date
	:message-id:subject:from:to:cc:content-type;
	bh=XXhPHbPM5VI/aSEILoxAcJSjZAgPF3c4pY9A9/0bg2E=;
	b=mBGBSfyyyYRkBFql4T9Jb2rQrFPk0JyZkr57ndJVJd6QB+yQ7EjRKOcmwHak3RsjMc
	KkFR0AT2E8aUlny2LbTFpPsLR8qz+W0G4K71axQtlZUThcv3EzDByeITlFYN4BhQ9JvC
	c0hAETp7OzEhckzxq3VhJJp+IvovJ5Xi0kTVU=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	b=crHlczAPf64FvdHpJh63zXI6y51wPe26XTtMF+aaFy0M0RmqUGhlKXtEFKszRmTFrL
	YDX9YOiULUp+hUV+WdGPGToql0fs6j7xccZaDfnwSFsUa6LCXxj2J+myB6BHGkELMi9F
	FW2BRvCkEl9+GKSP2p26X49EKNAXyavfbfexo=
MIME-Version: 1.0
Received: by 10.216.143.74 with SMTP id k52mr1250647wej.0.1304522497179; Wed,
	04 May 2011 08:21:37 -0700 (PDT)
Received: by 10.216.15.73 with HTTP; Wed, 4 May 2011 08:21:37 -0700 (PDT)
In-Reply-To: <20110504115540.GA88625@icarus.home.lan>
References: <4DC13260.4020905@bytecamp.net>
	<20110504115540.GA88625@icarus.home.lan>
Date: Wed, 4 May 2011 16:21:37 +0100
Message-ID: <BANLkTimdsgsx+em88zE73oVQyGj8d_6wnw@mail.gmail.com>
From: krad <kraduk@gmail.com>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-fs@freebsd.org
Subject: Re: zfs/zpool upgrade required?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 04 May 2011 15:21:38 -0000

On 4 May 2011 12:55, Jeremy Chadwick <freebsd@jdc.parodius.com> wrote:

> On Wed, May 04, 2011 at 01:02:56PM +0200, Robert Schulze wrote:
> > when upgrading from 8.0 to 8-STABLE, kernel and userland support new
> > versions of ZFS pool and filesystem.
> >
> > Is it _required_ to upgrade existing pools and filesystems or can
> > that be done anytime later?
>
> - It can be done later, though by not upgrading you lose the ability to
> use newer features.  For a list of what those are, refer to the
> official OpenSolaris docs.  See menu on left side, near bottom:
>
> http://hub.opensolaris.org/bin/view/Community+Group+zfs/whatis
>
> - Make sure to note that the pool version and the filesystem version are
> separate.  Some folks remember to "zpool upgrade" but not "zfs upgrade".
>
> - Remember that upgrading is one-way; you cannot roll back to an older
> version without destroying your pools.  If you're worried, do full
> backups beforehand.
>
> --
> | Jeremy Chadwick                                   jdc@parodius.com |
> | Parodius Networking                       http://www.parodius.com/ |
> | UNIX Systems Administrator                  Mountain View, CA, USA |
> | Making life hard for others since 1977.               PGP 4BD6C0CB |
>


Generally, in production I would leave it on the old pool version for a
while, until you are confident you are not having any issues. As previously
stated, upgrading is one-way, so holding off lets you roll back more easily.
When you are happy, upgrade the pool.

From owner-freebsd-fs@FreeBSD.ORG  Wed May  4 21:53:23 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 22E36106567A
	for <freebsd-fs@freebsd.org>; Wed,  4 May 2011 21:53:23 +0000 (UTC)
	(envelope-from mad@madpilot.net)
Received: from megatron.madpilot.net (megatron.madpilot.net [88.149.173.206])
	by mx1.freebsd.org (Postfix) with ESMTP id B39C68FC20
	for <freebsd-fs@freebsd.org>; Wed,  4 May 2011 21:53:22 +0000 (UTC)
Received: from megatron.madpilot.net (localhost [127.0.0.1])
	by megatron.madpilot.net (Postfix) with ESMTP id D468D1E73
	for <freebsd-fs@freebsd.org>; Wed,  4 May 2011 23:53:21 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=madpilot.net; h=
	content-transfer-encoding:content-type:content-type:in-reply-to
	:references:subject:subject:mime-version:user-agent:from:from
	:date:date:message-id:received:received; s=mail; t=1304545999;
	x=1306360399; bh=yVNqxY5A4TpzlaJQWHCs0kK/5P/EyTV+uYWV/XcbK3I=; b=
	IEC80IaegZVyw6xBLoGcwyUPfSs/Nu1s8pMHJNeN9H7LY5fsstCO5I25E3PdVCwf
	pHYEJx9x8aPR4GcoWeBAeuLEr5gUw+6E45y2zqREeru17n0niS8AANRpoMezNuie
	G9s/7MxBsClXJRgdMxhu2yV/Bc5uaSDJWxDYoIGySDo=
X-Virus-Scanned: amavisd-new at madpilot.net
Received: from megatron.madpilot.net ([127.0.0.1])
	by megatron.madpilot.net (megatron.madpilot.net [127.0.0.1])
	(amavisd-new, port 10024)
	with ESMTP id uoRNHTnq+exG for <freebsd-fs@freebsd.org>;
	Wed,  4 May 2011 23:53:19 +0200 (CEST)
Received: from marvin.madpilot.net (localhost [127.0.0.1])
	by megatron.madpilot.net (Postfix) with ESMTP
	for <freebsd-fs@freebsd.org>; Wed,  4 May 2011 23:53:19 +0200 (CEST)
Message-ID: <4DC1CACF.8050506@madpilot.net>
Date: Wed, 04 May 2011 23:53:19 +0200
From: Guido Falsi <mad@madpilot.net>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.2.17) Gecko/20110429 Lightning/1.0b2 Thunderbird/3.1.10
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
References: <4DC13260.4020905@bytecamp.net>
	<20110504115540.GA88625@icarus.home.lan>
In-Reply-To: <20110504115540.GA88625@icarus.home.lan>
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: zfs/zpool upgrade required?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 04 May 2011 21:53:23 -0000

On 05/04/11 13:55, Jeremy Chadwick wrote:
> On Wed, May 04, 2011 at 01:02:56PM +0200, Robert Schulze wrote:
>> when upgrading from 8.0 to 8-STABLE, kernel and userland support new
>> versions of ZFS pool and filesystem.
>>
>> Is it _required_ to upgrade existing pools and filesystems or can
>> that be done anytime later?
>
> - It can be done later, though by not upgrading you lose the ability to
> use newer features.  For a list of what those are, refer to the
> official OpenSolaris docs.  See menu on left side, near bottom:
>
> http://hub.opensolaris.org/bin/view/Community+Group+zfs/whatis
>
> - Make sure to note that the pool version and the filesystem version are
> separate.  Some folks remember to "zpool upgrade" but not "zfs upgrade".
>
> - Remember that upgrading is one-way; you cannot roll back to an older
> version without destroying your pools.  If you're worried, do full
> backups beforehand.
>

I'd add, if he's booting off of zfs, that it is very important to 
upgrade the boot code as soon as he upgrades the pool or he will have 
trouble booting the system which could be a pain to recover at that point.

-- 
Guido Falsi <mad@madpilot.net>

From owner-freebsd-fs@FreeBSD.ORG  Wed May  4 22:22:55 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5DAC8106566C
	for <freebsd-fs@freebsd.org>; Wed,  4 May 2011 22:22:55 +0000 (UTC)
	(envelope-from kraduk@gmail.com)
Received: from mail-wy0-f182.google.com (mail-wy0-f182.google.com
	[74.125.82.182])
	by mx1.freebsd.org (Postfix) with ESMTP id DF48D8FC19
	for <freebsd-fs@freebsd.org>; Wed,  4 May 2011 22:22:54 +0000 (UTC)
Received: by wyf23 with SMTP id 23so1577987wyf.13
	for <freebsd-fs@freebsd.org>; Wed, 04 May 2011 15:22:53 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:in-reply-to:references:date
	:message-id:subject:from:to:cc:content-type;
	bh=5U87xPT7kftVWpe1r83YYldlNG5wF/Q92ONSz0dtDrM=;
	b=QPQq4YV3VyOPckWNbFCipNmlmsTI5dFC5dQt2aPCKMj6nw3lbYu43EsGF3wQgdPGuQ
	lxxmqwwpqeFxWSrQGDMZCR8ECS/k1XgukHAuOgOqE3M9w4DyXoGHELcRtIYYSl8WPvPS
	x/2CniLSw5n7h26RHdiWvgDJyTCTL/EZUIWaw=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	b=QzteNB05WhABHmX2R0nR3ELc1sgBtm48SfY6zNelFjlG5glbf+978UzFuhNm68Q9K2
	ESTQCAz4ckXknGnuhe8hdga5Ew5kI1hUALvNhG7D3PlSU0VQNxHfbCwKVRGopbFXOVHA
	ZAx/bCMO+mV3z17vk9FEA9vBRCGoFiuRgQC2g=
MIME-Version: 1.0
Received: by 10.216.143.96 with SMTP id k74mr5464758wej.100.1304547773684;
	Wed, 04 May 2011 15:22:53 -0700 (PDT)
Received: by 10.216.15.73 with HTTP; Wed, 4 May 2011 15:22:53 -0700 (PDT)
In-Reply-To: <4DC1CACF.8050506@madpilot.net>
References: <4DC13260.4020905@bytecamp.net>
	<20110504115540.GA88625@icarus.home.lan>
	<4DC1CACF.8050506@madpilot.net>
Date: Wed, 4 May 2011 23:22:53 +0100
Message-ID: <BANLkTikwjaWa8XMh12E_cgpLv4hMhKiC=A@mail.gmail.com>
From: krad <kraduk@gmail.com>
To: Guido Falsi <mad@madpilot.net>
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-fs@freebsd.org
Subject: Re: zfs/zpool upgrade required?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 04 May 2011 22:22:55 -0000

On 4 May 2011 22:53, Guido Falsi <mad@madpilot.net> wrote:

> On 05/04/11 13:55, Jeremy Chadwick wrote:
>
>> On Wed, May 04, 2011 at 01:02:56PM +0200, Robert Schulze wrote:
>>
>>> when upgrading from 8.0 to 8-STABLE, kernel and userland support new
>>> versions of ZFS pool and filesystem.
>>>
>>> Is it _required_ to upgrade existing pools and filesystems or can
>>> that be done anytime later?
>>>
>>
>> - It can be done later, though by not upgrading you lose the ability to
>> use newer features.  For a list of what those are, refer to the
>> official OpenSolaris docs.  See menu on left side, near bottom:
>>
>> http://hub.opensolaris.org/bin/view/Community+Group+zfs/whatis
>>
>> - Make sure to note that the pool version and the filesystem version are
>> separate.  Some folks remember to "zpool upgrade" but not "zfs upgrade".
>>
>> - Remember that upgrading is one-way; you cannot roll back to an older
>> version without destroying your pools.  If you're worried, do full
>> backups beforehand.
>>
>>
> I'd add, if he's booting off of zfs, that it is very important to upgrade
> the boot code as soon as he upgrades the pool or he will have trouble
> booting the system which could be a pain to recover at that point.
>
> --
> Guido Falsi <mad@madpilot.net>
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>


Yep. It is probably worth using these for the time being, as they are the
most up-to-date boot code bits:

http://people.freebsd.org/~pjd/zfsboot/
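
For reference, the upgrade order discussed above could look roughly like
this on a GPT system that boots from ZFS. This is only a sketch: the pool
name (tank), disk (ada0) and boot partition index (1) are placeholders, the
run and upgrade_zfs helpers are made up for illustration, and /boot/pmbr and
/boot/gptzfsboot are assumed to be the boot code you want installed. By
default it only prints the commands.

```sh
#!/bin/sh
# Sketch only: pool, disk and partition index are placeholders.
# DRY_RUN defaults to 1, so commands are printed rather than executed.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# upgrade_zfs POOL DISK BOOT_PART_INDEX (hypothetical helper)
upgrade_zfs() {
    pool=$1
    disk=$2
    idx=$3
    run zpool upgrade "$pool"      # pool version; one-way
    run zfs upgrade -r "$pool"     # filesystem version; a separate step
    # Refresh the boot blocks right away if the system boots from this pool.
    run gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i "$idx" "$disk"
}
```

Calling "upgrade_zfs tank ada0 1" prints the three commands; with DRY_RUN=0
in the environment they are executed instead. Neither upgrade can be undone,
so take backups first.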

From owner-freebsd-fs@FreeBSD.ORG  Thu May  5 01:30:18 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C8F0A106566C
	for <freebsd-fs@freebsd.org>; Thu,  5 May 2011 01:30:18 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca
	[131.104.91.44])
	by mx1.freebsd.org (Postfix) with ESMTP id 7902B8FC17
	for <freebsd-fs@freebsd.org>; Thu,  5 May 2011 01:30:18 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApwEAIH8wU2DaFvO/2dsb2JhbACEUKJGtlORIoEqhF0EjzWOVg
X-IronPort-AV: E=Sophos;i="4.64,317,1301889600"; d="scan'208";a="120905583"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 04 May 2011 21:24:51 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id AE91AB3F24;
	Wed,  4 May 2011 21:24:51 -0400 (EDT)
Date: Wed, 4 May 2011 21:24:51 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= <des@des.no>
Message-ID: <277230554.1031144.1304558691708.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <86iptvg9uo.fsf@ds4.des.no>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Originating-IP: [172.17.91.202]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692)
Cc: freebsd-fs@freebsd.org
Subject: Re: RFC: make the experimental NFS subsystem the default one
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 05 May 2011 01:30:18 -0000

> Rick Macklem <rmacklem@uoguelph.ca> writes:
> > "Dag-Erling Sm=C3=B8rgrav" <des@des.no> writes:
> > > interface oldnfs.1 already present in the KLD 'kernel'!
> > > /etc/rc: WARNING: Unable to load kernel module nfsclient
> > Ok, I'll need to look at this. At a glance, I see a load_kld,
> > but that won't get upset if it's already loaded. (It does need
> > to be fixed, though, since it refers to nfsclient as the module
> > for "nfs" instead of nfscl.)
>=20
> This comes from mountcritremote:
>=20
> case "`mount -d -a -t nfs 2> /dev/null`" in
> *mount_nfs*)
> # Handle absent nfs client support
> load_kld -m nfs nfsclient || return 1
> ;;
> esac
>=20
> mount(8) will print "mount_oldnfs" instead of "mount_nfs". Note that
> until you flipped the switch, the exact same error would occur, in
> reverse, on systems running the new stack.
>=20
Testing here, it seems that none of the NFS specific stuff is needed
in mountcritremote (as hinted by the comment). You can try the version
without the NFS specific stuff if you'd like.
It's in:
   http://people.freebsd.org/~rmacklem/rc.conf

along with the other modified/added scripts.
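
If the check were kept rather than dropped, it would have to recognise both
helper names, since mount(8) may print either one depending on which stack
is the default. A rough sketch of just that test follows; the function name
is made up and this is not the code in the modified scripts.

```sh
# Hypothetical helper, not taken from any rc.d script: succeeds when the
# output of a dry-run mount names either NFS mount helper.
nfs_helper_named() {
    case "$1" in
    *mount_nfs*|*mount_oldnfs*)
        return 0
        ;;
    esac
    return 1
}
```

It would be called with the dry-run output, for example
nfs_helper_named "$(mount -d -a -t nfs 2> /dev/null)", and the caller would
then decide which module (nfscl or nfsclient) to load.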

rick

From owner-freebsd-fs@FreeBSD.ORG  Thu May  5 06:41:30 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A33A7106564A
	for <freebsd-fs@freebsd.org>; Thu,  5 May 2011 06:41:30 +0000 (UTC)
	(envelope-from john@theusgroup.com)
Received: from theusgroup.com (theusgroup.com [64.122.243.222])
	by mx1.freebsd.org (Postfix) with ESMTP id 8F93F8FC16
	for <freebsd-fs@freebsd.org>; Thu,  5 May 2011 06:41:30 +0000 (UTC)
To: freebsd-fs@freebsd.org
Date: Wed, 04 May 2011 23:22:45 -0700
From: John <john@theusgroup.com>
Message-Id: <20110505062246.60B561C4@server.theusgroup.com>
Subject: zfs v28 destory -r snapshot failure
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: john@TheUsGroup.com
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 05 May 2011 06:41:30 -0000

I applied the stable-8-zfsv28-20110501.patch.xz patch set to a fresh download
of 8.2-RELEASE, ran buildworld and buildkernel, installed, and rebooted. I did
not upgrade the pool or the filesystems.

I made a snapshot tank/foo@today, then tried to delete it with
"zfs destroy -r tank@today", which yielded:
cannot destroy 'tank@today': dataset does not exist
no snapshots destroyed

If tank@today exists along with tank/foo@today, then the destroy works
correctly.

After rebooting with kernel.old, which is 8.2-RELEASE without the v28 patch,
"zfs destroy -r tank@today" deleted tank/foo@today without an error.

John Theus

From owner-freebsd-fs@FreeBSD.ORG  Thu May  5 07:32:13 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6C282106564A
	for <freebsd-fs@freebsd.org>; Thu,  5 May 2011 07:32:13 +0000 (UTC)
	(envelope-from gpm@hotplug.ru)
Received: from gate.pikinvest.ru (gate.pikinvest.ru [87.245.155.170])
	by mx1.freebsd.org (Postfix) with ESMTP id DB1A98FC1C
	for <freebsd-fs@freebsd.org>; Thu,  5 May 2011 07:32:12 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1])
	by mailgate.pik.ru (Postfix) with ESMTP id BE9521C08A3
	for <freebsd-fs@freebsd.org>; Thu,  5 May 2011 11:16:48 +0400 (MSD)
Received: from EX03PIK.PICompany.ru (unknown [192.168.156.51])
	by mailgate.pik.ru (Postfix) with ESMTP id BA1E61C08A1
	for <freebsd-fs@freebsd.org>; Thu,  5 May 2011 11:16:48 +0400 (MSD)
Received: from [192.168.148.9] ([192.168.148.9]) by EX03PIK.PICompany.ru with
	Microsoft SMTPSVC(6.0.3790.4675); Thu, 5 May 2011 11:16:35 +0400
Message-ID: <4DC24ED3.4040703@hotplug.ru>
Date: Thu, 05 May 2011 11:16:35 +0400
From: Emil Muratov <gpm@hotplug.ru>
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US;
	rv:1.9.2.14) Gecko/20110223 Thunderbird/3.1.8
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
References: <20110505062246.60B561C4@server.theusgroup.com>
In-Reply-To: <20110505062246.60B561C4@server.theusgroup.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-OriginalArrivalTime: 05 May 2011 07:16:35.0268 (UTC)
	FILETIME=[5E067040:01CC0AF4]
Subject: Re: zfs v28 destory -r snapshot failure
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 05 May 2011 07:32:13 -0000


> Applied this patch set stable-8-zfsv28-20110501.patch.xz to a fresh download
> of 8.2-release, buildworld, buildkernel, install and rebooted. Did not upgrade
> pool or filesystems.
>
> Made a snapshot of tank/foo@today, then tried to delete with
> zfs destroy -r tank@today yielded:
> cannot destroy 'tank@today': dataset does not exist
> no snapshots destroyed
>
> If tank@today exists along with tank/foo@today, then the destroy works
> correctly.
>
> Rebooted with kernel.old which is 8.2-release without the v28 patch and
> zfs destroy -r tank@today deleted tank/foo@today without an error.

Same here. The zfSnap utility no longer purges old snapshots since upgrading
to v28. Manual testing showed that zfs destroy -r no longer works as
expected for snapshots; for datasets it is fine.
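
Until that is fixed, one possible workaround is to list the matching
snapshots and destroy them one by one instead of relying on -r. This is an
untested sketch: the helper names are made up, and it only prints the
destroy commands unless DRY_RUN=0 is set.

```sh
# Reads snapshot names on stdin and prints those whose snapshot part
# (the text after '@') is exactly $1.
select_snaps() {
    want=$1
    while IFS= read -r name; do
        case "$name" in
        *@"$want") printf '%s\n' "$name" ;;
        esac
    done
}

# destroy_snaps_named POOL SNAPNAME (hypothetical helper)
# Prints the destroy commands by default; DRY_RUN=0 really runs them.
destroy_snaps_named() {
    zfs list -H -r -t snapshot -o name "$1" | select_snaps "$2" |
    while IFS= read -r snap; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "zfs destroy $snap"
        else
            zfs destroy "$snap"
        fi
    done
}
```

For the case reported here that would be "destroy_snaps_named tank today".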


From owner-freebsd-fs@FreeBSD.ORG  Thu May  5 08:19:52 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 816471065672
	for <freebsd-fs@freebsd.org>; Thu,  5 May 2011 08:19:52 +0000 (UTC)
	(envelope-from rs@bytecamp.net)
Received: from mail.bytecamp.net (mail.bytecamp.net [212.204.60.9])
	by mx1.freebsd.org (Postfix) with ESMTP id 0F05D8FC1A
	for <freebsd-fs@freebsd.org>; Thu,  5 May 2011 08:19:51 +0000 (UTC)
Received: (qmail 39916 invoked by uid 89); 5 May 2011 10:19:50 +0200
Received: from stella.bytecamp.net (HELO ?212.204.60.37?)
	(rs%bytecamp.net@212.204.60.37)
	by mail.bytecamp.net with CAMELLIA256-SHA encrypted SMTP;
	5 May 2011 10:19:50 +0200
Message-ID: <4DC25DA6.3060009@bytecamp.net>
Date: Thu, 05 May 2011 10:19:50 +0200
From: Robert Schulze <rs@bytecamp.net>
Organization: bytecamp GmbH
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US;
	rv:1.9.2.14) Gecko/20110223 Thunderbird/3.1.8
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit
Subject: zfs l2arc issue
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 05 May 2011 08:19:52 -0000

Hi,

we are running an NFS server with the following pool setup:

home        ONLINE       0     0     0
	  raidz2    ONLINE       0     0     0
	    da1     ONLINE       0     0     0
	    da2     ONLINE       0     0     0
	    da3     ONLINE       0     0     0
	    da4     ONLINE       0     0     0
	    da5     ONLINE       0     0     0
	  raidz2    ONLINE       0     0     0
	    da6     ONLINE       0     0     0
	    da7     ONLINE       0     0     0
	    da8     ONLINE       0     0     0
	    da9     ONLINE       0     0     0
	    da10    ONLINE       0     0     0
	logs        ONLINE       0     0     0
	  mirror    ONLINE       0     0     0
	    da12    ONLINE       0     0     0
	    da13    ONLINE       0     0     0
	cache
	  ad4       ONLINE       0     0     0
	  ad8       ONLINE       0     0     0


All drives except the caching SSDs are attached to an LSI 9690SA-8I.
The system is equipped with 32 GB RAM and normally runs with a load of <1.
Please note that we are still running 8.0, since there was one issue with
ZFS which blocked the upgrade to 8-STABLE.

After about 100 days of uptime, the load suddenly rose to about 5-7 and
nfsd showed 100-400% WCPU. An rsync downloading files from that machine
was also very slow.

We didn't really narrow down the problem; we had to reboot the machine
because performance had almost completely collapsed. After the reboot,
system performance returned to normal.

Could this problem be related to the caching SSDs being full? The cache
consists of two 76 GB SSDs; after warming up, only 8 MB are free on each
disk.
Is ZFS supposed to fill arbitrarily large caches? I am thinking of doubling
the cache, but would then probably end up with completely filled SSDs again.
If so, could the L2ARC be limited somehow, so that the SSDs don't get
written full?

Could this behaviour also appear in 8-STABLE?

With kind regards,
Robert Schulze

From owner-freebsd-fs@FreeBSD.ORG  Thu May  5 10:13:45 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 07CC91065673
	for <freebsd-fs@freebsd.org>; Thu,  5 May 2011 10:13:45 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta04.westchester.pa.mail.comcast.net
	(qmta04.westchester.pa.mail.comcast.net [76.96.62.40])
	by mx1.freebsd.org (Postfix) with ESMTP id A9C0F8FC13
	for <freebsd-fs@freebsd.org>; Thu,  5 May 2011 10:13:44 +0000 (UTC)
Received: from omta17.westchester.pa.mail.comcast.net ([76.96.62.89])
	by qmta04.westchester.pa.mail.comcast.net with comcast
	id fm3S1g0041vXlb854mDk5V; Thu, 05 May 2011 10:13:44 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta17.westchester.pa.mail.comcast.net with comcast
	id fmDj1g0091t3BNj3dmDjdG; Thu, 05 May 2011 10:13:44 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id E1054102C36; Thu,  5 May 2011 03:13:41 -0700 (PDT)
Date: Thu, 5 May 2011 03:13:41 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Robert Schulze <rs@bytecamp.net>
Message-ID: <20110505101341.GA10618@icarus.home.lan>
References: <4DC25DA6.3060009@bytecamp.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4DC25DA6.3060009@bytecamp.net>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org
Subject: Re: zfs l2arc issue
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 05 May 2011 10:13:45 -0000

On Thu, May 05, 2011 at 10:19:50AM +0200, Robert Schulze wrote:
> we are running an NFS server with the following pool setup:
> 
> home        ONLINE       0     0     0
> 	  raidz2    ONLINE       0     0     0
> 	    da1     ONLINE       0     0     0
> 	    da2     ONLINE       0     0     0
> 	    da3     ONLINE       0     0     0
> 	    da4     ONLINE       0     0     0
> 	    da5     ONLINE       0     0     0
> 	  raidz2    ONLINE       0     0     0
> 	    da6     ONLINE       0     0     0
> 	    da7     ONLINE       0     0     0
> 	    da8     ONLINE       0     0     0
> 	    da9     ONLINE       0     0     0
> 	    da10    ONLINE       0     0     0
> 	logs        ONLINE       0     0     0
> 	  mirror    ONLINE       0     0     0
> 	    da12    ONLINE       0     0     0
> 	    da13    ONLINE       0     0     0
> 	cache
> 	  ad4       ONLINE       0     0     0
> 	  ad8       ONLINE       0     0     0
> 
> 
> All drives except the caching SSDs are attached to a LSI 9690SA-8I.
> The system is equipped with 32 GB RAM, and runs with a load of <1,
> please note: we are running 8.0, yet, since there was one issue with
> ZFS which blocked the upgrade to 8-STABLE.
> 
> After about 100d uptime, we had a sudden large increase in load of
> about 5-7, nfsd had 100-400% WCPU. Also an rsync downloading files
> from that machine was very slow.
> 
> We didn't really narrow down the problem, we had to reboot the
> machine because performance was nearly completely absent. After
> reboot, system performance became normal.
> 
> Could this problem be related to the caching SSDs beeing full? Cache
> consists of two 76 GB SSDs, after warming up, only 8 MB are free on
> each disk.
> Is ZFS supposed to fill arbitrary large caches? I think of doubling
> the cache and then ending up with fully filled SSDs again. For if,
> could l2arc be limited somehow, so that SSDs don't get written full?
> 
> Could this behaviour also appear in 8-STABLE?

To readers: make sure you note this user is running either 8.0-RELEASE
or 8.0-STABLE.  ZFS at that time was very different; **many** of its
internals and tuning knobs have changed since then.

- It would help if we could match disk types (SSDs, etc.) to a device
string. "camcontrol devlist -v" would be useful on this machine.

- nfsd taking up 100-400% CPU (that has been addressed in a later
release by the way; it will show 100% total for all 4 cores; I believe
"top -C" changes the behaviour) doesn't tell us much.  What was nfsd
actually *doing* during that time?  Could you "procstat -kk PID"?
Did you try using "ktrace -i -t+ -p PID" to see what syscalls it was
making?

- Have you done any system tuning on this machine for ZFS?  It's very
important that you provide the following:

  - uname -a (you can hide/XXX-out the machine name).  This will
    provide both the exact build date (which hopefully will match
    what time your kernel sources were synced), and whether or not
    the machine is i386 or amd64
  - Contents of /etc/sysctl.conf
  - Contents of /boot/loader.conf
  - Contents of /etc/rc.conf (you can XXX out machine names, IPs, etc.)
  - Output from dmesg (after a fresh reboot is fine)
  - Output from "sysctl -a vfs.zfs"
  - Output from "sysctl -a kstat.zfs"
  - Output from "top" when the issue is occurring; interested mainly
    in the high-CPU-usage processes as well as all the system/memory
    statistics
  - Output from "zpool iostat -v 1" when the issue is occurring.
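
Something along these lines would gather the items above into one
directory. It is only a sketch: the helper names and the output layout are
made up, "top -b -d 1" and "zpool iostat -v 1 5" are my choice of bounded
invocations, and the top and zpool iostat output is only meaningful if this
is run while the issue is occurring.

```sh
# log_cmd OUTDIR NAME CMD...: run CMD, save stdout and stderr to
# OUTDIR/NAME.txt, and note a failure instead of aborting.
log_cmd() {
    out_dir=$1
    out_name=$2
    shift 2
    {
        echo "### $*"
        "$@" 2>&1 || echo "### (command failed or unavailable)"
    } > "$out_dir/$out_name.txt"
}

# collect_zfs_diag OUTDIR (hypothetical helper)
collect_zfs_diag() {
    d=$1
    mkdir -p "$d" || return 1
    log_cmd "$d" uname       uname -a
    log_cmd "$d" sysctl.conf cat /etc/sysctl.conf
    log_cmd "$d" loader.conf cat /boot/loader.conf
    log_cmd "$d" rc.conf     cat /etc/rc.conf      # redact by hand
    log_cmd "$d" dmesg       dmesg
    log_cmd "$d" vfs.zfs     sysctl vfs.zfs
    log_cmd "$d" kstat.zfs   sysctl kstat.zfs
    log_cmd "$d" top         top -b -d 1
    log_cmd "$d" iostat      zpool iostat -v 1 5   # five 1-second samples
}
```

Usage would be "collect_zfs_diag /tmp/zfs-diag"; remember to XXX out machine
names and IPs before posting the files.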

I should warn you in advance: you're asking for assistance with
something that's "fairly old", and as I stated in the "To readers"
section, ZFS on 8.0 is very different than 8.2.  There are all sorts of
tunings/adjustments that are required there that are not on 8.2.

I think most of us would like to know what single ZFS issue is keeping
you from upgrading the machine to RELENG_8 / 8.2-STABLE.  I think
overall it might make the most sense to address or fix that problem for
you and then have you try 8.2-STABLE to see if the above issue persists.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Thu May  5 10:39:37 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1EB571065673
	for <freebsd-fs@freebsd.org>; Thu,  5 May 2011 10:39:37 +0000 (UTC)
	(envelope-from rs@bytecamp.net)
Received: from mail.bytecamp.net (mail.bytecamp.net [212.204.60.9])
	by mx1.freebsd.org (Postfix) with ESMTP id 6797F8FC19
	for <freebsd-fs@freebsd.org>; Thu,  5 May 2011 10:39:35 +0000 (UTC)
Received: (qmail 78909 invoked by uid 89); 5 May 2011 12:39:35 +0200
Received: from stella.bytecamp.net (HELO ?212.204.60.37?)
	(rs%bytecamp.net@212.204.60.37)
	by mail.bytecamp.net with CAMELLIA256-SHA encrypted SMTP;
	5 May 2011 12:39:35 +0200
Message-ID: <4DC27E66.70904@bytecamp.net>
Date: Thu, 05 May 2011 12:39:34 +0200
From: Robert Schulze <rs@bytecamp.net>
Organization: bytecamp GmbH
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US;
	rv:1.9.2.17) Gecko/20110424 Thunderbird/3.1.10
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
References: <4DC25DA6.3060009@bytecamp.net>
	<20110505101341.GA10618@icarus.home.lan>
In-Reply-To: <20110505101341.GA10618@icarus.home.lan>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: zfs l2arc issue
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 05 May 2011 10:39:37 -0000

Hi,

Am 05.05.2011 12:13, schrieb Jeremy Chadwick:
> I think most of us would like to know what single ZFS issue is keeping
> you from upgrading the machine to RELENG_8 / 8.2-STABLE.  I think
> overall it might make the most sense to address or fix that problem for
> you and then have you try 8.2-STABLE to see if the above issue persists.
>

There _was_ a problem causing the kernel to panic with highly nested
filesystems (kern/154681, thread stack size too small), which was fixed
by avg@ in mid-March. A panic every three days is not tolerable in
production use, so we held off on upgrading.

Of course I know that issues with old FreeBSD releases are not very
welcome on this list. We will upgrade the machine in the coming days
and hope for the best. *sigh*

with kind regards,
Robert Schulze

From owner-freebsd-fs@FreeBSD.ORG  Thu May  5 13:32:34 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3F4631065674;
	Thu,  5 May 2011 13:32:34 +0000 (UTC)
	(envelope-from pawel@dawidek.net)
Received: from mail.garage.freebsd.pl (60.wheelsystems.com [83.12.187.60])
	by mx1.freebsd.org (Postfix) with ESMTP id D17588FC18;
	Thu,  5 May 2011 13:32:33 +0000 (UTC)
Received: by mail.garage.freebsd.pl (Postfix, from userid 65534)
	id 226D14569A; Thu,  5 May 2011 15:32:32 +0200 (CEST)
Received: from localhost (public-gprs14895.centertel.pl [87.96.58.47])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mail.garage.freebsd.pl (Postfix) with ESMTP id 287CD45684;
	Thu,  5 May 2011 15:32:21 +0200 (CEST)
Date: Thu, 5 May 2011 15:31:56 +0200
From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
To: Alexander Leidinger <Alexander@Leidinger.net>
Message-ID: <20110505133156.GE14661@garage.freebsd.pl>
References: <4DBBB20A.5050102@FreeBSD.org>
	<20110430072831.GA65598@icarus.home.lan>
	<20110501000656.00007ea1@unknown>
	<20110501133752.GC3245@garage.freebsd.pl>
	<20110503134826.712070yt2urhxp8g@webmail.leidinger.net>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="imjhCm/Pyz7Rq5F2"
Content-Disposition: inline
In-Reply-To: <20110503134826.712070yt2urhxp8g@webmail.leidinger.net>
X-OS: FreeBSD 9.0-CURRENT amd64
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on 
	mail.garage.freebsd.pl
X-Spam-Level: 
X-Spam-Status: No, score=-2.6 required=4.5 tests=BAYES_00 autolearn=ham 
	version=3.0.4
Cc: freebsd-fs@FreeBSD.org, Alexander Motin <mav@FreeBSD.org>
Subject: Re: TRIM clustering
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 05 May 2011 13:32:34 -0000


--imjhCm/Pyz7Rq5F2
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, May 03, 2011 at 01:48:26PM +0200, Alexander Leidinger wrote:
> Quoting Pawel Jakub Dawidek <pjd@FreeBSD.org> (from Sun, 1 May 2011
> 15:37:52 +0200):
>=20
> >On Sun, May 01, 2011 at 12:06:56AM +0200, Alexander Leidinger wrote:
> >>On Sat, 30 Apr 2011 00:28:31 -0700 Jeremy Chadwick
> >><freebsd@jdc.parodius.com> wrote:
> >>
> >>> On Sat, Apr 30, 2011 at 09:54:02AM +0300, Alexander Motin wrote:
> >>
> >>> Other notes: TRIM needs to be supported on swap as well, and in my
> >>> opinion this is just as important as it being in UFS.  I'm not sure
> >>> how one would implement that.
> >>
> >>This brings up the question if a ZFS cache (where the contents do not
> >>survive a reboot) is completely TRIMmed before used (and normally
> >>trimmed during use)...
> >
> >It is not trimmed at all.
>=20
> This does not sound like the optimal solution... is there a way to
> know the first access after boot/attach to a cache device? If yes,
> would it be possible to TRIM the complete provider (except for some
> static data which needs to be there) from this place? This would not
> solve the not TRIMmed during use part, but at least a
> reboot/reattach could provide a sane state.

Doing TRIM for cache devices before first use might be slightly useful,
but it may make the boot time longer. L2ARC is designed to work with
very slow devices - if they cannot keep up we will simply not evict
cache from ARC to L2ARC. That's not a big problem.

Doing TRIM for cache devices at run time seems pointless to me. Optimal
use is when cache device is 100% full, so new data replaces old data and
there is no window where we could put TRIM. We would need to replace
writes with trim+write, which will increase the latency.

TRIM will be more useful for regular data within a pool and most useful
for log devices as we do free blocks there and this is where latency is
critical (log devices are there to reduce latency).

--=20
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://yomoli.com

--imjhCm/Pyz7Rq5F2
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (FreeBSD)

iEYEARECAAYFAk3CpssACgkQForvXbEpPzT6+QCdFVDXFHUJmgrv4BqkgWeLbqn2
bAoAoLyI0fjfMP5ZLAo6WS94/jevKKGh
=6roC
-----END PGP SIGNATURE-----

--imjhCm/Pyz7Rq5F2--

From owner-freebsd-fs@FreeBSD.ORG  Thu May  5 14:10:45 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 15DD71065670
	for <freebsd-fs@freebsd.org>; Thu,  5 May 2011 14:10:45 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca
	[131.104.91.44])
	by mx1.freebsd.org (Postfix) with ESMTP id B171F8FC21
	for <freebsd-fs@freebsd.org>; Thu,  5 May 2011 14:10:44 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApwEAKauwk2DaFvO/2dsb2JhbACEUKJdtEGRL4EqhF0Ej0qOaw
X-IronPort-AV: E=Sophos;i="4.64,319,1301889600"; d="scan'208";a="121340463"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 05 May 2011 09:59:25 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id E5E33793A7
	for <freebsd-fs@freebsd.org>; Thu,  5 May 2011 09:59:25 -0400 (EDT)
Date: Thu, 5 May 2011 09:59:25 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: FreeBSD FS <freebsd-fs@freebsd.org>
Message-ID: <237603556.1045829.1304603965877.JavaMail.root@erie.cs.uoguelph.ca>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.201]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692)
Subject: fixing NFS related sysctl naming
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 05 May 2011 14:10:45 -0000

Hi,

Right now there are separate name paths for sysctls used by the
two NFS clients, which is awkward since scripts like to play
with them (currently using "vfs.nfs" which is the old one).

One thought I had was moving the SYSCTL()s and the global
variables they manipulate into the "nfslock" module, which
is shared by both clients, so that changing "vfs.nfs.xxx"
will affect both NFS clients concurrently.

How does this idea sound?

Any other suggestions on how to best deal with this?

Thanks in advance for any comments, rick

From owner-freebsd-fs@FreeBSD.ORG  Thu May  5 14:40:19 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BF6441065674;
	Thu,  5 May 2011 14:40:19 +0000 (UTC)
	(envelope-from alexander@leidinger.net)
Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de
	[217.11.53.44])
	by mx1.freebsd.org (Postfix) with ESMTP id 6D1848FC14;
	Thu,  5 May 2011 14:40:19 +0000 (UTC)
Received: from outgoing.leidinger.net (p5B155AFC.dip.t-dialin.net
	[91.21.90.252])
	by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id 7D59C844018;
	Thu,  5 May 2011 16:40:05 +0200 (CEST)
Received: from webmail.leidinger.net (webmail.Leidinger.net
	[IPv6:fd73:10c7:2053:1::2:102])
	by outgoing.leidinger.net (Postfix) with ESMTP id 3529A11FB;
	Thu,  5 May 2011 16:40:02 +0200 (CEST)
Received: (from www@localhost)
	by webmail.leidinger.net (8.14.4/8.13.8/Submit) id p45Ee1ll098311;
	Thu, 5 May 2011 16:40:01 +0200 (CEST)
	(envelope-from Alexander@Leidinger.net)
Received: from pslux.ec.europa.eu (pslux.ec.europa.eu [158.169.9.14]) by
	webmail.leidinger.net (Horde Framework) with HTTP; Thu, 05 May 2011
	16:40:01 +0200
Message-ID: <20110505164001.79532nb02isxjlxc@webmail.leidinger.net>
Date: Thu, 05 May 2011 16:40:01 +0200
From: Alexander Leidinger <Alexander@Leidinger.net>
To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
References: <4DBBB20A.5050102@FreeBSD.org>
	<20110430072831.GA65598@icarus.home.lan>
	<20110501000656.00007ea1@unknown>
	<20110501133752.GC3245@garage.freebsd.pl>
	<20110503134826.712070yt2urhxp8g@webmail.leidinger.net>
	<20110505133156.GE14661@garage.freebsd.pl>
In-Reply-To: <20110505133156.GE14661@garage.freebsd.pl>
MIME-Version: 1.0
Content-Type: text/plain;
 charset=UTF-8;
 DelSp="Yes";
 format="flowed"
Content-Disposition: inline
Content-Transfer-Encoding: 7bit
User-Agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.6)
X-EBL-MailScanner-Information: Please contact the ISP for more information
X-EBL-MailScanner-ID: 7D59C844018.AE39A
X-EBL-MailScanner: Found to be clean
X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN,
	SpamAssassin (not cached, score=0, required 6, autolearn=disabled)
X-EBL-MailScanner-From: alexander@leidinger.net
X-EBL-MailScanner-Watermark: 1305211206.33085@AJSavKJQhc/K66neZSmg+w
X-EBL-Spam-Status: No
Cc: freebsd-fs@FreeBSD.org, Alexander Motin <mav@FreeBSD.org>
Subject: Re: TRIM clustering
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 05 May 2011 14:40:19 -0000

Quoting Pawel Jakub Dawidek <pjd@FreeBSD.org> (from Thu, 5 May 2011  
15:31:56 +0200):

> On Tue, May 03, 2011 at 01:48:26PM +0200, Alexander Leidinger wrote:
>> Quoting Pawel Jakub Dawidek <pjd@FreeBSD.org> (from Sun, 1 May 2011
>> 15:37:52 +0200):
>>
>> >On Sun, May 01, 2011 at 12:06:56AM +0200, Alexander Leidinger wrote:
>> >>On Sat, 30 Apr 2011 00:28:31 -0700 Jeremy Chadwick
>> >><freebsd@jdc.parodius.com> wrote:
>> >>
>> >>> On Sat, Apr 30, 2011 at 09:54:02AM +0300, Alexander Motin wrote:
>> >>
>> >>> Other notes: TRIM needs to be supported on swap as well, and in my
>> >>> opinion this is just as important as it being in UFS.  I'm not sure
>> >>> how one would implement that.
>> >>
>> >>This brings up the question if a ZFS cache (where the contents do not
>> >>survive a reboot) is completely TRIMmed before used (and normally
>> >>trimmed during use)...
>> >
>> >It is not trimmed at all.
>>
>> This does not sound like the optimal solution... is there a way to

> TRIM will be more useful for regular data within a pool and most useful
> for log devices as we do free blocks there and this is where latency is
> critical (log devices are there to reduce latency).

Wait, does this mean that ZFS does not TRIM at all? I had  
understood your first answer to mean only that the cache is not trimmed.

Bye,
Alexander.

-- 
If *I* had a hammer, there'd be no more folk singers.

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137

From owner-freebsd-fs@FreeBSD.ORG  Thu May  5 16:36:41 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8F1B11065672
	for <freebsd-fs@freebsd.org>; Thu,  5 May 2011 16:36:41 +0000 (UTC)
	(envelope-from fjwcash@gmail.com)
Received: from mail-yx0-f182.google.com (mail-yx0-f182.google.com
	[209.85.213.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 478328FC0C
	for <freebsd-fs@freebsd.org>; Thu,  5 May 2011 16:36:40 +0000 (UTC)
Received: by yxl31 with SMTP id 31so1055611yxl.13
	for <freebsd-fs@freebsd.org>; Thu, 05 May 2011 09:36:40 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:in-reply-to:references:date
	:message-id:subject:from:to:cc:content-type;
	bh=xNUO0xrAIAVlTqZtyNrG7mYGS9y70hwYJGWHnmvzmQE=;
	b=pk7APgbd9vkW3TfIBz4Q+3mrZVjuUvPliybN9WIgPfERn/tVxi5gcnOcnIpQjB1S3k
	lOc2SrluqTkwt5zj3A6T3rcfULVrpDpG5/xKqeU4tyXvZmh7naRZQVltn0kfoV3rY2t/
	a//uEyQrAubU7fHPrBU5sdYHjHVqVmqTM0lGs=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	b=Bgl3RrkVcZqmWeaOTShi4OTPtEic7WRv9E/Zs1uzU5OzK0lA7UDHx+UxWEEsn3n6sB
	YQkU5vfPwrCqeGUKV60EHTbM+iw3tmqrY/vZtsufAtODIVB61+4GTzkczgeq6G/D0aVr
	3lcI0K6/pT+sS+rBPhxiygg2Ewh6/Gc4SXej0=
MIME-Version: 1.0
Received: by 10.90.113.15 with SMTP id l15mr492864agc.32.1304613400354; Thu,
	05 May 2011 09:36:40 -0700 (PDT)
Received: by 10.90.52.15 with HTTP; Thu, 5 May 2011 09:36:40 -0700 (PDT)
In-Reply-To: <20110505062246.60B561C4@server.theusgroup.com>
References: <20110505062246.60B561C4@server.theusgroup.com>
Date: Thu, 5 May 2011 09:36:40 -0700
Message-ID: <BANLkTinFy+qhEQ78Qc6sn=HNJL9NBwQx_A@mail.gmail.com>
From: Freddie Cash <fjwcash@gmail.com>
To: john@theusgroup.com
Content-Type: text/plain; charset=UTF-8
Cc: freebsd-fs@freebsd.org
Subject: Re: zfs v28 destory -r snapshot failure
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 05 May 2011 16:36:41 -0000

On Wed, May 4, 2011 at 11:22 PM, John <john@theusgroup.com> wrote:
> Applied this patch set stable-8-zfsv28-20110501.patch.xz to a fresh download
> of 8.2-release, buildworld, buildkernel, install and rebooted. Did not upgrade
> pool or filesystems.
>
> Made a snapshot of tank/foo@today, then tried to delete with
> zfs destroy -r tank@today yielded:
> cannot destroy 'tank@today': dataset does not exist
> no snapshots destroyed

So, you have a snapshot tank/foo@today, but you don't have a snapshot
tank@today, and you expect it to be able to delete the non-existent
tank@today?

> If tank@today exists along with tank/foo@today, then the destroy works
> correctly.

Makes sense.  tank@today exists, so you can destroy it.
tank/foo@today also exists, so you can destroy it as part of the
recursion.

> Rebooted with kernel.old which is 8.2-release without the v28 patch and
> zfs destroy -r tank@today deleted tank/foo@today without an error.

That sounds like an error, since you shouldn't be able to destroy
something that doesn't exist.

But, maybe my understanding of how -r works is faulty.

-- 
Freddie Cash
fjwcash@gmail.com

From owner-freebsd-fs@FreeBSD.ORG  Thu May  5 16:39:58 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7A82B1065678
	for <freebsd-fs@freebsd.org>; Thu,  5 May 2011 16:39:58 +0000 (UTC)
	(envelope-from artemb@gmail.com)
Received: from mail-qw0-f54.google.com (mail-qw0-f54.google.com
	[209.85.216.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 33FBE8FC0C
	for <freebsd-fs@freebsd.org>; Thu,  5 May 2011 16:39:58 +0000 (UTC)
Received: by qwc9 with SMTP id 9so2018614qwc.13
	for <freebsd-fs@freebsd.org>; Thu, 05 May 2011 09:39:57 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type;
	bh=t+750fvyyi9Cd34g9H5+EaBnBv3K8VYorAF3/YK1gbI=;
	b=pf7Vk1Vb5L1gOzw6mXIRmdaGHF8n9YfnsXycL8MVIYBJY4BMeUM2yDjRezR5GeP49U
	+zKlRSCjVIT+4m6f7Hsz6uXU+4pKF9gum039/yBuQATWUvloG2r1xaCFvsiS4mYDFMLg
	z7Q47sRScdqvlKNR1zKsIIEF9EZZHHQG8HEeM=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type;
	b=sAlPRdPcNibShwlNGvl9KBhMQbM6LnCF187NhoccvueMRETczhkZR9Cpv/lzRkdS7K
	vZQdVTznOr26Cr7ut0vuT2wzhClwgdKrZpNyQQA1/qSaAwR02pXbvOG/1J9C30m5RPiy
	Vtwda1pUmz1KRJ6GuHf9xfjhwHZU/OFj0WyJI=
MIME-Version: 1.0
Received: by 10.229.107.38 with SMTP id z38mr1617240qco.158.1304613597545;
	Thu, 05 May 2011 09:39:57 -0700 (PDT)
Sender: artemb@gmail.com
Received: by 10.229.95.140 with HTTP; Thu, 5 May 2011 09:39:57 -0700 (PDT)
In-Reply-To: <4DC25DA6.3060009@bytecamp.net>
References: <4DC25DA6.3060009@bytecamp.net>
Date: Thu, 5 May 2011 09:39:57 -0700
X-Google-Sender-Auth: zbL4nxIhdqb46cqbjXvdaxHxP_s
Message-ID: <BANLkTimbyUEK=1TsND0z8y6QL5-EnSqnzA@mail.gmail.com>
From: Artem Belevich <art@freebsd.org>
To: Robert Schulze <rs@bytecamp.net>
Content-Type: text/plain; charset=ISO-8859-1
Cc: freebsd-fs@freebsd.org
Subject: Re: zfs l2arc issue
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 05 May 2011 16:39:58 -0000

On Thu, May 5, 2011 at 1:19 AM, Robert Schulze <rs@bytecamp.net> wrote:
> All drives except the caching SSDs are attached to a LSI 9690SA-8I.
> The system is equipped with 32 GB RAM, and runs with a load of <1, please
> note: we are running 8.0, yet, since there was one issue with ZFS which
> blocked the upgrade to 8-STABLE.
>
> After about 100d uptime, we had a sudden large increase in load of about
> 5-7, nfsd had 100-400% WCPU. Also an rsync downloading files from that
> machine was very slow.

There was an issue with clock_t type overflow. It was fixed in
r218429 on Feb 8th in 8-stable.
One of its effects was that it would cause L2ARC feeding thread to
spin endlessly after about a month of uptime. It's possible that there
are other scenarios where clock_t overflow in ZFS code would cause
strange things to happen. I would suggest migrating to 8-STABLE as
there were a number of ZFS-related fixes committed since 8.0.
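
As a rough illustration of the timescale, the sketch below computes how
long a tick counter lasts before it overflows. It assumes a signed
32-bit clock_t and a tick rate of hz=1000; both values are assumptions
made for the example and are not stated in this thread.

```python
# Sketch: how long a signed 32-bit tick counter lasts before it overflows.
# ASSUMPTIONS (not from this thread): clock_t is a signed 32-bit integer
# and the kernel tick rate is hz = 1000 ticks per second.

CLOCK_T_MAX = 2**31 - 1   # largest value of a signed 32-bit integer
SECONDS_PER_DAY = 86400


def days_until_overflow(hz):
    """Days of uptime before a counter advancing at `hz` exceeds CLOCK_T_MAX."""
    return CLOCK_T_MAX / hz / SECONDS_PER_DAY


def wrap_int32(value):
    """Reduce an integer to the signed 32-bit range (two's-complement wrap)."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


if __name__ == "__main__":
    print("hz=1000: %.1f days" % days_until_overflow(1000))
    # One tick past the maximum lands on the most negative value.
    print(wrap_int32(CLOCK_T_MAX + 1))
```

Under those assumptions the counter wraps after a little under 25 days,
which is in line with "about a month of uptime".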

--Artem

From owner-freebsd-fs@FreeBSD.ORG  Thu May  5 17:01:14 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 052D6106566B;
	Thu,  5 May 2011 17:01:14 +0000 (UTC)
	(envelope-from pawel@dawidek.net)
Received: from mail.garage.freebsd.pl (60.wheelsystems.com [83.12.187.60])
	by mx1.freebsd.org (Postfix) with ESMTP id 9CE8F8FC1D;
	Thu,  5 May 2011 17:01:12 +0000 (UTC)
Received: by mail.garage.freebsd.pl (Postfix, from userid 65534)
	id D3D5E45CAC; Thu,  5 May 2011 19:01:11 +0200 (CEST)
Received: from localhost (public-gprs14895.centertel.pl [87.96.58.47])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mail.garage.freebsd.pl (Postfix) with ESMTP id 4E24F45684;
	Thu,  5 May 2011 19:01:02 +0200 (CEST)
Date: Thu, 5 May 2011 19:00:37 +0200
From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
To: Alexander Leidinger <Alexander@Leidinger.net>
Message-ID: <20110505170037.GG14661@garage.freebsd.pl>
References: <4DBBB20A.5050102@FreeBSD.org>
	<20110430072831.GA65598@icarus.home.lan>
	<20110501000656.00007ea1@unknown>
	<20110501133752.GC3245@garage.freebsd.pl>
	<20110503134826.712070yt2urhxp8g@webmail.leidinger.net>
	<20110505133156.GE14661@garage.freebsd.pl>
	<20110505164001.79532nb02isxjlxc@webmail.leidinger.net>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="lHGcFxmlz1yfXmOs"
Content-Disposition: inline
In-Reply-To: <20110505164001.79532nb02isxjlxc@webmail.leidinger.net>
X-OS: FreeBSD 9.0-CURRENT amd64
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on 
	mail.garage.freebsd.pl
X-Spam-Level: 
X-Spam-Status: No, score=-2.6 required=4.5 tests=BAYES_00 autolearn=ham 
	version=3.0.4
Cc: freebsd-fs@FreeBSD.org, Alexander Motin <mav@FreeBSD.org>
Subject: Re: TRIM clustering
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 05 May 2011 17:01:14 -0000


--lHGcFxmlz1yfXmOs
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Thu, May 05, 2011 at 04:40:01PM +0200, Alexander Leidinger wrote:
> >>>>This brings up the question if a ZFS cache (where the contents do not
> >>>>survive a reboot) is completely TRIMmed before used (and normally
> >>>>trimmed during use)...
> >>>
> >>>It is not trimmed at all.
> >>
> >>This does not sound like the optimal solution... is there a way to
>=20
> >TRIM will be more useful for regular data within a pool and most useful
> >for log devices as we do free blocks there and this is where latency is
> >critical (log devices are there to reduce latency).
>=20
> Wait, does this mean that ZFS does not TRIM at all? I was
> understanding your first answer as the cache is not trimmed at all.

You asked for cache and I answered about cache, but ZFS does not TRIM in
general.

--=20
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://yomoli.com

--lHGcFxmlz1yfXmOs
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (FreeBSD)

iEYEARECAAYFAk3C17QACgkQForvXbEpPzSK/wCgy0KzaVqs5NDHmib8NnlBdyUl
phgAoNTfMDvlX/weLtSpUz3fyPWjZorq
=QJZr
-----END PGP SIGNATURE-----

--lHGcFxmlz1yfXmOs--

From owner-freebsd-fs@FreeBSD.ORG  Thu May  5 17:53:38 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BCDC5106566B
	for <freebsd-fs@freebsd.org>; Thu,  5 May 2011 17:53:38 +0000 (UTC)
	(envelope-from toasty@dragondata.com)
Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com
	[209.85.214.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 81A518FC16
	for <freebsd-fs@freebsd.org>; Thu,  5 May 2011 17:53:38 +0000 (UTC)
Received: by iwn33 with SMTP id 33so2836842iwn.13
	for <freebsd-fs@freebsd.org>; Thu, 05 May 2011 10:53:38 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=dragondata.com; s=google;
	h=domainkey-signature:from:content-type:content-transfer-encoding
	:subject:date:message-id:to:mime-version:x-mailer;
	bh=+4yRosvY4OeW8+IRJ+ZBkW4eIlFoZpC24Zc1S17CZvU=;
	b=krWbJ57QBWUNbQqiPPx2+LmjBiInULnyU9Vprx7o1iSmfjxwFLE2+JEnch/nVjhJXD
	XCuZr6RuKg4qQjDL0OxnKJVZi6t+b56wpHH9I3DwOOZ5gavTMJDRRQEzTjT9Nr05QwtK
	MOM+OpiiND3zjDVOQVqka2NX35m3RXFSxItKM=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=dragondata.com; s=google;
	h=from:content-type:content-transfer-encoding:subject:date:message-id
	:to:mime-version:x-mailer;
	b=ixB7kNDNxuD+FITjij7L/9MQGKsfNZbLOKeIC0smCqZd7DfiRd9XI7zzCURN5z7pG9
	Mo+f+e8kphuvEI8WdXcLck1bzbqqU4pDOdYxj67OPPfd+lmlpmzWPE3Cg7AW4Q+UTkok
	3ThqZ02ogE2w1xUacXbLy9jSwdFu39M8CZVDQ=
Received: by 10.43.58.148 with SMTP id wk20mr1341731icb.242.1304616329041;
	Thu, 05 May 2011 10:25:29 -0700 (PDT)
Received: from vpn177.ord02.your.org (vpn177.ord02.your.org [204.9.55.177])
	by mx.google.com with ESMTPS id g16sm974803ibb.37.2011.05.05.10.25.26
	(version=TLSv1/SSLv3 cipher=OTHER);
	Thu, 05 May 2011 10:25:27 -0700 (PDT)
From: Kevin Day <toasty@dragondata.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
Date: Thu, 5 May 2011 12:25:24 -0500
Message-Id: <C2BDEBA9-DD29-4B0B-B125-89B93F5997BA@dragondata.com>
To: freebsd-fs@freebsd.org
Mime-Version: 1.0 (Apple Message framework v1084)
X-Mailer: Apple Mail (2.1084)
Subject: "gpart show" stuck in loop
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 05 May 2011 17:53:38 -0000


We've had one of our boxes getting stuck with "gpart show" (called from =
rc startup scripts) consuming 100% cpu after each reboot. Manually =
running "gpart show" gives me:

# gpart show |more
=3D>       63  715571136  amrd0  MBR  (341G)
         63  715567167      1  freebsd  [active]  (341G)
  715567230       3969         - free -  (1.9M)

=3D>        0  715567167  amrd0s1  BSD  (341G)
          0  696254464        1  freebsd-ufs  (332G)
  696254464   19312703        2  freebsd-swap  (9.2G)

=3D>        63  5860573110  da0  MBR  (2.7T)
          63  2147472747    1  freebsd  [active]  (1.0T)
  2147472810  2147472810    2  freebsd  [active]  (1.0T)
  4294945620  -2729352721    3  freebsd  [active]  ()
  1565592899   581879911       - free -  (277G)
  2147472810  2147472810    2  freebsd  [active]  (1.0T)
  4294945620  -2729352721    3  freebsd  [active]  ()
  1565592899   581879911       - free -  (277G)
  2147472810  2147472810    2  freebsd  [active]  (1.0T)
  4294945620  -2729352721    3  freebsd  [active]  ()
  1565592899   581879911       - free -  (277G)
  2147472810  2147472810    2  freebsd  [active]  (1.0T)
  4294945620  -2729352721    3  freebsd  [active]  ()
  1565592899   581879911       - free -  (277G)
  2147472810  2147472810    2  freebsd  [active]  (1.0T)
  4294945620  -2729352721    3  freebsd  [active]  ()
  1565592899   581879911       - free -  (277G)
(repeating forever)


I'm guessing something is corrupt in the partition table. I'm happy to =
file a PR on this, but I can only leave this untouched for a day or two =
max before I'm going to have to wipe this and start over for a new =
customer who needs this storage array. Is there anything anyone could =
suggest looking at or preserving before I'm forced to delete this?=20

The storage system came to me configured like this, I don't know what =
the previous owner was attempting to do, or how they ended up with the =
partitions like this.

-- Kevin


da0 at mpt0 bus 0 scbus0 target 0 lun 0
da0: <APPLE Xserve RAID 1.51> Fixed Direct Access SCSI-5 device=20
da0: 100.000MB/s transfers
da0: Command Queueing enabled
da0: 2861608MB (5860573184 512 byte sectors: 255H 63S/T 364803C)


# fdisk da0
******* Working on device /dev/da0 *******
parameters extracted from in-core disklabel are:
cylinders=3D364803 heads=3D255 sectors/track=3D63 (16065 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=3D364803 heads=3D255 sectors/track=3D63 (16065 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
    start 63, size 2147472747 (1048570 Meg), flag 80 (active)
        beg: cyl 0/ head 1/ sector 1;
        end: cyl 1023/ head 254/ sector 63
The data for partition 2 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
    start 2147472810, size 2147472810 (1048570 Meg), flag 80 (active)
        beg: cyl 1023/ head 255/ sector 63;
        end: cyl 1023/ head 254/ sector 63
The data for partition 3 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
    start 4294945620, size 1565614575 (764460 Meg), flag 80 (active)
        beg: cyl 1023/ head 255/ sector 63;
        end: cyl 1023/ head 165/ sector 59
The data for partition 4 is:
<UNUSED>


From owner-freebsd-fs@FreeBSD.ORG  Fri May  6 01:31:02 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 118D0106566B
	for <freebsd-fs@freebsd.org>; Fri,  6 May 2011 01:31:02 +0000 (UTC)
	(envelope-from marcel@xcllnt.net)
Received: from mail.xcllnt.net (mail.xcllnt.net [70.36.220.4])
	by mx1.freebsd.org (Postfix) with ESMTP id 894458FC12
	for <freebsd-fs@freebsd.org>; Fri,  6 May 2011 01:31:01 +0000 (UTC)
Received: from sa-nc-mfg-210.static.jnpr.net (natint3.juniper.net
	[66.129.224.36]) (authenticated bits=0)
	by mail.xcllnt.net (8.14.4/8.14.4) with ESMTP id p45LPd57054019
	(version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO);
	Thu, 5 May 2011 14:25:44 -0700 (PDT)
	(envelope-from marcel@xcllnt.net)
Mime-Version: 1.0 (Apple Message framework v1084)
Content-Type: text/plain; charset=us-ascii
From: Marcel Moolenaar <marcel@xcllnt.net>
In-Reply-To: <C2BDEBA9-DD29-4B0B-B125-89B93F5997BA@dragondata.com>
Date: Thu, 5 May 2011 14:25:35 -0700
Content-Transfer-Encoding: quoted-printable
Message-Id: <FCA2B7F6-F9D1-4DA2-B22C-DEFBC1B55E1B@xcllnt.net>
References: <C2BDEBA9-DD29-4B0B-B125-89B93F5997BA@dragondata.com>
To: Kevin Day <toasty@dragondata.com>
X-Mailer: Apple Mail (2.1084)
Cc: freebsd-fs@freebsd.org
Subject: Re: "gpart show" stuck in loop
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 06 May 2011 01:31:02 -0000


On May 5, 2011, at 10:25 AM, Kevin Day wrote:

>=20
> We've had one of our boxes getting stuck with "gpart show" (called =
from rc startup scripts) consuming 100% cpu after each reboot. Manually =
running "gpart show" gives me:

Can you send me a binary image of the first sector of da0 privately
and also tell me what FreeBSD version you're using.

Thanks,

--=20
Marcel Moolenaar
marcel@xcllnt.net


From owner-freebsd-fs@FreeBSD.ORG  Fri May  6 03:12:09 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 81E21106566B
	for <freebsd-fs@freebsd.org>; Fri,  6 May 2011 03:12:09 +0000 (UTC)
	(envelope-from marcel@xcllnt.net)
Received: from mail.xcllnt.net (mail.xcllnt.net [70.36.220.4])
	by mx1.freebsd.org (Postfix) with ESMTP id 559F28FC12
	for <freebsd-fs@freebsd.org>; Fri,  6 May 2011 03:12:09 +0000 (UTC)
Received: from dhcp-192-168-2-13.wifi.xcllnt.net (atm.xcllnt.net [70.36.220.6])
	(authenticated bits=0)
	by mail.xcllnt.net (8.14.4/8.14.4) with ESMTP id p463BxrZ001939
	(version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO);
	Thu, 5 May 2011 20:12:05 -0700 (PDT)
	(envelope-from marcel@xcllnt.net)
Mime-Version: 1.0 (Apple Message framework v1084)
Content-Type: text/plain; charset=us-ascii
From: Marcel Moolenaar <marcel@xcllnt.net>
In-Reply-To: <FCA2B7F6-F9D1-4DA2-B22C-DEFBC1B55E1B@xcllnt.net>
Date: Thu, 5 May 2011 20:11:59 -0700
Content-Transfer-Encoding: quoted-printable
Message-Id: <6B7B3E48-08D5-47D1-85B4-FAA1EEE6764C@xcllnt.net>
References: <C2BDEBA9-DD29-4B0B-B125-89B93F5997BA@dragondata.com>
	<FCA2B7F6-F9D1-4DA2-B22C-DEFBC1B55E1B@xcllnt.net>
To: Marcel Moolenaar <marcel@xcllnt.net>
X-Mailer: Apple Mail (2.1084)
Cc: freebsd-fs@freebsd.org
Subject: Re: "gpart show" stuck in loop
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 06 May 2011 03:12:09 -0000


On May 5, 2011, at 2:25 PM, Marcel Moolenaar wrote:

>=20
> On May 5, 2011, at 10:25 AM, Kevin Day wrote:
>=20
>>=20
>> We've had one of our boxes getting stuck with "gpart show" (called =
from rc startup scripts) consuming 100% cpu after each reboot. Manually =
running "gpart show" gives me:
>=20
> Can you send me a binary image of the first sector of da0 privately
> and also tell me what FreeBSD version you're using.

(after receiving the dump)

Hi Kevin,

I reproduced the problem:

ns1% sudo mdconfig -a -t malloc -s 5860573173
md0
ns1% sudo gpart create -s mbr md0
md0 created
ns1% gpart show md0
=3D>        63  4294967229  md0  MBR  (2.7T)
          63  4294967229       - free -  (2.0T)

ns1% sudo dd if=3Dkevin-day.mbr of=3D/dev/md0
8+0 records in
8+0 records out
4096 bytes transferred in 0.006988 secs (586144 bytes/sec)
ns1% gpart show md0
=3D>        63  5860573110  md0  MBR  (2.7T)
          63  2147472747    1  freebsd  [active]  (1.0T)
  2147472810  2147472810    2  freebsd  [active]  (1.0T)
  4294945620  -2729352721    3  freebsd  [active]  ()
  1565592899   581879911       - free -  (277G)
  2147472810  2147472810    2  freebsd  [active]  (1.0T)
  4294945620  -2729352721    3  freebsd  [active]  ()
  1565592899   581879911       - free -  (277G)
  2147472810  2147472810    2  freebsd  [active]  (1.0T)
  4294945620  -2729352721    3  freebsd  [active]  ()
  1565592899   581879911       - free -  (277G)
	^C


The first problem you have is that the MBR overflows.
As you can see from my initial MBR, only 2.0TB out of the
2.7T can be addressed, whereas yours addresses the whole
2.7T. There must be an overflow condition.
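
One way to see the overflow, as a sketch: take the start and size of
slice 3 from the fdisk output posted earlier in this thread and compute
its last sector with and without 32-bit truncation. That the value is
truncated to 32 bits is an assumption, although the result agrees with
the numbers gpart prints.

```python
# Sketch: slice 3 of the MBR, computed with and without 32-bit wraparound.
# Start and size are taken from the fdisk output earlier in this thread.
# ASSUMPTION: the end sector is truncated to 32 bits somewhere along the way.

MASK32 = 0xFFFFFFFF

start = 4294945620
size = 1565614575

true_end = start + size - 1              # last sector if nothing overflowed
wrapped_end = true_end & MASK32          # last sector after 32-bit truncation
apparent_size = wrapped_end - start + 1  # size derived from start and wrapped end

if __name__ == "__main__":
    print("true end:     ", true_end)
    print("wrapped end:  ", wrapped_end)
    print("apparent size:", apparent_size)
```

The wrapped end is one less than the 1565592899 at which the free-space
row starts in the gpart output above, and the apparent size is the
-2729352721 printed for slice 3.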

The second problem is that more than 1 slice is marked
active.

Now, on to the infinite recursion in gpart. The XML has
the following pertaining to the slices:

        <provider id=3D"0xffffff0029ff9900">
          <geom ref=3D"0xffffff002e742d00"/>
          <mode>r0w0e0</mode>
          <name>md0s3</name>
          <mediasize>-1397428593152</mediasize>
          <sectorsize>512</sectorsize>
          <config>
            <start>4294945620</start>
            <end>1565592898</end>
            <index>3</index>
            <type>freebsd</type>
            <offset>2199012157440</offset>
            <length>18446742676280958464</length>
            <rawtype>165</rawtype>
            <attrib>active</attrib>
          </config>
        </provider>

Notice how mediasize is negative. This is a bug in the
kernel. This is also what leads to the recursion in gpart,
because gpart looks up the next partition on the disk,
given the LBA of the next sector following the partition
just processed. This allows gpart to detect free space
(the next partition found doesn't start at the given LBA)
and it allows gpart to print the partitions in order on
the disk. In any case: since the end of slice 3 is before
the start of slice 3 and even before the start of slice 2,
due to its negative size, gpart will continuously find the
same partitions:
1.  After partition 3 the "cursor" is at 1565592899,
2.  The next partition found is partition 2, at 2147472810
3.  Therefore, 1565592899-2147472810 is free space
4.  Partition 2 is printed, and partition 3 is found next
5.  Partition 3 is printed and due to the negative size:
    goto 1
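
The steps above can be sketched as a small simulation. This is not
gpart's actual code: find_next() and walk() are hypothetical helpers,
the slice boundaries are derived from the gpart and XML output above,
and the step limit exists only so that the sketch terminates.

```python
# Sketch of the "cursor" walk described above, using the values gpart printed.
# This is NOT gpart's actual code; find_next() and walk() are hypothetical
# helpers, and the step limit exists only so the sketch terminates.

# (index, start, end), with the end of slice 3 wrapped as shown in the XML.
SLICES = [
    (1, 63, 2147472809),
    (2, 2147472810, 4294945619),
    (3, 4294945620, 1565592898),
]


def find_next(cursor):
    """Return the slice with the lowest start at or after `cursor`, or None."""
    candidates = [s for s in SLICES if s[1] >= cursor]
    return min(candidates, key=lambda s: s[1]) if candidates else None


def walk(first, max_steps=12):
    """Walk the disk from sector `first`; return the rows a listing would show."""
    rows, cursor = [], first
    for _ in range(max_steps):
        nxt = find_next(cursor)
        if nxt is None:
            break
        index, start, end = nxt
        if start > cursor:
            # Gap between the cursor and the next slice: report free space.
            rows.append(("free", cursor, start - cursor))
        rows.append((index, start, end - start + 1))
        cursor = end + 1   # for slice 3 this jumps backwards
    return rows


if __name__ == "__main__":
    for row in walk(63):
        print(row)
```

Without the step limit the walk never ends, because the cursor after
slice 3 always lands in front of slice 2 again. A guard of this kind, or
a check that the cursor only moves forward, is one possible way to
protect a tool against such a table.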

I think we should do two things:
1.  Protect the gpart tool against this,
2.  Fix the kernel to simply reject partitions that
    fall outside of the addressable space (as determined
    by the limitations of the scheme).
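
A minimal sketch of what the second point could look like, applied to
the three slices from the fdisk output earlier in this thread. The
helper slice_is_addressable() is hypothetical and is not the kernel's
code; the limit assumes that the MBR scheme stores start and size as
32-bit sector counts.

```python
# Sketch of the proposed check: reject any slice that does not fit inside the
# space the scheme can address. Hypothetical helper, not the kernel's code.
# ASSUMPTION: MBR stores start and size as 32-bit sector counts, so the last
# addressable sector is 2**32 - 1.

MBR_LAST_SECTOR = 2**32 - 1


def slice_is_addressable(start, size, last_sector=MBR_LAST_SECTOR):
    """True if the slice [start, start + size - 1] lies within 0..last_sector."""
    if size <= 0 or start < 0:
        return False
    return start + size - 1 <= last_sector


if __name__ == "__main__":
    table = [(63, 2147472747), (2147472810, 2147472810), (4294945620, 1565614575)]
    for number, (start, size) in enumerate(table, 1):
        verdict = "ok" if slice_is_addressable(start, size) else "rejected"
        print("slice %d: %s" % (number, verdict))
```

With these inputs slices 1 and 2 pass and slice 3 is rejected. A real
check would presumably also compare against the size of the underlying
media.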

In your case it would mean that slice 3 would be
inaccessible.

Given that you've been hit by this: do you feel that such
a change would be a better failure mode?

--=20
Marcel Moolenaar
marcel@xcllnt.net


From owner-freebsd-fs@FreeBSD.ORG  Fri May  6 06:30:04 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4C1091065672
	for <freebsd-fs@freebsd.org>; Fri,  6 May 2011 06:30:04 +0000 (UTC)
	(envelope-from toasty@dragondata.com)
Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com
	[209.85.214.182])
	by mx1.freebsd.org (Postfix) with ESMTP id DEEB58FC17
	for <freebsd-fs@freebsd.org>; Fri,  6 May 2011 06:30:01 +0000 (UTC)
Received: by iwn33 with SMTP id 33so3398993iwn.13
	for <freebsd-fs@freebsd.org>; Thu, 05 May 2011 23:30:01 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=dragondata.com; s=google;
	h=domainkey-signature:subject:mime-version:content-type:from
	:in-reply-to:date:cc:content-transfer-encoding:message-id:references
	:to:x-mailer; bh=DjXDjVWUjuAzLLchb6K/E+cQ+5M2VxkhGvA+/4pgUfw=;
	b=nZBMT34w2PbQStFWMugQEd9jWY0XWiz/qSBT3jdO0en3eSCuaP4XagTnSU2sxd3i9P
	fD6ZthMuTpqbiJ/gp4r1EfeULXxtcNCLZrQ90gvUaB7x3yZYBev1qitd6nt7UA3UJ0Or
	WGLn8Br0EotADn7jz3IPr32KIrbTUEHkpN2cs=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=dragondata.com; s=google;
	h=subject:mime-version:content-type:from:in-reply-to:date:cc
	:content-transfer-encoding:message-id:references:to:x-mailer;
	b=PBBzyf6/qxU+PJYpEMbBl0yV3WMGSSQAI/kJzMcK91trH0Rmznh0y07BdsC9qSA11e
	WSzV+mnmgcM5MGqijo8Z2FBcZSREXWYUj+cZ2CqICQBpst0fNZs1Bb7JiuKwatiRT/ij
	tjaq8svbW/Qgzy6fWcp27hh4JxxDyZ/pRN4Do=
Received: by 10.42.159.65 with SMTP id k1mr2161427icx.174.1304663401225;
	Thu, 05 May 2011 23:30:01 -0700 (PDT)
Received: from vpn168.ord02.your.org (vpn168.ord02.your.org [204.9.55.168])
	by mx.google.com with ESMTPS id u17sm1229948ibm.28.2011.05.05.23.29.59
	(version=TLSv1/SSLv3 cipher=OTHER);
	Thu, 05 May 2011 23:29:59 -0700 (PDT)
Mime-Version: 1.0 (Apple Message framework v1084)
Content-Type: text/plain; charset=us-ascii
From: Kevin Day <toasty@dragondata.com>
In-Reply-To: <6B7B3E48-08D5-47D1-85B4-FAA1EEE6764C@xcllnt.net>
Date: Fri, 6 May 2011 01:29:57 -0500
Content-Transfer-Encoding: quoted-printable
Message-Id: <4A043649-429E-4CAC-8DB2-2275ECF552A0@dragondata.com>
References: <C2BDEBA9-DD29-4B0B-B125-89B93F5997BA@dragondata.com>
	<FCA2B7F6-F9D1-4DA2-B22C-DEFBC1B55E1B@xcllnt.net>
	<6B7B3E48-08D5-47D1-85B4-FAA1EEE6764C@xcllnt.net>
To: Marcel Moolenaar <marcel@xcllnt.net>
X-Mailer: Apple Mail (2.1084)
Cc: freebsd-fs@freebsd.org
Subject: Re: "gpart show" stuck in loop
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 06 May 2011 06:30:04 -0000


On May 5, 2011, at 10:11 PM, Marcel Moolenaar wrote:
> Hi Kevin,
>=20
> I reproduced the problem:
>=20

Yay!

> The first problem you have is that the MBR has overflows.
> As you can see from my initial MBR, only 2.0TB out of the
> 2.7T can be addressed, whereas yours addresses the whole
> 2.7T. There must be an overflow condition.
>=20
> The second problem is that more than 1 slice is marked
> active.

Yeah, I'm not exactly sure how the previous user of this storage array
ended up with this MBR. I believe he was using it in FreeBSD, but
probably something much older (6.x?). I don't know if it was actually
working or not with all the partitions, but I honestly can't see how.

> I think we should do two things:
> 1.  Protect the gpart tool against this,
> 2.  Fix the kernel to simply reject partitions that
>    fall outside of the addressable space (as determined
>    by the limitations of the scheme).
>
> In your case it would mean that slice 3 would be
> inaccessible.
>
> Given that you've been hit by this: do you feel that such
> a change would be a better failure mode?

Definitely. As it stands now, slice 3 isn't accessible anyway:

# dd if=/dev/da0s3 of=/dev/null
dd: /dev/da0s3: Input/output error
0+0 records in
0+0 records out
0 bytes transferred in 0.000233 secs (0 bytes/sec)

So allowing the rc startup to finish without hanging would be a big
improvement.
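
For illustration only, here is a rough Python sketch of the two checks
discussed above: a slice that ends beyond the addressable space, and more
than one slice marked active. It assumes 512-byte sectors and treats 2^32
sectors as the limit for the end of a slice, which is a simplification.
The function names and the tuple layout are hypothetical; this is not the
gpart or kernel code.

```python
SECTOR_SIZE = 512           # bytes per sector (assumed, not taken from the array)
MAX_LBA_SECTORS = 2 ** 32   # MBR slice entries use 32-bit sector fields


def addressable_bytes(sector_size=SECTOR_SIZE):
    """Bytes reachable through a 32-bit sector address."""
    return MAX_LBA_SECTORS * sector_size


def check_slices(slices):
    """Return a list of problems found in a hypothetical slice table.

    slices is a list of (start_lba, size_sectors, active) tuples.
    """
    problems = []
    for index, (start, size, _active) in enumerate(slices, start=1):
        # A slice whose end lies past the 32-bit limit cannot be fully addressed.
        if start + size > MAX_LBA_SECTORS:
            problems.append(f"slice {index}: ends beyond addressable space")
    active_count = sum(1 for _start, _size, active in slices if active)
    if active_count > 1:
        problems.append(f"{active_count} slices marked active")
    return problems
```

With 512-byte sectors, addressable_bytes() works out to 2 TiB, which
matches the "2.0TB out of the 2.7T" figure quoted above.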

Thanks for the speedy answer. :)

-- Kevin


From owner-freebsd-fs@FreeBSD.ORG  Fri May  6 08:14:10 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A1E991065672;
	Fri,  6 May 2011 08:14:10 +0000 (UTC) (envelope-from jh@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 79AB78FC1A;
	Fri,  6 May 2011 08:14:10 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p468EAqw004992;
	Fri, 6 May 2011 08:14:10 GMT (envelope-from jh@freefall.freebsd.org)
Received: (from jh@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p468E9a6004986;
	Fri, 6 May 2011 08:14:09 GMT (envelope-from jh)
Date: Fri, 6 May 2011 08:14:09 GMT
Message-Id: <201105060814.p468E9a6004986@freefall.freebsd.org>
To: vk@dss.kbb.ru, jh@FreeBSD.org, freebsd-fs@FreeBSD.org
From: jh@FreeBSD.org
Cc: 
Subject: Re: kern/149022: [hang] File system operations hangs with suspfs
	state
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 06 May 2011 08:14:10 -0000

Synopsis: [hang] File system operations hangs with suspfs state

State-Changed-From-To: feedback->closed
State-Changed-By: jh
State-Changed-When: Fri May 6 08:14:09 UTC 2011
State-Changed-Why: 
Feedback timeout.

http://www.freebsd.org/cgi/query-pr.cgi?pr=149022

From owner-freebsd-fs@FreeBSD.ORG  Fri May  6 08:19:07 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 527C1106566C;
	Fri,  6 May 2011 08:19:07 +0000 (UTC) (envelope-from jh@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 2B2508FC19;
	Fri,  6 May 2011 08:19:07 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p468J7pC006395;
	Fri, 6 May 2011 08:19:07 GMT (envelope-from jh@freefall.freebsd.org)
Received: (from jh@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p468J6dZ006390;
	Fri, 6 May 2011 08:19:06 GMT (envelope-from jh)
Date: Fri, 6 May 2011 08:19:06 GMT
Message-Id: <201105060819.p468J6dZ006390@freefall.freebsd.org>
To: k0802647@telus.net, jh@FreeBSD.org, freebsd-fs@FreeBSD.org
From: jh@FreeBSD.org
Cc: 
Subject: Re: kern/154228: [md] md getting stuck in wdrain state
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 06 May 2011 08:19:07 -0000

Synopsis: [md] md getting stuck in wdrain state

State-Changed-From-To: feedback->patched
State-Changed-By: jh
State-Changed-When: Fri May 6 08:16:03 UTC 2011
State-Changed-Why: 
Fixed in head (r217880) and stable/8 (r218188).

http://www.freebsd.org/cgi/query-pr.cgi?pr=154228

From owner-freebsd-fs@FreeBSD.ORG  Fri May  6 20:34:02 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2C1831065670
	for <freebsd-fs@freebsd.org>; Fri,  6 May 2011 20:34:02 +0000 (UTC)
	(envelope-from unix.co@gmail.com)
Received: from mail-pv0-f182.google.com (mail-pv0-f182.google.com
	[74.125.83.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 041D98FC15
	for <freebsd-fs@freebsd.org>; Fri,  6 May 2011 20:34:01 +0000 (UTC)
Received: by pvg11 with SMTP id 11so2080741pvg.13
	for <freebsd-fs@freebsd.org>; Fri, 06 May 2011 13:34:01 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:date:message-id:subject:from:to
	:content-type; bh=ZLZwCcnkhYjWKi6EEucEhtiswI/tQHycfl+yplfhEB8=;
	b=I/QXXTfocZWt7kUd3al5CeAvRDujpfLntsb+4l3+4mF+rI0Uj68Wq1L2WmE0Ei8NYf
	46GzffhDcOndkX/kv1JmaK4uU9AhulbLxUDRwiG5SJfY2Q4X+TnknZjAV0QvIFlw+7EV
	eD+ZdN0M0Gbbg6vDHMAwz2HAkRAnyQSmiSG0M=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:date:message-id:subject:from:to:content-type;
	b=WjLeI/NEdVxG00NNxOuwnoTl4grocfJRmTYExRhE85NlvfQI36GIfISjV3cuUfOVAy
	f7v9Ekq86BggG7W/Fm5X4NVAZA3OOuRbU/CRcmnILb0o9Kmgwjq6DQJ5u5JD+KV9HpJt
	4f9Qr19Dvp14tivuIOys8DF+F8jH3OxFaudH8=
MIME-Version: 1.0
Received: by 10.68.60.33 with SMTP id e1mr2816406pbr.174.1304712723322; Fri,
	06 May 2011 13:12:03 -0700 (PDT)
Received: by 10.68.54.41 with HTTP; Fri, 6 May 2011 13:12:03 -0700 (PDT)
Date: Sat, 7 May 2011 01:12:03 +0500
Message-ID: <BANLkTi=6yKV-bDzkMxjvggT2Gme64Tiufw@mail.gmail.com>
From: "Tears !" <unix.co@gmail.com>
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Subject: Remote address not configured ??
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 06 May 2011 20:34:02 -0000

Hi

I am trying to create the hast0 pool:

hastctl create hast0

But I am getting this error:

[ERROR] Remote address not configured for resource hast0.

Here is my hast.conf. Both nodes are reachable, and both sides have the same hast.conf:

resource hast0 {
on s1 {
local /dev/ad3
remote 87.96.41.150
}
on s2 {
local /dev/ad3
remote 87.96.41.146
}
}

How can I solve this?

Best Regards

Umar

From owner-freebsd-fs@FreeBSD.ORG  Sat May  7 05:36:13 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id EE1E71065676
	for <freebsd-fs@freebsd.org>; Sat,  7 May 2011 05:36:13 +0000 (UTC)
	(envelope-from igorz@yandex.ru)
Received: from forward1.mail.yandex.net (forward1.mail.yandex.net [77.88.46.6])
	by mx1.freebsd.org (Postfix) with ESMTP id A04B88FC16
	for <freebsd-fs@freebsd.org>; Sat,  7 May 2011 05:36:13 +0000 (UTC)
Received: from web53.yandex.ru (web53.yandex.ru [77.88.47.159])
	by forward1.mail.yandex.net (Yandex) with ESMTP id C5F4E124312A
	for <freebsd-fs@freebsd.org>; Sat,  7 May 2011 09:20:58 +0400 (MSD)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex.ru; s=mail;
	t=1304745658; bh=Sqpf6dUA881NpTWu1h0UhcrVpLtaqsHXkB2UepiUTDk=;
	h=From:To:Subject:MIME-Version:Message-Id:Date:
	Content-Transfer-Encoding:Content-Type;
	b=Gxm3o8de/bf7HZLXTFtrnRiGYycbUdU/bmGn7d20MOEPRlJfAwJbcyNWpnxgVH1Sc
	tWzEpINnkls1R7ArXmM0NHdxfy1/uRth9CtQIwranzkKNcd6wfW+HXG5yD7U/xrGNE
	ChB/N5tOOJAI8dSgf6I2grRHWYSpx6xpBf3kLAnU=
Received: from localhost (localhost.localdomain [127.0.0.1])
	by web53.yandex.ru (Yandex) with ESMTP id BB530358331
	for <freebsd-fs@freebsd.org>; Sat,  7 May 2011 09:20:58 +0400 (MSD)
X-Yandex-Spam: 1
Received: from ppp85-141-219-114.pppoe.mtu-net.ru
	(ppp85-141-219-114.pppoe.mtu-net.ru [85.141.219.114]) by
	mail.yandex.ru with HTTP; Sat, 07 May 2011 09:20:57 +0400
From: Igor Zabelin <igorz@yandex.ru>
To: freebsd-fs@freebsd.org
MIME-Version: 1.0
Message-Id: <210021304745658@web53.yandex.ru>
Date: Sat, 07 May 2011 09:20:57 +0400
X-Mailer: Yamail [ http://yandex.ru ] 5.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain
Subject: ZFS can't mount filesystem
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 07 May 2011 05:36:14 -0000

Hi,

I have trouble with ZFS. One of the filesystems can't be mounted.
zpool scrub is not doing anything.
ZFS reports an error when getting the properties.
SMART extended offline test for each disk completed without error.
Is it possible to recover the data? Can I mount it while ignoring errors?

FreeBSD 8.2-RELEASE

ZFS reports an error when getting the properties:

# zfs get all tank/var

[skip normal output]
internal error: unable to get version property
internal error: unable to get utf8only property
internal error: unable to get normalization property
internal error: unable to get casesensitivity property
[skip normal output]

# zpool status -v tank
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub stopped after 0h0m with 0 errors on Sat May  7 08:09:35 2011
config:

        NAME           STATE     READ WRITE CKSUM
        tank           ONLINE       0     0    36
          raidz1       ONLINE       0     0   144
            gpt/disk5  ONLINE       0     0     0
            gpt/disk6  ONLINE       0     0     0
            gpt/disk7  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        tank/var:<0x0>


From owner-freebsd-fs@FreeBSD.ORG  Sat May  7 06:44:06 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B49A6106564A
	for <freebsd-fs@freebsd.org>; Sat,  7 May 2011 06:44:06 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta06.westchester.pa.mail.comcast.net
	(qmta06.westchester.pa.mail.comcast.net [76.96.62.56])
	by mx1.freebsd.org (Postfix) with ESMTP id 625C68FC08
	for <freebsd-fs@freebsd.org>; Sat,  7 May 2011 06:44:06 +0000 (UTC)
Received: from omta23.westchester.pa.mail.comcast.net ([76.96.62.74])
	by qmta06.westchester.pa.mail.comcast.net with comcast
	id gWir1g0031c6gX856Wk6MU; Sat, 07 May 2011 06:44:06 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta23.westchester.pa.mail.comcast.net with comcast
	id gWk51g00A1t3BNj3jWk5qm; Sat, 07 May 2011 06:44:06 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id DC4F7102C19; Fri,  6 May 2011 23:44:03 -0700 (PDT)
Date: Fri, 6 May 2011 23:44:03 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Igor Zabelin <igorz@yandex.ru>
Message-ID: <20110507064403.GA4324@icarus.home.lan>
References: <210021304745658@web53.yandex.ru>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <210021304745658@web53.yandex.ru>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS can't mount filesystem
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 07 May 2011 06:44:06 -0000

On Sat, May 07, 2011 at 09:20:57AM +0400, Igor Zabelin wrote:
> Hi,
> 
> I have trouble with ZFS. One of set filesystems can't mount.
> zpool scrub is not doing anything
> ZFS reports an error when geting the properties.
> SMART extended offline test for each disk completed without error.
> It's possible to recover data? Mount ignoring errors?
> 
> FreeBSD 8.2-RELEASE 
> 
> ZFS reports an error when geting the properties.
> 
> # zfs get all tank/var
> 
> [skip normal output]
> internal error: unable to get version property
> internal error: unable to get utf8only property
> internal error: unable to get normalization property
> internal error: unable to get casesensitivity property
> [skip normal output]
> 
> # zpool status -v tank
>   pool: tank
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: scrub stopped after 0h0m with 0 errors on Sat May  7 08:09:35 2011
> config:
> 
>         NAME           STATE     READ WRITE CKSUM
>         tank           ONLINE       0     0    36
>           raidz1       ONLINE       0     0   144
>             gpt/disk5  ONLINE       0     0     0
>             gpt/disk6  ONLINE       0     0     0
>             gpt/disk7  ONLINE       0     0     0
> 
> errors: Permanent errors have been detected in the following files:
> 
>         tank/var:<0x0>

Just to rule out disk problems, can you please provide "smartctl -a"
output for each of the 3 disks in the pool and be sure to state what
output matches each disk (gpt/XXX)?  A long test doesn't act as full
validation of disk read integrity (it's close to a surface scan, but
not the same thing), nor does it test things like
communication between the controller and the disk.  short vs. long vs.
conveyance vs. offline vs. select SMART tests all do different things
depending on how the vendor implements them, and it varies per model of
disk; there is no standard.

Others may be able to help with pool recovery in this case, but I always
tend to resort to restoration from backups.

Developers may be interested in the output from "zdb tank", so you may
want to include that here.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Sat May  7 07:27:36 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 54FC61065676
	for <freebsd-fs@freebsd.org>; Sat,  7 May 2011 07:27:36 +0000 (UTC)
	(envelope-from to.my.trociny@gmail.com)
Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com
	[209.85.214.54])
	by mx1.freebsd.org (Postfix) with ESMTP id CDE168FC21
	for <freebsd-fs@freebsd.org>; Sat,  7 May 2011 07:27:35 +0000 (UTC)
Received: by bwz12 with SMTP id 12so4385112bwz.13
	for <freebsd-fs@freebsd.org>; Sat, 07 May 2011 00:27:34 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:from:to:cc:subject:references:x-comment-to
	:sender:date:in-reply-to:message-id:user-agent:mime-version
	:content-type; bh=BSronswnnxiVec78aDaD0szQlqD2vu52p8EEJsfcjVo=;
	b=bopXUlWwu3q0ms6skcaXGOKeeMjUetqPwRyw6tIOLoUTP64fmJXOUGZCSSqO2LojHx
	BJOrLtorgcIqn4zFvx0/crSgIhIasCX0/0tAAyIi/mRp01lmCWDw6rSzkN8TyL3oRKDQ
	4RVORs9lo+059K7s3MJ5zu6CPXL9A71sbCKjM=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=from:to:cc:subject:references:x-comment-to:sender:date:in-reply-to
	:message-id:user-agent:mime-version:content-type;
	b=RjachKAA45IZu7p0KZC86QV2wQYNfZt1Lp6wFUNn4gs99piZvBLbkV2vdL6kJ7mKLq
	BIvId/z0RYtEHc2z4sbyPtNu9i5bABtUNngVirNF/Ivxpc8TcrNlxOXvJm+fkuWBiPsb
	pQWKjUF0KeKfqm/KAMMv0EtaEhlB9I68uHnBA=
Received: by 10.205.24.9 with SMTP id rc9mr3956140bkb.92.1304753254502;
	Sat, 07 May 2011 00:27:34 -0700 (PDT)
Received: from localhost ([95.69.172.154])
	by mx.google.com with ESMTPS id y22sm2427796bku.8.2011.05.07.00.27.32
	(version=TLSv1/SSLv3 cipher=OTHER);
	Sat, 07 May 2011 00:27:33 -0700 (PDT)
From: Mikolaj Golub <trociny@freebsd.org>
To: "Tears !" <unix.co@gmail.com>
References: <BANLkTi=6yKV-bDzkMxjvggT2Gme64Tiufw@mail.gmail.com>
X-Comment-To: Tears !
Sender: Mikolaj Golub <to.my.trociny@gmail.com>
Date: Sat, 07 May 2011 10:27:30 +0300
In-Reply-To: <BANLkTi=6yKV-bDzkMxjvggT2Gme64Tiufw@mail.gmail.com> (Tears !'s
	message of "Sat, 7 May 2011 01:12:03 +0500")
Message-ID: <86wri3dl7h.fsf@kopusha.home.net>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: freebsd-fs@freebsd.org
Subject: Re: Remote address not configured ??
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 07 May 2011 07:27:36 -0000


On Sat, 7 May 2011 01:12:03 +0500 Tears ! wrote:

 T!> Hi

 T!> I am trying to create hast0 pool

 T!> hastctl create hast0

 T!> But i am getting error

 T!> [ERROR] Remote address not configured for resource hast0.

 T!> Here is my hast.conf both nodes are accessible and both side same hast.conf

 T!> resource hast0 {
 T!> on s1 {
 T!> local /dev/ad3
 T!> remote 87.96.41.150
 T!> }
 T!> on s2 {
 T!> local /dev/ad3
 T!> remote 87.96.41.146
 T!> }
 T!> }

 T!> How to solve this ?

It looks like hastd can't find the configuration for its node. Are s1 and s2
the real hostnames of your hosts? As stated in hast.conf(5):

     The <node> argument can be replaced either by a full hostname as obtained
     by gethostname(3), only first part of the hostname, or by node's UUID as
     found in the kern.hostuuid sysctl(8) variable.

What version of FreeBSD are you running? I suspect some release, because in
STABLE and CURRENT you would get a plainer message in such a case: "No
resource configuration for this node...".
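
To make the matching rule from hast.conf(5) concrete, here is a small
Python sketch of how a node name in an "on <node>" section could be
compared against the local host. It is only an illustration of the three
alternatives quoted above (full hostname, first part of the hostname, or
host UUID), not the actual hastd code. The function name is hypothetical,
and the case-insensitive UUID comparison is my assumption.

```python
def node_matches(node, hostname, hostuuid):
    """Return True if 'node' from an 'on <node>' section names this host.

    hostname is what gethostname(3) would return; hostuuid is the value
    of the kern.hostuuid sysctl. Both are passed in by the caller.
    """
    short_name = hostname.split(".", 1)[0]  # first part of the hostname
    if node in (hostname, short_name):
        return True
    # Assumption: UUIDs are compared without regard to letter case.
    return node.lower() == hostuuid.lower()
```

So "on s1" would only apply on a host whose hostname is s1 or starts
with "s1.", which is why a label that is not the real hostname finds no
configuration.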

-- 
Mikolaj Golub

From owner-freebsd-fs@FreeBSD.ORG  Sat May  7 08:57:04 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 67C3A1065677;
	Sat,  7 May 2011 08:57:04 +0000 (UTC)
	(envelope-from unix.co@gmail.com)
Received: from mail-pv0-f182.google.com (mail-pv0-f182.google.com
	[74.125.83.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 369288FC0A;
	Sat,  7 May 2011 08:57:03 +0000 (UTC)
Received: by pvg11 with SMTP id 11so2291591pvg.13
	for <multiple recipients>; Sat, 07 May 2011 01:57:03 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:in-reply-to:references:date
	:message-id:subject:from:to:cc:content-type;
	bh=mOerKN8A7mHWnlIXnZh+QvhV8c6OLIUm2yH0w2amDvc=;
	b=WsTNY7vkgjzz7aYkIPInZY/iINcuAFG2dDvgUHAEOjLFHmRUN4uLtFp27v6JAqzw8q
	J68qZjHCYEyp7GrAR4T9cuA7C+wP7zhQ37GP6RBOzyam30CJKVRtRTN+B8qijXJZG6ys
	QQgBHoa3wtiRApzB2CMp6FUSbPPAwZMkQxSBA=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	b=PIT9j9Nf53d4B505w8bUagecbIjvY7l87DjbRqHN0l9tkerrd2a0q+EweYDPe9bAsm
	baAAs1eVZzuevwPwhMolPqOfJ8rSWeVgpAE1og2Hh4AeuFIepMICPkAHNryduOG75jnR
	pZbhFOKw4/MaGKNK8+vS9O1o3MO9pPgKYXeOo=
MIME-Version: 1.0
Received: by 10.68.0.69 with SMTP id 5mr6128992pbc.241.1304758623707; Sat, 07
	May 2011 01:57:03 -0700 (PDT)
Received: by 10.68.54.41 with HTTP; Sat, 7 May 2011 01:57:03 -0700 (PDT)
In-Reply-To: <86wri3dl7h.fsf@kopusha.home.net>
References: <BANLkTi=6yKV-bDzkMxjvggT2Gme64Tiufw@mail.gmail.com>
	<86wri3dl7h.fsf@kopusha.home.net>
Date: Sat, 7 May 2011 13:57:03 +0500
Message-ID: <BANLkTik72QPEAZwzPSazcbJcFUumiq6EeQ@mail.gmail.com>
From: "Tears !" <unix.co@gmail.com>
To: Mikolaj Golub <trociny@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-fs@freebsd.org
Subject: Re: Remote address not configured ??
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 07 May 2011 08:57:04 -0000

Hi Mikolaj,

Thanks a lot, it works after replacing *s1* with the system *hostname*.

Best Regards,

Umar

On Sat, May 7, 2011 at 12:27 PM, Mikolaj Golub <trociny@freebsd.org> wrote:

>
> On Sat, 7 May 2011 01:12:03 +0500 Tears ! wrote:
>
>  T!> Hi
>
>  T!> I am trying to create hast0 pool
>
>  T!> hastctl create hast0
>
>  T!> But i am getting error
>
>  T!> [ERROR] Remote address not configured for resource hast0.
>
>  T!> Here is my hast.conf both nodes are accessible and both side same
> hast.conf
>
>  T!> resource hast0 {
>  T!> on s1 {
>  T!> local /dev/ad3
>  T!> remote 87.96.41.150
>  T!> }
>  T!> on s2 {
>  T!> local /dev/ad3
>  T!> remote 87.96.41.146
>  T!> }
>  T!> }
>
>  T!> How to solve this ?
>
> It looks like hastd can't find configuration for its node. Is s1 and s2 are
> real hostnames of your hosts? As it is stated in hast.conf(5):
>
>     The <node> argument can be replaced either by a full hostname as
> obtained
>     by gethostname(3), only first part of the hostname, or by node's UUID
> as
>     found in the kern.hostuuid sysctl(8) variable.
>
> What version of FreeBSD are you running? I suspect some release, because in
> STABLE and CURRENT you would have more plain message in such case: "No
> resource configuration for this node...".
>
> --
> Mikolaj Golub
>


-- 
Umar Draz
Network Administrator

From owner-freebsd-fs@FreeBSD.ORG  Sat May  7 19:02:39 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 30D4D106564A
	for <freebsd-fs@freebsd.org>; Sat,  7 May 2011 19:02:39 +0000 (UTC)
	(envelope-from artemb@gmail.com)
Received: from mail-qw0-f54.google.com (mail-qw0-f54.google.com
	[209.85.216.54])
	by mx1.freebsd.org (Postfix) with ESMTP id E14508FC16
	for <freebsd-fs@freebsd.org>; Sat,  7 May 2011 19:02:37 +0000 (UTC)
Received: by qwc9 with SMTP id 9so3128911qwc.13
	for <freebsd-fs@freebsd.org>; Sat, 07 May 2011 12:02:36 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	bh=zGvD/qi9gu0x0PVfSz1iIekuryOsh2pEnQ71i9H6Gc8=;
	b=BkFDc8DFWrf4IzS0BT727nr+Lln4SoMg8INI4bQaCDqIPsrrtN4fixZK2clsX111p5
	EKQzOiL/1ihmHfQz74EFKqygJOt2ena2b+KeS6plVbibQ5iiqKXyPyoX5P+jw97NQUc3
	2vG4+itktI0muzhLNbI3UJDG0g94aECbYLG18=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	b=qY4AV9n+7DOifGbtbw2davmaEYvzSEV9KyWNMm20Izwe66SNQvswjP8WlYUxxAicit
	ggoeQkp3E81wJIDeVHX0IN4DVn/M0jXlOX+U4IkogArwrpCazRVsi6M0ou+4tVZOsl9Y
	r8+s7hHB5tL43t+D57+/OyIyV6wZvRvKyLaek=
MIME-Version: 1.0
Received: by 10.229.46.67 with SMTP id i3mr3412290qcf.234.1304794955473; Sat,
	07 May 2011 12:02:35 -0700 (PDT)
Sender: artemb@gmail.com
Received: by 10.229.95.140 with HTTP; Sat, 7 May 2011 12:02:35 -0700 (PDT)
In-Reply-To: <210021304745658@web53.yandex.ru>
References: <210021304745658@web53.yandex.ru>
Date: Sat, 7 May 2011 12:02:35 -0700
X-Google-Sender-Auth: FdfRFOeY7Xw91wAtee3Fo3jOwq4
Message-ID: <BANLkTim+u4MuQP3N7DfR3YkUtSpZvyNu8Q@mail.gmail.com>
From: Artem Belevich <art@freebsd.org>
To: Igor Zabelin <igorz@yandex.ru>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS can't mount filesystem
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 07 May 2011 19:02:39 -0000

On Fri, May 6, 2011 at 10:20 PM, Igor Zabelin <igorz@yandex.ru> wrote:
> Hi,
>
> I have trouble with ZFS. One of set filesystems can't mount.
> zpool scrub is not doing anything
> ZFS reports an error when geting the properties.
> SMART extended offline test for each disk completed without error.
> It's possible to recover data? Mount ignoring errors?
>
> FreeBSD 8.2-RELEASE
>
> ZFS reports an error when geting the properties.
>
> # zfs get all tank/var
>
> [skip normal output]
> internal error: unable to get version property
> internal error: unable to get utf8only property
> internal error: unable to get normalization property
> internal error: unable to get casesensitivity property
> [skip normal output]
>
> # zpool status -v tank
>   pool: tank
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: scrub stopped after 0h0m with 0 errors on Sat May  7 08:09:35 2011
> config:
>
>         NAME           STATE     READ WRITE CKSUM
>         tank           ONLINE       0     0    36
>           raidz1       ONLINE       0     0   144
>             gpt/disk5  ONLINE       0     0     0
>             gpt/disk6  ONLINE       0     0     0
>             gpt/disk7  ONLINE       0     0     0
>
> errors: Permanent errors have been detected in the following files:
>
>         tank/var:<0x0>

It may be a good idea to test the RAM on your system first. ZFS, with its
data consistency checks, is often the first thing tripped by bad RAM.

--Artem

