From owner-freebsd-fs@FreeBSD.ORG Sun May 1 00:09:22 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BE0FD1065700 for ; Sun, 1 May 2011 00:09:22 +0000 (UTC) (envelope-from mm@FreeBSD.org) Received: from mail.vx.sk (mail.vx.sk [IPv6:2a01:4f8:100:1043::3]) by mx1.freebsd.org (Postfix) with ESMTP id 1999F8FC0C for ; Sun, 1 May 2011 00:09:22 +0000 (UTC) Received: from core.vx.sk (localhost [127.0.0.1]) by mail.vx.sk (Postfix) with ESMTP id 4C19314D14A; Sun, 1 May 2011 02:09:21 +0200 (CEST) X-Virus-Scanned: amavisd-new at mail.vx.sk Received: from mail.vx.sk ([127.0.0.1]) by core.vx.sk (mail.vx.sk [127.0.0.1]) (amavisd-new, port 10024) with LMTP id S9qLTD8AEB-U; Sun, 1 May 2011 02:09:19 +0200 (CEST) Received: from [10.9.8.1] (chello085216231078.chello.sk [85.216.231.78]) by mail.vx.sk (Postfix) with ESMTPSA id 8D7C614D133; Sun, 1 May 2011 02:09:18 +0200 (CEST) Message-ID: <4DBCA4AE.3090506@FreeBSD.org> Date: Sun, 01 May 2011 02:09:18 +0200 From: Martin Matuska User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; sk; rv:1.8.1.23) Gecko/20090812 Lightning/0.9 Thunderbird/2.0.0.23 Mnenhy/0.7.5.0 MIME-Version: 1.0 To: Pierre Lamy References: <4DB8EF02.8060406@bk.ru> <20110430001524.GA58845@icarus.home.lan> <4DBC2E46.9060404@userid.org> In-Reply-To: <4DBC2E46.9060404@userid.org> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Cc: freebsd-fs@freebsd.org, Volodymyr Kostyrko Subject: Re: ZFS v28 for 8.2-STABLE X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 01 May 2011 00:09:22 -0000 We plan to MFC v28. 
But this change is quite intrusive for users: there is no way back once
you upgrade your pool (if you do not also upgrade the bootcode you will
not be able to boot, and will have to be rescued by mfsBSD).
The MFC will happen when we think it is stable enough to be in STABLE.

As for me, I am not using it in serious production yet (I am very happy
with v15 + the latest patches), but my development servers with v28 seem
pretty stable.

I have updated the patch to reflect the latest changes (grab the latest one):
http://people.freebsd.org/~mm/patches/zfs/v28/

As to your setup, have you tried using a partition as a log device?
File-based devices are generally considered experimental in all ZFS
implementations (including Solaris).

On 30.04.2011 17:44, Pierre Lamy wrote:
> On 4/29/2011 8:15 PM, Jeremy Chadwick wrote:
>> On Fri, Apr 29, 2011 at 11:20:21PM +0300, Volodymyr Kostyrko wrote:
>>> 28.04.2011 07:37, Ruslan Yakovlev wrote:
>>>> Does actually patch exist for 8.2-STABLE ?
>>>> I probe
>>>> http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20110317.patch.xz
>>>>
>>>>
>>>> Building failed with:
>>>> can't cd to /usr/src/cddl/usr.bin/zstreamdump
>>>> Also sys/cddl/compat/opensolaris/sys/sysmacros.h failed to patch.
>>>>
>>>> Current FreeBSD 8.2-STABLE #35 Mon Apr 18 03:40:38 EEST 2011 i386
>>>> periodically frozen on high load like backup by rsync or find -sx ...
>>>> (from default cron tasks).
>>> Well ZFSv28 should be very close to STABLE for now?
>>>
>>> http://lists.freebsd.org/pipermail/freebsd-current/2011-February/023152.html
>>>
>> It's now a matter of opinion. The whole idea of ZFSv28 being committed
>> to HEAD was to be tested. I haven't seen any indication of a progress
>> report provided for anything on HEAD that pertains to ZFSv28, have you?
>> >> Furthermore, the FreeBSD Quarterly Status Report just came out on 04/27 >> for the months of January-March (almost a 2 month delay, sigh): >> >> 1737 04/27 10:58 Daniel Gerzo ( 41K) FreeBSD Status Report >> January-March, 2011 >> >> http://www.freebsd.org/news/status/report-2011-01-2011-03.html >> >> Which states that ZFSv28 is "now available in CURRENT", which we've >> known for months: >> >> http://www.freebsd.org/news/status/report-2011-01-2011-03.html#ZFSv28-available-in-FreeBSD-9-CURRENT >> >> >> But again, no progress report, so nobody except those who follow >> HEAD/CURRENT know what the progress is. And that progress has not been >> relayed to any of the non-HEAD/CURRENT lists. >> >> I'm a total hard-ass about this stuff, and have been for years, because >> it all boils down to communication (or lack there-of). It seems very >> hasty to say "Yeah! MFC this!" when we (folks who only follow STABLE) >> have absolutely no idea if what's in CURRENT is actually broken in some >> way or if there are outstanding problems -- and if there are, what those >> are so users can be aware of them in advance. >> > > Hello, > > Here's a summary of my recent end-user work with ZFS on -current. I > recently was lucky enough to purchase 2 NAS systems, which consist of 2 > cheap new PCs loaded with 6 HD, one is a simple gpt boot device 1x 1tb > and 5x 2tb data drives. The mobo has 6 sata connectors but I needed to > purchase an additional PCI-E sata adapter since the DVD also uses a sata > port. The system has 4gb memory and a new inexpensive quad core AMD CPU. > > I've been running it (recent -current) for a couple of weeks with heavy > single-user use. 2.5tb/7.1tb. > > The only problem I found, was that deleting a file-backed log device > from a degraded pool would immediately panic the system. I'm not running > stock -current so I didn't report it. > > Resilvering seems absurdly slow, but since I won't be doing it much also > didn't care. 
My NAS is side by side redundant, so if resilvering takes > more than 2 days I would just replicate off of my other NAS. > > Throughput without a log device was in the range of 30mb/sec (3% of my > 1gb interface). Adding a file-backed log device on a UFS partition that > is used for boot, resulted in a 10x jump, saturating the SATA bus that I > was sending data from over the network. It spiked up to 30% of interface > throughput/max bus speed for disk, and did not vary much. This resolved > the issues I saw that a lot of other people have posted about on the > internet, about very spiky data transfers. I first used a 40mb/sec > throughput USB device as the log device, which showed a dramatic > smoothness in data transfer, but still had ~15 seconds where no data > would xfer, while it was flushed from USB to disk. After researching I > discovered that I could use a file backed log device and this fixed all > the problems about spiky data transfers. > > Before that I had tuned the sysctl's as the poor out of the box settings > were giving me very slow speeds (in the range of 1% network throughput, > before log device). I played around with the vfs.zfs tunables but found > that I did not need to after I added the log device, and the out of the > box settings for that sysctl tree were just fine. > > I had first set this up before CAM was added to -current as default, and > did not use labels. Due to troubleshooting some unrelated disk issues, I > ended up switching to CAM without problems, and subsequently labeled the > disks (recreated the zpool after the labeling). I am now using CAM and > AHCI without any issues. > > Here are some personal notes about the tunables I set, I am sure they > are not all helpful. I didn't add them one by one, I simply mass changed > them and saw a positive result. Also noted are the commands I used and > current system status. 
> > sysctl -w net.inet.tcp.sendspace=373760 > sysctl -w net.inet.tcp.recvspace=373760 > sysctl -w net.local.stream.sendspace=82320 > sysctl -w net.local.stream.recvspace=82320 > sysctl -w vfs.zfs.prefetch_disable=1 > sysctl -w net.local.stream.recvspace=373760 > sysctl -w net.local.stream.sendspace=373760 > sysctl -w net.local.inflight=1 > sysctl -w net.inet.tcp.ecn.enable=1 > sysctl -w net.inet.flowtable.enable=0 > sysctl -w net.raw.recvspace=373760 > sysctl -w net.raw.sendspace=373760 > sysctl -w net.inet.tcp.local_slowstart_flightsize=10 > sysctl -a net.inet.tcp.delayed_ack=0 > sysctl -w kern.maxvnodes=600000 > sysctl -w net.local.dgram.recvspace=8192 > sysctl -w net.local.dgram.maxdgram=8192 > sysctl -w net.inet.tcp.slowstart_flightsize=10 > sysctl -w net.inet.tcp.path_mtu_discovery=0 > > [/var/preserve/root] # glabel label g_ada0 /dev/ada0 > [/var/preserve/root] # glabel label g_ada1 /dev/ada1 > [/var/preserve/root] # glabel label g_ada3 /dev/ada3 > [/var/preserve/root] # glabel label g_ada4 /dev/ada4 > [/var/preserve/root] # glabel label g_ada5 /dev/ada5 > > Labels so that later I will be able to more easily identify disks. My > mobo has a single ata bus slave port for SATA. That disk would > "disappear" from the box. Moving the drive to a master sata port > resolved the issue (? very odd). > > gnop create -S 4096 /dev/label/g_ada0 > mkdir /var/preserve/zfs > dd if=/dev/zero of=/var/preserve/zfs/log_device bs=1m count=5000 > zpool create -f tank raidz /dev/label/g_ada0.nop /dev/label/g_ada1 > /dev/label/g_ada3 /dev/label/g_ada4 /dev/label/g_ada5 log > /var/preserve/zfs/log_device > > The 4 above lines are to set the alignment to 4kb, to create a file > backed log device, and create the pool. > > zfs set atime=off tank > > I decided not to use dedup, because my files don't have a lot of dup. > They're mostly large media files, ISOs etc. 
> > [/var/preserve/root] # zpool status > pool: tank > state: ONLINE > scan: none requested > config: > > NAME STATE READ WRITE CKSUM > tank ONLINE 0 0 0 > raidz1-0 ONLINE 0 0 0 > label/g_ada0 ONLINE 0 0 0 > label/g_ada1 ONLINE 0 0 0 > label/g_ada3 ONLINE 0 0 0 > label/g_ada4 ONLINE 0 0 0 > label/g_ada5 ONLINE 0 0 0 > logs > /var/preserve/zfs/log_device ONLINE 0 0 0 > > errors: No known data errors > [/var/preserve/root] # > > [/var/preserve/root] # df > Filesystem Size Used Avail Capacity Mounted on > /dev/gpt/pyros-a 9.7G 3.3G 5.6G 37% / > /dev/gpt/pyros-c 884G 6.1G 808G 1% /var > tank 7.1T 2.5T 4.6T 35% /tank > [/var/preserve/root] # > > > ada0 at ahcich0 bus 0 scbus0 target 0 lun 0 > ada0: ATA-8 SATA 3.x device > ada0: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes) > ada0: Command Queueing enabled > ada0: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C) > ada1 at ahcich2 bus 0 scbus3 target 0 lun 0 > ada1: ATA-8 SATA 3.x device > ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada1: Command Queueing enabled > ada1: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C) > ada2 at ahcich3 bus 0 scbus4 target 0 lun 0 > ada2: ATA-8 SATA 2.x device > ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada2: Command Queueing enabled > ada2: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C) > ada3 at ahcich4 bus 0 scbus5 target 0 lun 0 > ada3: ATA-8 SATA 3.x device > ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada3: Command Queueing enabled > ada3: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C) > ada4 at ahcich5 bus 0 scbus6 target 0 lun 0 > ada4: ATA-8 SATA 3.x device > ada4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada4: Command Queueing enabled > ada4: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C) > ada5 at ata1 bus 0 scbus8 target 0 lun 0 > ada5: ATA-8 SATA 3.x device > ada5: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes) > ada5: 1907729MB (3907029168 
512 byte sectors: 16H 63S/T 16383C) > > CPU: AMD Phenom(tm) II X4 920 Processor (2800.19-MHz K8-class CPU) > ... > real memory = 4294967296 (4096 MB) > avail memory = 3840598016 (3662 MB) > > ZFS filesystem version 5 > ZFS storage pool version 28 > > > Best practices: > > Tune the sysctls related to buffer sizes / queue depth. > Label your disks before you build the zpool. > Use gnop to 4kb align the disks. Only one disk in the pool needs this > before you create it. > Use CAM. > *** USE A LOG DEVICE! *** > > -Pierre > > > > > > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Sun May 1 01:13:10 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BA0F9106564A for ; Sun, 1 May 2011 01:13:10 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 6E8C18FC0A for ; Sun, 1 May 2011 01:13:10 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApwEAGmwvE2DaFvO/2dsb2JhbACEUaI+iHGrLo9qhH+BAQSOeY4+ X-IronPort-AV: E=Sophos;i="4.64,295,1301889600"; d="scan'208";a="120040066" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 30 Apr 2011 21:01:08 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 81376B3F6F; Sat, 30 Apr 2011 21:01:08 -0400 (EDT) Date: Sat, 30 Apr 2011 21:01:08 -0400 (EDT) From: Rick Macklem To: Kostik Belousov Message-ID: <149943048.820546.1304211668413.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20110430223412.GS48734@deviant.kiev.zoral.com.ua> 
MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_Part_820545_1204398810.1304211668411" X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Cc: rmacklem@freebsd.org, fs@freebsd.org Subject: Re: newnfs client and statfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 01 May 2011 01:13:10 -0000 ------=_Part_820545_1204398810.1304211668411 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit > I just netbooted fresh GENERIC (with irrelevant local patch) over the > pxe, and got the following: > > # df -h > Filesystem Size Used Avail Capacity Mounted on > 192.168.102.110:/usr/home/kostik/build/bsd/DEV/netboot/x -267G 130G > -539G -32% / > > On the server side, it is up-to-date stable/8 with oldnfs server, > export is > /dev/ada1p2 1.8T 129G 1.5T 8% /usr/home > > Do we have some long-typed var lurking in new nfs client code, > instead of off_t ? I am almost sure this is nfs problem, since I > booted > i386 in the same setup month ago, and did not had the compaints from > sendmail about low space on spool (which is why I noted this issue > now). > > amd64 kernel (with nfscl loaded as module) correctly reports > 192.168.102.110:/usr/home/kostik/build/bsd/DEV/netboot/x 1.8T 129G > 1.5T 8% / Oops, I never noticed that the "struct statfs" fields had been bumped to 64bits. I've attached a patch for the client. Could you please test it? (I'll look in case the server has a similar problem.) 
Thanks for reporting it, rick ------=_Part_820545_1204398810.1304211668411 Content-Type: text/x-patch; name=statfs.patch Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename=statfs.patch LS0tIGZzL25mc2NsaWVudC9uZnNfY2xwb3J0LmMuc2F2CTIwMTEtMDQtMzAgMjA6MTY6MzkuMDAw MDAwMDAwIC0wNDAwCisrKyBmcy9uZnNjbGllbnQvbmZzX2NscG9ydC5jCTIwMTEtMDQtMzAgMjA6 NDU6MTYuMDAwMDAwMDAwIC0wNDAwCkBAIC0zOSw2ICszOSw3IEBAIF9fRkJTRElEKCIkRnJlZUJT RDogaGVhZC9zeXMvZnMvbmZzY2xpZW4KICAqIGJlIHRoZSBlYXNpZXN0IHdheSB0byBoYW5kbGUg dGhlIHBvcnQuCiAgKi8KICNpbmNsdWRlIDxzeXMvaGFzaC5oPgorI2luY2x1ZGUgPHN5cy9saW1p dHMuaD4KICNpbmNsdWRlIDxmcy9uZnMvbmZzcG9ydC5oPgogI2luY2x1ZGUgPG5ldGluZXQvaWZf ZXRoZXIuaD4KICNpbmNsdWRlIDxuZXQvaWZfdHlwZXMuaD4KQEAgLTgzOCwyMCArODM5LDE0IEBA IHZvaWQKIG5mc2NsX2xvYWRzYmluZm8oc3RydWN0IG5mc21vdW50ICpubXAsIHN0cnVjdCBuZnNz dGF0ZnMgKnNmcCwgdm9pZCAqc3RhdGZzKQogewogCXN0cnVjdCBzdGF0ZnMgKnNicCA9IChzdHJ1 Y3Qgc3RhdGZzICopc3RhdGZzOwotCW5mc3F1YWRfdCB0cXVhZDsKIAogCWlmIChubXAtPm5tX2Zs YWcgJiAoTkZTTU5UX05GU1YzIHwgTkZTTU5UX05GU1Y0KSkgewogCQlzYnAtPmZfYnNpemUgPSBO RlNfRkFCTEtTSVpFOwotCQl0cXVhZC5xdmFsID0gc2ZwLT5zZl90Ynl0ZXM7Ci0JCXNicC0+Zl9i bG9ja3MgPSAobG9uZykodHF1YWQucXZhbCAvICgodV9xdWFkX3QpTkZTX0ZBQkxLU0laRSkpOwot CQl0cXVhZC5xdmFsID0gc2ZwLT5zZl9mYnl0ZXM7Ci0JCXNicC0+Zl9iZnJlZSA9IChsb25nKSh0 cXVhZC5xdmFsIC8gKCh1X3F1YWRfdClORlNfRkFCTEtTSVpFKSk7Ci0JCXRxdWFkLnF2YWwgPSBz ZnAtPnNmX2FieXRlczsKLQkJc2JwLT5mX2JhdmFpbCA9IChsb25nKSh0cXVhZC5xdmFsIC8gKCh1 X3F1YWRfdClORlNfRkFCTEtTSVpFKSk7Ci0JCXRxdWFkLnF2YWwgPSBzZnAtPnNmX3RmaWxlczsK LQkJc2JwLT5mX2ZpbGVzID0gKHRxdWFkLmx2YWxbMF0gJiAweDdmZmZmZmZmKTsKLQkJdHF1YWQu cXZhbCA9IHNmcC0+c2ZfZmZpbGVzOwotCQlzYnAtPmZfZmZyZWUgPSAodHF1YWQubHZhbFswXSAm IDB4N2ZmZmZmZmYpOworCQlzYnAtPmZfYmxvY2tzID0gc2ZwLT5zZl90Ynl0ZXMgLyBORlNfRkFC TEtTSVpFOworCQlzYnAtPmZfYmZyZWUgPSBzZnAtPnNmX2ZieXRlcyAvIE5GU19GQUJMS1NJWkU7 CisJCXNicC0+Zl9iYXZhaWwgPSBzZnAtPnNmX2FieXRlcyAvIE5GU19GQUJMS1NJWkU7CisJCXNi cC0+Zl9maWxlcyA9IHNmcC0+c2ZfdGZpbGVzOworCQlzYnAtPmZfZmZyZWUgPSAoc2ZwLT5zZl9m 
ZmlsZXMgJiBPRkZfTUFYKTsKIAl9IGVsc2UgaWYgKChubXAtPm5tX2ZsYWcgJiBORlNNTlRfTkZT VjQpID09IDApIHsKIAkJc2JwLT5mX2JzaXplID0gKGludDMyX3Qpc2ZwLT5zZl9ic2l6ZTsKIAkJ c2JwLT5mX2Jsb2NrcyA9IChpbnQzMl90KXNmcC0+c2ZfYmxvY2tzOwo= ------=_Part_820545_1204398810.1304211668411-- From owner-freebsd-fs@FreeBSD.ORG Sun May 1 02:05:08 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 551EE106566B; Sun, 1 May 2011 02:05:08 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id A151B8FC12; Sun, 1 May 2011 02:05:07 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id p4124KCG074095 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 1 May 2011 05:04:20 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id p4124HdV023240; Sun, 1 May 2011 05:04:17 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id p4124HvV023239; Sun, 1 May 2011 05:04:17 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Sun, 1 May 2011 05:04:17 +0300 From: Kostik Belousov To: Rick Macklem Message-ID: <20110501020417.GW48734@deviant.kiev.zoral.com.ua> References: <20110430223412.GS48734@deviant.kiev.zoral.com.ua> <149943048.820546.1304211668413.JavaMail.root@erie.cs.uoguelph.ca> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="drH2ShMMYOwzF+Tg" Content-Disposition: inline In-Reply-To: 
<149943048.820546.1304211668413.JavaMail.root@erie.cs.uoguelph.ca> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.4 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: rmacklem@freebsd.org, fs@freebsd.org Subject: Re: newnfs client and statfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 01 May 2011 02:05:08 -0000 --drH2ShMMYOwzF+Tg Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sat, Apr 30, 2011 at 09:01:08PM -0400, Rick Macklem wrote: > > I just netbooted fresh GENERIC (with irrelevant local patch) over the > > pxe, and got the following: > >=20 > > # df -h > > Filesystem Size Used Avail Capacity Mounted on > > 192.168.102.110:/usr/home/kostik/build/bsd/DEV/netboot/x -267G 130G > > -539G -32% / > >=20 > > On the server side, it is up-to-date stable/8 with oldnfs server, > > export is > > /dev/ada1p2 1.8T 129G 1.5T 8% /usr/home > >=20 > > Do we have some long-typed var lurking in new nfs client code, > > instead of off_t ? I am almost sure this is nfs problem, since I > > booted > > i386 in the same setup month ago, and did not had the compaints from > > sendmail about low space on spool (which is why I noted this issue > > now). > >=20 > > amd64 kernel (with nfscl loaded as module) correctly reports > > 192.168.102.110:/usr/home/kostik/build/bsd/DEV/netboot/x 1.8T 129G > > 1.5T 8% / > Oops, I never noticed that the "struct statfs" fields had been bumped > to 64bits. I've attached a patch for the client. Could you please test > it? (I'll look in case the server has a similar problem.) 
>
> Thanks for reporting it, rick

Thank you for the quick fix. Patch is fine.

192.168.102.110:/usr/home/kostik/build/bsd/DEV/netboot/x 1.7T 130G 1.5T 8% /

--drH2ShMMYOwzF+Tg
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iEYEARECAAYFAk28v6EACgkQC3+MBN1Mb4irHACg86T0P5gwk/djC8ZcZK4NsF0p
lFsAoKLODsjse8CT/Lh2pCJwtMqGekbL
=6PO1
-----END PGP SIGNATURE-----

--drH2ShMMYOwzF+Tg--

From owner-freebsd-fs@FreeBSD.ORG Sun May 1 03:00:43 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9A7D9106566C for ; Sun, 1 May 2011 03:00:43 +0000 (UTC) (envelope-from des@des.no) Received: from smtp.des.no (smtp.des.no [194.63.250.102]) by mx1.freebsd.org (Postfix) with ESMTP id 5757B8FC08 for ; Sun, 1 May 2011 03:00:43 +0000 (UTC) Received: from ds4.des.no (des.no [84.49.246.2]) by smtp.des.no (Postfix) with ESMTP id 91F7C1FFC35; Sun, 1 May 2011 03:00:41 +0000 (UTC) Received: by ds4.des.no (Postfix, from userid 1001) id A08DC84495; Sun, 1 May 2011 05:00:38 +0200 (CEST) From: Dag-Erling Smørgrav To: Rick Macklem References: <1591833752.819973.1304207152724.JavaMail.root@erie.cs.uoguelph.ca> Date: Sun, 01 May 2011 05:00:38 +0200 In-Reply-To: <1591833752.819973.1304207152724.JavaMail.root@erie.cs.uoguelph.ca> (Rick Macklem's message of "Sat, 30 Apr 2011 19:45:52 -0400 (EDT)") Message-ID: <8662pvf7l5.fsf@ds4.des.no> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org Subject: Re: RFC: make the experimental NFS subsystem the default one X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun,
01 May 2011 03:00:43 -0000

Rick Macklem writes:
> "Dag-Erling Smørgrav" writes:
>> case "`mount -d -a -t nfs 2> /dev/null`" in
>> *mount_nfs*)
>> 	# Handle absent nfs client support
>> 	load_kld -m nfs nfsclient || return 1
>> 	;;
>> esac
> Yep, I spotted that, but haven't had a chance to reproduce it and test
> a fix yet. My first attempt at fixing it will be to change the line to:

The simplest fix is to add a mount_oldnfs case to the switch so the
script knows the old NFS stack is already loaded.

DES
-- 
Dag-Erling Smørgrav - des@des.no

From owner-freebsd-fs@FreeBSD.ORG Sun May 1 11:36:48 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B648D1065670; Sun, 1 May 2011 11:36:48 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de [217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id 714298FC18; Sun, 1 May 2011 11:36:48 +0000 (UTC) Received: from outgoing.leidinger.net (p5B1559A4.dip.t-dialin.net [91.21.89.164]) by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id BED3D84400D; Sun, 1 May 2011 13:36:33 +0200 (CEST) Received: from unknown (IO.Leidinger.net [192.168.2.110]) by outgoing.leidinger.net (Postfix) with ESMTP id 5A5F5119D; Sun, 1 May 2011 13:36:30 +0200 (CEST) Date: Sun, 1 May 2011 13:36:27 +0200 From: Alexander Leidinger To: "Emil Smolenski" Message-ID: <20110501133627.00006616@unknown> In-Reply-To: References: X-Mailer: Claws Mail 3.7.8cvs47 (GTK+ 2.16.6; i586-pc-mingw32msvc) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-EBL-MailScanner-Information: Please contact the ISP for more information X-EBL-MailScanner-ID: BED3D84400D.A25D0 X-EBL-MailScanner: Found to be clean X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN, SpamAssassin (not cached, score=-0.923, required 6, autolearn=disabled,
ALL_TRUSTED -1.00, TW_ZD 0.08) X-EBL-MailScanner-From: alexander@leidinger.net X-EBL-MailScanner-Watermark: 1304854595.26307@WuxNArPge6heNDqm83txWA X-EBL-Spam-Status: No Cc: freebsd-fs@freebsd.org, dfr@FreeBSD.org, jhb@FreeBSD.org Subject: Re: [ZFS] Booting from zpool created on 4k-sector drive X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 01 May 2011 11:36:48 -0000 On Tue, 21 Dec 2010 15:29:01 +0100 "Emil Smolenski" wrote: > Hello, > > There is a hack to force zpool creation with minimum sector size > equal to 4k: > > # gnop create -S 4096 ${DEV0} > # zpool create tank ${DEV0}.nop > # zpool export tank > # gnop destroy ${DEV0}.nop > # zpool import tank > > Zpool created this way is much faster on problematic 4k sector > drives which lies about its sector size (like WD EARS). This hack > works perfectly fine when system is running. Gnop layer is created > only for "zpool create" command -- ZFS stores information about > sector size in its metadata. After zpool creation one can export the > pool, remove gnop layer and reimport the pool. Difference can be seen > in the output from the zdb command: > > - on 512 sector device (2**9 = 512): > % zdb tank |grep ashift > ashift=9 > > - on 4096 sector device (2**12 = 4096): > % zdb tank |grep ashift > ashift=12 > > This change is permanent. The only possibility to change the value > of ashift is: zpool destroy/create and restoring pool from backup. > > But there is one problem: I cannot boot from such pool. Error message: > > ZFS: i/o error - all block copies unavailable > ZFS: can't read MOS > ZFS: unexpected object set type 0 FYI: I can boot successfully from a ZFS v28 pool which was created like this in a GPT partition (tested with 9-current). Bye, Alexander. 
-- http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137 From owner-freebsd-fs@FreeBSD.ORG Sun May 1 13:15:17 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id ADB87106564A for ; Sun, 1 May 2011 13:15:17 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from fallbackmx07.syd.optusnet.com.au (fallbackmx07.syd.optusnet.com.au [211.29.132.9]) by mx1.freebsd.org (Postfix) with ESMTP id 358A28FC13 for ; Sun, 1 May 2011 13:15:16 +0000 (UTC) Received: from mail08.syd.optusnet.com.au (mail08.syd.optusnet.com.au [211.29.132.189]) by fallbackmx07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p41C4vUM003116 for ; Sun, 1 May 2011 22:04:57 +1000 Received: from c122-106-155-58.carlnfd1.nsw.optusnet.com.au (c122-106-155-58.carlnfd1.nsw.optusnet.com.au [122.106.155.58]) by mail08.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p41C4q2e020107 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 1 May 2011 22:04:53 +1000 Date: Sun, 1 May 2011 22:04:52 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Rick Macklem In-Reply-To: <149943048.820546.1304211668413.JavaMail.root@erie.cs.uoguelph.ca> Message-ID: <20110501184904.S975@besplex.bde.org> References: <149943048.820546.1304211668413.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: rmacklem@freebsd.org, fs@freebsd.org Subject: Re: newnfs client and statfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 01 May 2011 13:15:17 -0000 On Sat, 30 Apr 2011, Rick Macklem wrote: > Oops, I never noticed that the "struct statfs" fields had been bumped > to 64bits. 
I've attached a patch for the client. Could you please test
> it? (I'll look in case the server has a similar problem.)

Sigh, bugs in this area are very old and still present.

% --- fs/nfsclient/nfs_clport.c.sav	2011-04-30 20:16:39.000000000 -0400
% +++ fs/nfsclient/nfs_clport.c	2011-04-30 20:45:16.000000000 -0400
% @@ -39,6 +39,7 @@ __FBSDID("$FreeBSD: head/sys/fs/nfsclien
%   * be the easiest way to handle the port.
%   */
%  #include <sys/hash.h>
% +#include <sys/limits.h>

Only needed to implement a bug.

%  #include <fs/nfs/nfsport.h>
%  #include <netinet/if_ether.h>
%  #include <net/if_types.h>
% @@ -838,20 +839,14 @@ void
%  nfscl_loadsbinfo(struct nfsmount *nmp, struct nfsstatfs *sfp, void *statfs)
%  {
%  	struct statfs *sbp = (struct statfs *)statfs;
% -	nfsquad_t tquad;
%
%  	if (nmp->nm_flag & (NFSMNT_NFSV3 | NFSMNT_NFSV4)) {
%  		sbp->f_bsize = NFS_FABLKSIZE;
% -		tquad.qval = sfp->sf_tbytes;
% -		sbp->f_blocks = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
% -		tquad.qval = sfp->sf_fbytes;
% -		sbp->f_bfree = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
% -		tquad.qval = sfp->sf_abytes;
% -		sbp->f_bavail = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
% -		tquad.qval = sfp->sf_tfiles;
% -		sbp->f_files = (tquad.lval[0] & 0x7fffffff);
% -		tquad.qval = sfp->sf_ffiles;
% -		sbp->f_ffree = (tquad.lval[0] & 0x7fffffff);

This mail is too short to describe all the bugs in the above. The old
nfs client still has the following ones:
- bogus variable tquad
- bogus and broken masking for f_files by 0x7fffffff. v3 can pass us a
  count >= 2**31. The bogus masking breaks such counts. When f_files
  was only long, we had to do something for values larger than LONG_MAX.
  We should have clamped to LONG_MAX. (See cvtstatfs() which does this
  now for the corresponding problem for ostatfs().) Instead, we bogusly
  cast. 0x7fffffff is just a misspelling of LONG_MAX which happens to be
  correct for 32-bit 2's complement longs.
  That combined with the server also being limited to 32 bits is the
  one case where the cast works as intended, and even then it is quite
  broken -- it just loses the top bit of values between 2**31 and
  2**32-1. Perhaps the protocol prohibits such values, but at least
  FreeBSD servers take no care not to send them -- see below.
- bogus and even more broken masking for f_ffree. This was broken even
  when f_ffree was long and long was 32 bits. Then the mask just
  destroys the sign bit, which a non-broken server will have passed us
  as the top bit in a 32-bit unsigned value.

% +		sbp->f_blocks = sfp->sf_tbytes / NFS_FABLKSIZE;
% +		sbp->f_bfree = sfp->sf_fbytes / NFS_FABLKSIZE;
% +		sbp->f_bavail = sfp->sf_abytes / NFS_FABLKSIZE;

The conversion for f_bavail still has sign extension bugs. f_bavail can
be negative on the server. A non-broken (FreeBSD) server passes us this
negative value as a uint64_t value with the top bit set. It will be
>= 2**63 as an unsigned value, and dividing by NFS_FABLKSIZE = 2**9 makes
it between 2**54 and 2**55-1; all trace of its signedness is lost, except
we know that garbage values in the range 2**54 to 2**55-1 mean this
overflow error.

The old nfs client still has this bug. Old versions of the old nfs
client had broken scaling which panicked trying to use the values near
2**54 given by the above overflow bugs. See statfs_scale_blocks() for
non-broken scaling for the corresponding problem for ostatfs().

Someone broke the old FreeBSD nfs server to work around the broken
FreeBSD nfs client. It remains broken :-(. This bug is missing in the
new nfs server -- it just passes the server's f_bavail :-), without
paying quite enough attention to the sign bit.

There is of course a portability problem. We need to export negative
values from FreeBSD servers to FreeBSD clients without breaking other
combinations. The combination of the new nfs server with really old
versions of the old nfs client is broken :-).
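
To make the arithmetic above concrete, here is a minimal user-space sketch. It is not taken from the FreeBSD sources: the helper names (wire_to_signed, blocks_unsigned, blocks_signed) are hypothetical, and the block size is passed as a parameter standing in for NFS_FABLKSIZE. It assumes the server sent a negative byte count as a 64-bit value mod 2**64.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Recover a signed value from a 64-bit wire value that carries a
 * negative number mod 2**64.  Written without relying on the
 * implementation-defined result of casting a large uint64_t to int64_t.
 */
static int64_t
wire_to_signed(uint64_t wire)
{
	if (wire <= (uint64_t)INT64_MAX)
		return ((int64_t)wire);
	/* wire == 2**64 - m, so ~wire == m - 1, which fits in int64_t. */
	return (-(int64_t)(~wire) - 1);
}

/* Unsigned scaling, as in the patched lines: the sign is lost. */
static int64_t
blocks_unsigned(uint64_t wire_bytes, uint64_t bsize)
{
	assert(bsize > 0);
	return ((int64_t)(wire_bytes / bsize));
}

/* Reinterpret as signed first, then scale: the sign survives. */
static int64_t
blocks_signed(uint64_t wire_bytes, uint64_t bsize)
{
	assert(bsize > 0 && bsize <= (uint64_t)INT64_MAX);
	return (wire_to_signed(wire_bytes) / (int64_t)bsize);
}
```

With a server-side value of -1000 blocks of 512 bytes, blocks_unsigned() yields 2**55 - 1000, which lands in the 2**54 to 2**55-1 garbage range described above, while blocks_signed() gives back -1000. Whether a client should do the signed reinterpretation at all is exactly the portability question raised here; the sketch only shows where the sign goes missing.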
NetBSD's nfsclient has explicit code to try to handle this problem. I
couldn't see how it could work -- negative values must be passed in
some way, and there is nothing better than passing them as (large)
positive values mod 2**64. IIRC, NetBSD changes the values but this
cannot work since it loses info. Hmm, there are 3 fields to use
(f_blocks, f_bfree and f_bavail). These provide some redundancy, but
neither NetBSD's code nor anything that I could think of worked to
recover ffs's negative avail counts from nonnegative values in these
fields, and frobbing these fields would be unportable anyway. Both the
nfs protocol and POSIX's statvfs() (?) API and types seem to be
incapable of handling ffs's negative f_bavail counts (POSIX only has
unsigned block counts...).

% + sbp->f_files = sfp->sf_tfiles;

Now correct, almost. As for the other fields, it tacitly assumes that
the type of the lvalue is at least as large as the type of the rvalue.
Both happen to be 64 bits here. For the signed fields, this assumption
is strictly incorrect, since the lvalue is 64 bits signed while the
rvalue is 64 bits unsigned. By "not paying quite enough attention to
the sign bit" in the above, I mean that it is tacitly assumed that
we can start with a 64-bit signed value, convert it to u_quad_t
(another type error -- should not assume that u_quad_t is uint64_t),
pass it through the nfs protocol, convert back to int64_t (or forget
to convert back, as above), and recover the original value. This
deserves special care since it abuses the protocol.

% + sbp->f_ffree = (sfp->sf_ffiles & OFF_MAX);

Any masking here is logically wrong, and in practice just destroys the
sign bit, as described above for the 0x7fffffff mask with old 32-bit
systems. Masking with OFF_MAX has additional logic errors. OFF_MAX
is the maximum value for an off_t, but none of the types here has
anything to do with off_t.
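
The difference between masking and clamping can be sketched as follows.
This is only an illustration, not part of the patch: the helper names
clamp_u64_to_i64() and mask_u64_to_i64() are hypothetical, and the
sketch assumes a C99 <stdint.h> that provides INT64_MAX.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): convert an unsigned 64-bit
 * wire count to a signed 64-bit field by clamping. Values that do not
 * fit are pinned to the largest representable value.
 */
static int64_t
clamp_u64_to_i64(uint64_t wire)
{
	if (wire > (uint64_t)INT64_MAX)
		return (INT64_MAX);
	return ((int64_t)wire);
}

/*
 * Hypothetical helper showing the masking approach for comparison.
 * It silently drops the top bit, so a huge count turns into a small one.
 */
static int64_t
mask_u64_to_i64(uint64_t wire)
{
	return ((int64_t)(wire & (uint64_t)INT64_MAX));
}
```

For a wire value of 2**63 + 5, the masking version yields 5 while the
clamping version yields INT64_MAX, which is at least an honest "very
large" answer.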
% } else if ((nmp->nm_flag & NFSMNT_NFSV4) == 0) {
% sbp->f_bsize = (int32_t)sfp->sf_bsize;
% sbp->f_blocks = (int32_t)sfp->sf_blocks;

I think this is just the v2 case. The old nfs client uses essentially
the same bogus casts. No casts should be used (clamping should be
used), but if we use casts it may be possible to use non-bogus ones.
I think these are just no casts for the unsigned fields but int32_t
for the signed ones. The v2 protocol is limited to 32 bits, and we
can easily represent any 32-bit value since we have 64-bit fields for
the lvalues. We just need to be careful with the sign bit (in the
31st bit of an unsigned value in the sfp fields), but can keep the
31st bit as an unsigned bit without problems now that the statfs fields
are 64 bits. Casting for the unsigned fields now just breaks the value
unnecessarily if the protocol manages to pass the 31st bit as a value
bit for such fields.

Servers should pay even more attention to unrepresentable bits than to
sign bits, but pay considerably less. Both the old and the new nfs
server blindly truncate f_bfree, etc., to 32 bits in the v2 case (except
the old nfs server corrupts negative f_bavail to 0). (For v3, they
tacitly assume that no truncation occurs on conversion to 64 bits.)

The old nfs server also gratuitously breaks the file counts (f_files
etc.) for the v3 case. It should use txdr_hyper(), but uses
exdr_unsigned() plus extra code to lose 32 bits. This is fixed in
NetBSD and in the new nfs server.
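
The v2 widening described above (no cast for unsigned fields, a signed
32-bit interpretation for the signed one) might look roughly like this.
It is a sketch under assumptions: the helper names are hypothetical,
and which statfs fields count as signed is taken from the discussion
above rather than from the headers.

```c
#include <stdint.h>

/*
 * Hypothetical helper: widen an unsigned 32-bit v2 wire value into a
 * 64-bit unsigned field. Bit 31 stays an ordinary value bit.
 */
static uint64_t
v2_widen_unsigned(uint32_t wire)
{
	return (wire);
}

/*
 * Hypothetical helper: widen a 32-bit v2 wire value into a 64-bit
 * signed field, treating bit 31 as a sign bit. The arithmetic avoids
 * relying on implementation-defined unsigned-to-signed conversion.
 */
static int64_t
v2_widen_signed(uint32_t wire)
{
	if (wire <= (uint32_t)INT32_MAX)
		return ((int64_t)wire);
	return ((int64_t)wire - ((int64_t)1 << 32));
}
```

With these, a wire value of 0x80000000 becomes 2147483648 in an unsigned
field and -2147483648 in a signed one.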
Bruce From owner-freebsd-fs@FreeBSD.ORG Sun May 1 13:38:14 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0A5611065672; Sun, 1 May 2011 13:38:14 +0000 (UTC) (envelope-from pawel@dawidek.net) Received: from mail.garage.freebsd.pl (60.wheelsystems.com [83.12.187.60]) by mx1.freebsd.org (Postfix) with ESMTP id AB1328FC08; Sun, 1 May 2011 13:38:13 +0000 (UTC) Received: by mail.garage.freebsd.pl (Postfix, from userid 65534) id 2DAA145CD9; Sun, 1 May 2011 15:38:12 +0200 (CEST) Received: from localhost (89-73-195-149.dynamic.chello.pl [89.73.195.149]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.garage.freebsd.pl (Postfix) with ESMTP id A02CD4569A; Sun, 1 May 2011 15:38:06 +0200 (CEST) Date: Sun, 1 May 2011 15:37:52 +0200 From: Pawel Jakub Dawidek To: Alexander Leidinger Message-ID: <20110501133752.GC3245@garage.freebsd.pl> References: <4DBBB20A.5050102@FreeBSD.org> <20110430072831.GA65598@icarus.home.lan> <20110501000656.00007ea1@unknown> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="DKU6Jbt7q3WqK7+M" Content-Disposition: inline In-Reply-To: <20110501000656.00007ea1@unknown> X-OS: FreeBSD 9.0-CURRENT amd64 User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.garage.freebsd.pl X-Spam-Level: X-Spam-Status: No, score=-0.6 required=4.5 tests=BAYES_00,RCVD_IN_SORBS_DUL autolearn=no version=3.0.4 Cc: freebsd-fs@freebsd.org, Alexander Motin Subject: Re: TRIM clustering X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 01 May 2011 13:38:14 -0000 --DKU6Jbt7q3WqK7+M Content-Type: text/plain; charset=us-ascii Content-Disposition: inline 
Content-Transfer-Encoding: quoted-printable On Sun, May 01, 2011 at 12:06:56AM +0200, Alexander Leidinger wrote: > On Sat, 30 Apr 2011 00:28:31 -0700 Jeremy Chadwick > wrote: >=20 > > On Sat, Apr 30, 2011 at 09:54:02AM +0300, Alexander Motin wrote: >=20 > > Other notes: TRIM needs to be supported on swap as well, and in my > > opinion this is just as important as it being in UFS. I'm not sure > > how one would implement that. >=20 > This brings up the question if a ZFS cache (where the contents do not > survive a reboot) is completely TRIMmed before used (and normally > trimmed during use)... It is not trimmed at all. --=20 Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com --DKU6Jbt7q3WqK7+M Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (FreeBSD) iEYEARECAAYFAk29Yi8ACgkQForvXbEpPzSdbgCfZWoNZPhqrb4cIvpQM2hXuSGP ib4An0i7263o4rbpc1BS9OkH8cQo6XXS =8eY2 -----END PGP SIGNATURE----- --DKU6Jbt7q3WqK7+M-- From owner-freebsd-fs@FreeBSD.ORG Sun May 1 14:37:20 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 24711106566B; Sun, 1 May 2011 14:37:20 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 9EC008FC0C; Sun, 1 May 2011 14:37:19 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApwEAKhvvU2DaFvO/2dsb2JhbACEUaJBiHGpbo9dgSqDVYEBBI55jj4 X-IronPort-AV: E=Sophos;i="4.64,297,1301889600"; d="scan'208";a="120066265" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 01 May 2011 10:37:18 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 
7E351B3F25; Sun, 1 May 2011 10:37:18 -0400 (EDT) Date: Sun, 1 May 2011 10:37:18 -0400 (EDT) From: Rick Macklem To: Bruce Evans Message-ID: <506337690.827521.1304260638431.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20110501184904.S975@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Cc: rmacklem@freebsd.org, fs@freebsd.org Subject: Re: newnfs client and statfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 01 May 2011 14:37:20 -0000 > > % + sbp->f_blocks = sfp->sf_tbytes / NFS_FABLKSIZE; > % + sbp->f_bfree = sfp->sf_fbytes / NFS_FABLKSIZE; > % + sbp->f_bavail = sfp->sf_abytes / NFS_FABLKSIZE; > > The conversion for f_bavail still has sign extension bugs. f_bavail > can be negative on the server. A non-broken (FreeBSD) server passes > us this negative value as a uint64_t value with the top bit set. Well, both RFC1813 (NFSv3) and RFC3530 (NFSv4) specify the value on the wire (sf_abytes) as uint64_t. Therefore a negative value can't be represented safely and non-FreeBSD clients/servers would be confused by cheating and putting the negative value on the wire. (I see you mention this further down.) The new server is broken in that it does not check for a negative value. It seems that the best approach for the server would be to send a 0 when f_bavail < 0. What else can you do without "cheating" and representing the value in a way that would be non-interoperable with non-BSD NFS clients? I agree the above is broken for the case where the high order bit of sf_abytes is set. How about the following code? 
sbp->f_bavail = (sfp->f_abytes & OFF_MAX) / NFS_FABLKSIZE; (Yea, I see later in the message that you don't think OFF_MAX is the appropriate way to represent the largest positive value that can be stored in int64_t. As you'll see below, I don't know the correct way to express this constant and would be happy to hear how to do it? See below for more on this.) > > Someone broke the old FreeBSD nfs server to work around the broken > FreeBSD > nfs client. It remains broken :-(. This bug is missing in the new nfs > server -- it just passes the server's f_bavail :-), without paying > quite > enough attention to the sign bit. > > There is of cause a portability problem. We need to export negative > values from FreeBSD servers to FreeBSD clients without breaking other > combinations. The combination of the new nfs server with really old > old nfs clients is broken :-). NetBSD's nfsclient has explicit code > to try to handle this problem. I couldn't see how it could work -- > negative values must be passed in some way, and there is nothing > better > than passing them as (large) positive values mod 2**64. IIRC, NetBSD > changes the values but this cannot work since it loses info. Hmm, > there are 3 fields to use (f_blocks, f_bfree and f_bavail). These > provide some redundancy, but neither NetBSD's code nor anything that > I could think of worked to recover ffs's negative avail counts from > nonnegative values in these fields, and frobbing these fields would > be unportable anyway. Both the nfs protocol and POSIX's statvfs() (?) > API and types seem to be incapable of handing ffs's negative f_bavail > counts (POSIX only has unsigned block counts...). > Well, as I noted above, all I think can be done is have the server reply 0 for the case where f_bavail is negative. (If the specs don't support negative values, that's all there is to it, I think?) > % + sbp->f_files = sfp->sf_tfiles; > > Now correct, almost. 
As for the other fields, it tacitly assumes that
> the type of the lvalue is larger than the type of the lvalue. Both
> happen to be 64 bits here. For the signed fields, there assumption
> is strictly incorrect, since the lvalue is 64 bit signed while the
> rvalue is 64 bits unsigned. By "not paying quite enough attention to
> the sign bit" in the above, I mean that it is tacitly assumed that
> we can start with a 64-bit signed value, convert it to u_quad_t
> (another type error -- should not assume that u_quad_t is uint64_t),
> pass it through the nfs protocol, convert back to int64_t (or forget
> to convert back, as above), and recover the original value. This
> deserves special care since it abuses the protocol.
>
Well, there are a LOT of places where the code uses u_quad_t to
represent what is now uint64_t. What can I say. I wrote this code over
about 10 years (based on even much older code) and, being an old K&R C
guy, assumed that u_quad_t was the way to declare an unsigned 64-bit
value. If/when u_quad_t doesn't define an unsigned 64-bit value like
uint64_t does, I'll need a lot of warning, because I have a LOT of
editing to do.

Other than that, the RFCs specify sf_tfiles as uint64_t and "struct
statfs" has f_files as a uint64_t. So, unless there are plans to make it
signed on FreeBSD, I don't see a problem here?

> % + sbp->f_ffree = (sfp->sf_ffiles & OFF_MAX);
>
> Any masking here is logically wrong, and in practice just destroys the
> sign bit, as described above for the 0x7fffffff mask with old 32 bit
> systems. Masking with OFF_MAX has additional logic errors. OFF_MAX
> is the maximum value for an off_t, but none of the types here has
> anything to do with off_t.
>
Ok, sf_ffiles is defined as uint64_t on the wire. Therefore there is no
sign bit. The problem is that it could be a larger positive value than
FreeBSD supports. All I wanted this code to do is make it the largest
positive value that will fit in int64_t.
(I used OFF_MAX because you suggested in a previous email that that was
preferable to 0x7fffffffffffffffLLU for nm_maxfilesize. I don't see
anything like INT64_MAX, UINT64_MAX in FreeBSD's limits.h)

Would

    if (sfp->sf_ffiles > UINT64_MAX)
        sbp->f_ffree = INT64_MAX;
    else
        sbp->f_ffree = sfp->sf_ffiles;

- except there isn't a UINT64_MAX, INT64_MAX defined in sys/*.h as far
as I can see. How do I express these constants? Do I have to convert
0x7fffffffffffffff to decimal and use that?

> % } else if ((nmp->nm_flag & NFSMNT_NFSV4) == 0) {
> % sbp->f_bsize = (int32_t)sfp->sf_bsize;
> % sbp->f_blocks = (int32_t)sfp->sf_blocks;
>
> I think this is just the v2 case. The old nfs client uses essentially
> the same bogus casts. No casts should be used (clamping should be
> used), but if we use casts it may be possible to use non-bogus ones.
> I think these are just no casts for the unsigned fields but int32_t
> for the signed ones. The v2 protocol is limited to 32 bits, and we
> can easily represent any 32-bit value since we have 64-bit fields for
> the lvalues. We just need to be careful with the sign bit (in the
> 31th bit of an unsigned value in the sfp fields), but can keep the
> 31th bit as an unsigned bit without problems now that the statfs
> fields
> are 64 bits. Casting for the unsigned fields now just breaks the value
> unnecessarily if the protocol manages to pass the 31th bit as a value
> bit for such fields.
>
Ok, I could take the casts off. I think the effect would be that, for
the case where sf_bavail has its high order bit (bit 31) set, it will be
seen as a larger positive value. (sf_bavail is u_int32_t) This would be
correct per the RFCs, since RFC1094 defines the fields as uint32_t. Now,
if servers were "cheating" and putting the negative values in the field
on the wire, it will change the semantics a bit.
I'll admit I tend to feel that the safest thing is to just leave it the
way it is, since no one is complaining about the semantics and I'd
rather not "break" anything by fixing the semantics to agree with the
RFC.

> Servers should pay even more attention to unrepresentable bits than to
> sign bis, but pay considerably less. Both the old and the new nfs
> server
> blindly truncate f_bfree, etc., to 32 bits in the v2 case (except the
> old
> nfs server corrupts negative f_bavail to 0).

As above, I have to disagree with this. If the RFCs say it can't be
negative, then sending negative values as 0 is all that can be done, as
far as I can see. (I think the old server got this case correct and the
new server needs to be fixed.)

> (For v3, they tacitly
> assume
> that no truncation occurs on conversion to 64 bits.)
>
> The old nfs server also gratuitously breaks the file counts (f_files
> etc.) for the v3 case. It should use txdr_hyper(), but uses
> exdr_unsigned()
> plus extra code to lose 32 bits. This is fixed in NetBSD and in the
> new nfs server.
>
At this point, I'd like to leave the old server unchanged. That way, if
anyone runs into problems w.r.t. differences in semantics (even if they
seem to violate the RFC), they can switch to the old server while things
get sorted out.
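
The server-side behaviour suggested above (reply 0 when f_bavail is
negative) could be sketched as below. This is an illustration only:
wire_abytes() and unclamped_client_blocks() are hypothetical names, the
multiplication is assumed not to overflow, and the second helper exists
just to show what a client computes when a negative byte count is put on
the wire unchanged.

```c
#include <stdint.h>

/*
 * Hypothetical server-side helper: turn a possibly negative available
 * block count into the unsigned byte count sent on the wire. Negative
 * counts cannot be represented, so they are reported as 0.
 * Assumes bavail_blocks * bsize does not overflow 64 bits.
 */
static uint64_t
wire_abytes(int64_t bavail_blocks, uint64_t bsize)
{
	if (bavail_blocks < 0)
		return (0);
	return ((uint64_t)bavail_blocks * bsize);
}

/*
 * Hypothetical illustration of the unclamped case: a negative byte
 * count is converted to unsigned (mod 2**64) and then divided by the
 * client's block size, as the patched client code does.
 */
static uint64_t
unclamped_client_blocks(int64_t abytes, uint64_t fablksize)
{
	return ((uint64_t)abytes / fablksize);
}
```

For example, an unclamped -1024 bytes divided by 512 comes out as
2**55 - 2 blocks, which is inside the 2**54 to 2**55-1 garbage range
mentioned earlier in the thread.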
rick From owner-freebsd-fs@FreeBSD.ORG Sun May 1 15:17:44 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8F2FE1065674; Sun, 1 May 2011 15:17:44 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id 5755D8FC14; Sun, 1 May 2011 15:17:44 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id p41FHgSU024474; Sun, 1 May 2011 10:17:42 -0500 (CDT) Date: Sun, 1 May 2011 10:17:42 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Alexander Motin In-Reply-To: <4DBBE985.9000701@FreeBSD.org> Message-ID: References: <20110430072831.GA65598@icarus.home.lan> <4DBBE985.9000701@FreeBSD.org> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Sun, 01 May 2011 10:17:42 -0500 (CDT) Cc: freebsd-fs@freebsd.org Subject: Re: TRIM clustering X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 01 May 2011 15:17:44 -0000 On Sat, 30 Apr 2011, Alexander Motin wrote: >> >> well not all devices take it as a hit.. The suggestion of some sort of >> clustering is a good one but it should be tunable. > > I believe any device should benefit from receiving single 128K request > instead of 8*16k. Just because of command processing overhead. Am I wrong? 
Since I have not seen it mentioned in this discussion thread yet, it is worth pointing out that if TRIM has already been issued for a block that the filesystem can not re-use that space for storage until the TRIM request is completed. Otherwise in-use blocks might get TRIMmed, resulting in filesystem destruction. If the system should spontaneously reboot, then there may be a mismatch between the filesystem's notion of free blocks and the FLASH device's notion of free blocks. In fact, if the kernel panics, the device may continue trimming blocks after the system is gone (because power is still on). Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Sun May 1 15:22:44 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 08EAB1065670; Sun, 1 May 2011 15:22:44 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id ABA558FC0C; Sun, 1 May 2011 15:22:43 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApwEADZ6vU2DaFvO/2dsb2JhbACEUaJDiHGpaY9dgSqDVYEBBI55jj4 X-IronPort-AV: E=Sophos;i="4.64,298,1301889600"; d="scan'208";a="119234064" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 01 May 2011 11:22:42 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id BC6B3B3F3B; Sun, 1 May 2011 11:22:42 -0400 (EDT) Date: Sun, 1 May 2011 11:22:42 -0400 (EDT) From: Rick Macklem To: Bruce Evans Message-ID: <63264466.828351.1304263362674.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: 
<506337690.827521.1304260638431.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Cc: rmacklem@freebsd.org, fs@freebsd.org Subject: Re: newnfs client and statfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 01 May 2011 15:22:44 -0000 > Would > > if (sfp->sf_ffiles > UINT64_MAX) > sbp->f_ffree = INT64_MAX; > else > sbp->f_ffree = sfp->sf_ffiles; > Oops, I shouldn't have called this UINT64_MAX. What I meant was the same value as INT64_MAX, but of uint64_t. Something like: (uint64_t)INT64_MAX OR 0x7fffffffffffffff rick From owner-freebsd-fs@FreeBSD.ORG Sun May 1 15:48:12 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 974EB1065670 for ; Sun, 1 May 2011 15:48:12 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta11.emeryville.ca.mail.comcast.net (qmta11.emeryville.ca.mail.comcast.net [76.96.27.211]) by mx1.freebsd.org (Postfix) with ESMTP id 7DD388FC17 for ; Sun, 1 May 2011 15:48:12 +0000 (UTC) Received: from omta21.emeryville.ca.mail.comcast.net ([76.96.30.88]) by qmta11.emeryville.ca.mail.comcast.net with comcast id eFTZ1g0011u4NiLABFb11p; Sun, 01 May 2011 15:35:01 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta21.emeryville.ca.mail.comcast.net with comcast id eFb01g00Z1t3BNj8hFb1l1; Sun, 01 May 2011 15:35:01 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 84BFE9B418; Sun, 1 May 2011 08:35:00 -0700 (PDT) Date: Sun, 1 May 2011 08:35:00 -0700 From: Jeremy Chadwick To: Rick Macklem Message-ID: <20110501153500.GA99593@icarus.home.lan> References: 
<20110501184904.S975@besplex.bde.org> <506337690.827521.1304260638431.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <506337690.827521.1304260638431.JavaMail.root@erie.cs.uoguelph.ca> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: rmacklem@freebsd.org, fs@freebsd.org Subject: Re: newnfs client and statfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 01 May 2011 15:48:12 -0000 [snip] On Sun, May 01, 2011 at 10:37:18AM -0400, Rick Macklem wrote: > > % + sbp->f_ffree = (sfp->sf_ffiles & OFF_MAX); > > > > Any masking here is logically wrong, and in practice just destroys the > > sign bit, as described above for the 0x7fffffff mask with old 32 bit > > systems. Masking with OFF_MAX has additional logic errors. OFF_MAX > > is the maximum value for an off_t, but none of the types here has > > anything to do with off_t. > > > > Ok, sf_ffiles is defined as uint64_t on the wire. Therefore there is > no sign bit. The problem is that it could be a larger positive value > than FreeBSD supports. All I wanted this code to do is make it the > largest positive value that will fit in int64_t. (I used OFF_MAX > because you suggested in a previous email that that was preferable > to 0x7fffffffffffffffLLU for nm_maxfilesize. I don't see anything > like INT64_MAX, UINT64_MAX in FreeBSD's limits.h) > Would > > if (sfp->sf_ffiles > UINT64_MAX) > sbp->f_ffree = INT64_MAX; > else > sbp->f_ffree = sfp->sf_ffiles; > > - except there isn't a UINT64_MAX, INT64_MAX defined in sys/*.h as > far as I can see. How do I express these constants? Do I have to > convert 0x7ffffffffffffff to decimal and use that? Aren't these effectively defined in as UQUAD_MAX and QUAD_MAX? These get translated/pulled in from , which varies per architecture. 
This looks like the translation based on looking at the respective include files per arch: i386: UQUAD_MAX == __UQUAD_MAX == __ULLONG_MAX == 0xffffffffffffffffULL i386: QUAD_MAX == __QUAD_MAX == __LLONG_MAX == 0x7fffffffffffffffLL amd64: UQUAD_MAX == __UQUAD_MAX == __ULONG_MAX == 0xffffffffffffffffUL amd64: QUAD_MAX == __QUAD_MAX == __LONG_MAX == 0x7fffffffffffffffL There are some #ifdef's in around some of these declarations which I don't understand (like __BSD_VISIBLE), but I would imagine the above declarations would do what you want. -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Sun May 1 15:55:40 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DCDC8106566B; Sun, 1 May 2011 15:55:40 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 836E88FC0A; Sun, 1 May 2011 15:55:40 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApwEAEWBvU2DaFvO/2dsb2JhbACEUaJDskiPXYEqg1WBAQSOeY4+ X-IronPort-AV: E=Sophos;i="4.64,298,1301889600"; d="scan'208";a="120069861" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 01 May 2011 11:55:39 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id AA54AB3F3B; Sun, 1 May 2011 11:55:39 -0400 (EDT) Date: Sun, 1 May 2011 11:55:39 -0400 (EDT) From: Rick Macklem To: Jeremy Chadwick Message-ID: <1298790394.829218.1304265339603.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20110501153500.GA99593@icarus.home.lan> MIME-Version: 1.0 Content-Type: 
text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Cc: rmacklem@freebsd.org, fs@freebsd.org Subject: Re: newnfs client and statfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 01 May 2011 15:55:40 -0000 > > > > - except there isn't a UINT64_MAX, INT64_MAX defined in sys/*.h as > > far as I can see. How do I express these constants? Do I have to > > convert 0x7ffffffffffffff to decimal and use that? > > Aren't these effectively defined in as UQUAD_MAX and > QUAD_MAX? These get translated/pulled in from , > which varies per architecture. This looks like the translation based > on > looking at the respective include files per arch: > > i386: UQUAD_MAX == __UQUAD_MAX == __ULLONG_MAX == > 0xffffffffffffffffULL > i386: QUAD_MAX == __QUAD_MAX == __LLONG_MAX == 0x7fffffffffffffffLL > > amd64: UQUAD_MAX == __UQUAD_MAX == __ULONG_MAX == 0xffffffffffffffffUL > amd64: QUAD_MAX == __QUAD_MAX == __LONG_MAX == 0x7fffffffffffffffL > > There are some #ifdef's in around some of these > declarations which I don't understand (like __BSD_VISIBLE), but I > would > imagine the above declarations would do what you want. > Yep. And as far as I can see, OFF_MAX is defined exactly the same way for all arches. The only difference is the comments: /* max value for a quad_t */ vs /* max value for an off_t */ The post seemed to indicate that OFF_MAX wasn't the correct type and, later in it, that u_quad_t (the comment would presumably also apply to quad_t?) shouldn't be assumed the same as uint64_t. 
I'm happy to use anything that works, so if QUAD_MAX is preferable to OFF_MAX, I'll happily use it, rick From owner-freebsd-fs@FreeBSD.ORG Sun May 1 16:23:55 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 740D5106566C; Sun, 1 May 2011 16:23:55 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail04.syd.optusnet.com.au (mail04.syd.optusnet.com.au [211.29.132.185]) by mx1.freebsd.org (Postfix) with ESMTP id 102658FC0A; Sun, 1 May 2011 16:23:54 +0000 (UTC) Received: from c122-106-155-58.carlnfd1.nsw.optusnet.com.au (c122-106-155-58.carlnfd1.nsw.optusnet.com.au [122.106.155.58]) by mail04.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p41GNojl001848 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 2 May 2011 02:23:52 +1000 Date: Mon, 2 May 2011 02:23:50 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Jeremy Chadwick In-Reply-To: <20110501153500.GA99593@icarus.home.lan> Message-ID: <20110502015700.Q2013@besplex.bde.org> References: <20110501184904.S975@besplex.bde.org> <506337690.827521.1304260638431.JavaMail.root@erie.cs.uoguelph.ca> <20110501153500.GA99593@icarus.home.lan> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: rmacklem@freebsd.org, fs@freebsd.org Subject: Re: newnfs client and statfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 01 May 2011 16:23:55 -0000 On Sun, 1 May 2011, Jeremy Chadwick wrote: > [snip] > On Sun, May 01, 2011 at 10:37:18AM -0400, Rick Macklem wrote: >>> % + sbp->f_ffree = (sfp->sf_ffiles & OFF_MAX); >>> >>> Any masking here is logically wrong, and in practice just destroys the >>> sign bit, as described above for the 0x7fffffff mask with old 32 bit This got a bit tangled. 
I will reply more to the older reply.

>> - except there isn't a UINT64_MAX, INT64_MAX defined in sys/*.h as
>> far as I can see. How do I express these constants? Do I have to
>> convert 0x7ffffffffffffff to decimal and use that?

UINT64_MAX, etc., are defined in , which doesn't even need to be
included explicitly, since it is (bogusly) standard namespace pollution
in . This namespace pollution gives the bizarre situation that you have
to include to get the limits for basic types, but you get the limits for
the fixed-width types whether you want them or not, except in rare cases
where is not needed for other reasons.

> Aren't these effectively defined in as UQUAD_MAX and
> QUAD_MAX? These get translated/pulled in from ,
> which varies per architecture. This looks like the translation based on
> looking at the respective include files per arch:

No. UQUAD_MAX and QUAD_MAX are historical mistakes. quad_t should be
4 times as large as a machine register, but for compatibility it must
be precisely 64 bits. This makes it just an obfuscation in new code
(code newer than 1999, when int64_t became Standard with C99).

> i386: UQUAD_MAX == __UQUAD_MAX == __ULLONG_MAX == 0xffffffffffffffffULL
> i386: QUAD_MAX == __QUAD_MAX == __LLONG_MAX == 0x7fffffffffffffffLL
>
> amd64: UQUAD_MAX == __UQUAD_MAX == __ULONG_MAX == 0xffffffffffffffffUL
> amd64: QUAD_MAX == __QUAD_MAX == __LONG_MAX == 0x7fffffffffffffffL
>
> There are some #ifdef's in around some of these
> declarations which I don't understand (like __BSD_VISIBLE), but I would
> imagine the above declarations would do what you want.

These are just ways of spelling 2**64-1 and 2**63-1. For all
fixed-width types, macros for the limits aren't really needed, since
the limits are almost machine-independent, so you can almost write them
as hex constants more easily than you can remember where their macros
are defined. But there are subtleties for their types. These are
visible in the above definitions.
On amd64, their basic types are unsigned long and long, respectively,
while on i386 their types are unsigned long long and long long,
respectively. Also, type suffixes on the hex constants may be necessary
for technical and bogonial reasons. Type suffixes should not be needed
for unsigned constants, but header files must use them even then to
prevent warnings from cc -std89 for literal constants larger than
ULONG_MAX (gcc warns about this because C90 doesn't support integer
types larger than unsigned long). Type suffixes are needed for signed
constants like QUAD_MAX to make the constant have a signed type instead
of the default of unsigned int (for constants larger than UINT_MAX).
Most code using the constants doesn't care about these subtleties, so
it can use its own literal constant.

Bruce

From owner-freebsd-fs@FreeBSD.ORG Sun May 1 16:32:39 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 257B51065679 for ; Sun, 1 May 2011 16:32:39 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id DEB198FC17 for ; Sun, 1 May 2011 16:32:38 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id p41GWXeu024710; Sun, 1 May 2011 11:32:33 -0500 (CDT) Date: Sun, 1 May 2011 11:32:33 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Rick Macklem In-Reply-To: <640208384.682241.1303948694525.JavaMail.root@erie.cs.uoguelph.ca> Message-ID: References: <640208384.682241.1303948694525.JavaMail.root@erie.cs.uoguelph.ca> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2
(blade.simplesystems.org [65.66.246.90]); Sun, 01 May 2011 11:32:34 -0500 (CDT) Cc: freebsd-fs@freebsd.org Subject: Re: make the experimental NFS subsystem the default one X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 01 May 2011 16:32:39 -0000 On Wed, 27 Apr 2011, Rick Macklem wrote: > > I don't know anything about ZFS, but I would think that, if you see a > major performance improvement, that ZFS isn't committing stuff to logs > so that data won't be lost. > > Maybe the ZFS folks can comment? (I don't remember seeing the details > of what you change? If you sent a patch, sorry, but I've misplaced it.) Zfs will lose as much as 5 seconds worth of data (and maybe even 10 seconds) if the data is written slowly and/or the server has quite a lot of RAM. It commits data in order so the written data will be completely coherent for that snapshot in time, but the result may still be completely corrupted from the client's perspective. 5 (or 10!) seconds of data could be quite a lot of data, and could represent entire new directory trees, or large directory trees which were removed. Individual file content could be overwritten hundreds of times before the point where the server arbitrarily decides to commit it. If the server bounces, its data won't match what the client thinks it should have.
Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Sun May 1 16:44:53 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D8B3C1065675; Sun, 1 May 2011 16:44:53 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 7E62F8FC12; Sun, 1 May 2011 16:44:53 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApwEAPiMvU2DaFvO/2dsb2JhbACEUaJCsmGPVoEqg1WBAQSOeY4+ X-IronPort-AV: E=Sophos;i="4.64,298,1301889600"; d="scan'208";a="119237914" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 01 May 2011 12:44:52 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id AE643B4163; Sun, 1 May 2011 12:44:52 -0400 (EDT) Date: Sun, 1 May 2011 12:44:52 -0400 (EDT) From: Rick Macklem To: Bruce Evans Message-ID: <1211771823.830180.1304268292625.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20110502015700.Q2013@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Cc: rmacklem@freebsd.org, fs@freebsd.org Subject: Re: newnfs client and statfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 01 May 2011 16:44:53 -0000 > > >> - except there isn't a UINT64_MAX, INT64_MAX defined in sys/*.h as > >> far as I can see. 
How do I express these constants? Do I have to > >> convert 0x7ffffffffffffff to decimal and use that? > > UINT64_MAX, etc., are defined in , which doesn't even > need > to be included explicitly, since it is (bogusly) standard namespace > pollution in . This namespace pollution gives the bizarre > situation that you have to include to get the limits > for > basic types, but you get the limits for the fix-with types whether you > want them or not, except in rare cases where is not > needed > for other reasons. > Ok, now I see them (in machine/include/_stdint.h). Apologies for the noise. I grep'd sys/sys and couldn't find anything called (U)INT64_MAX. Now, remembering that sf_abytes is uint64_t per the RFCs, what do people think of either of these? if (sfp->sf_abytes > INT64_MAX) sbp->f_bavail = INT64_MAX; else sbp->f_bavail = sfp->sf_abytes / NFS_FABLKSIZE; Or should I try and do the division to see if the large value in sf_abytes will fit in INT64_MAX after the division? Something like: int64_t tmp; tmp = sfp->sf_abytes; tmp /= NFS_FABLKSIZE; if (tmp < 0) sbp->f_bavail = INT64_MAX; else sbp->f_bavail = tmp; Neither tested, of course, rick From owner-freebsd-fs@FreeBSD.ORG Sun May 1 17:39:13 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 264F0106566B; Sun, 1 May 2011 17:39:13 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id D96A98FC17; Sun, 1 May 2011 17:39:12 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id p41HQqSu024922; Sun, 1 May 2011 12:26:52 -0500 (CDT) Date: Sun, 1 May 2011 12:26:52 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Rick Macklem In-Reply-To:
<1211771823.830180.1304268292625.JavaMail.root@erie.cs.uoguelph.ca> Message-ID: References: <1211771823.830180.1304268292625.JavaMail.root@erie.cs.uoguelph.ca> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Sun, 01 May 2011 12:26:52 -0500 (CDT) Cc: rmacklem@freebsd.org, fs@freebsd.org Subject: Re: newnfs client and statfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 01 May 2011 17:39:13 -0000 On Sun, 1 May 2011, Rick Macklem wrote: > > Or should I try and do the division to see if the large > value in sf_abytes will fit in INT64_MAX after the division? Something > like: > int64_t tmp; > > tmp = sfp->sf_abytes; > tmp /= NFS_FABLKSIZE; > if (tmp < 0) > sbp->f_bavail = INT64_MAX; > else > sbp->f_bavail = tmp; That one seems better because it preserves more of the value, but perhaps this is better because it does not depend on undocumented/undefined behavior (also untested): uint64_t tmp; tmp = sfp->sf_abytes / NFS_FABLKSIZE; if (tmp > (uint64_t) INT64_MAX) sbp->f_bavail = INT64_MAX; else sbp->f_bavail = tmp; Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Sun May 1 17:52:51 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 59C87106564A; Sun, 1 May 2011 17:52:51 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail04.syd.optusnet.com.au (mail04.syd.optusnet.com.au [211.29.132.185]) by mx1.freebsd.org (Postfix) with ESMTP id D65188FC17; Sun, 1 May 2011 17:52:50 +0000 
(UTC) Received: from c122-106-155-58.carlnfd1.nsw.optusnet.com.au (c122-106-155-58.carlnfd1.nsw.optusnet.com.au [122.106.155.58]) by mail04.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p41Hqle7007758 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 2 May 2011 03:52:48 +1000 Date: Mon, 2 May 2011 03:52:47 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Rick Macklem In-Reply-To: <506337690.827521.1304260638431.JavaMail.root@erie.cs.uoguelph.ca> Message-ID: <20110502022441.H2013@besplex.bde.org> References: <506337690.827521.1304260638431.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: rmacklem@freebsd.org, fs@freebsd.org Subject: Re: newnfs client and statfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 01 May 2011 17:52:51 -0000 On Sun, 1 May 2011, Rick Macklem wrote: >> >> % + sbp->f_blocks = sfp->sf_tbytes / NFS_FABLKSIZE; >> % + sbp->f_bfree = sfp->sf_fbytes / NFS_FABLKSIZE; >> % + sbp->f_bavail = sfp->sf_abytes / NFS_FABLKSIZE; >> >> The conversion for f_bavail still has sign extension bugs. f_bavail >> can be negative on the server. A non-broken (FreeBSD) server passes >> us this negative value as a uint64_t value with the top bit set. > > Well, both RFC1813 (NFSv3) and RFC3530 (NFSv4) specify the value on > the wire (sf_abytes) as uint64_t. Therefore a negative value can't > be represented safely and non-FreeBSD clients/servers would be > confused by cheating and putting the negative value on the wire. > (I see you mention this further down.) But it can be represented. FreeBSD servers always put it on the wire (if the server file system has a negative value) until the old nfs server broke it. 
I can only find a few FreeBSD clients that aren't confused by this: - most or all clients work for the v2 case, because v2 doesn't need to scale for f_bavail, and copying the 31st (unsigned) bit to the 31st (signed) bit mostly works (except everything breaks once the absolute values exceed 2**31-1 or 2**32-1). - most clients are broken for the v3 case. For negative f_bavail, sign-extension/overflow bugs in the scaling give a value of about 2**54. Assigning this to a 32-bit f_bavail gives unobvious garbage; assigning this to a 64-bit f_bavail gives obvious garbage. - my FreeBSD-~5.2 v3 client handles negative f_bavail correctly (by scaling a signed value). It doesn't fix f_ffree. (I see negative f_bavail quite often but never run into the reserve for f_ffree.) BTW, how does scaling of block counts by NFS_FABLKSIZE in the v3 (and v4?) cases work? I can only see it in clients. Servers seem to start with block counts and never convert to byte counts. > The new server is broken in that it does not > check for a negative value. It seems that the best approach for the > server would be to send a 0 when f_bavail < 0. What else can you do Hrmph. It is servers that check and send a 0 when f_bavail < 0 that are broken. > without "cheating" and representing the value in a way that would be > non-interoperable with non-BSD NFS clients? I don't know. See the NetBSD client for some ideas. Note that for blocks there are 2 "free" fields, f_bfree and f_bavail, while for files there is only 1 (f_ffree). You would think that the redundancy for blocks would allow passing a negative value as the difference of 2 nonnegative ones, but I couldn't make this work. For FreeBSD clients that can handle this, is it possible to negotiate the handling with the server? > I agree the above is broken for the case where the high order bit > of sf_abytes is set. How about the following code? > > sbp->f_bavail = (sfp->f_abytes & OFF_MAX) / NFS_FABLKSIZE; Doesn't work at all.
The byte count is typically a small negative value, say -512. This should be scaled to -1. But sbp->f_bavail = (uint64_t)-512 = 0xfffffffffffffe00. Discarding just 1 top bit from this makes little difference to it. No amount of discarding top bits works correctly. The value must be treated as signed: sbp->f_bavail = (int64_t)sbp->f_bavail / NFS_FABLKSIZE; See the scaling function in vfs_syscalls.c for a worse method. (1 technical difference: it wants to handle all fields using the same unsigned max count, so it uses the negative of f_bavail and needs a little more code for this. 1 unportability: it scales all the fields by right shifting, but right shifting of negative values is not guaranteed to handle the sign bit the same as division by a power of 2.) > (Yea, I see later in the message that you don't think > OFF_MAX is the appropriate > way to represent the largest positive value that can be stored > in int64_t. As you'll see below, I don't know the correct way to > express this constant and would be happy to hear how to do it? > See below for more on this.) > > ... > > Other than that, the RFCs specify sf_tfiles as uint64_t and > "struct statfs" has f_files as a uint64_t. So, unless there are plans to > make it signed on FreeBSD, I don't see a problem here? The problem is for f_bavail and f_ffree in statfs. These are intentionally signed to support ffs putting negative values in them. I think the protocol specifies uint64_t for sf_abytes and sf_ffiles, so there is a minor theoretical problem even if negative values aren't supported by nfs. (sf_abytes might be 2**64-512. Perhaps this is actually physically possible using a sparse mapping. After scaling by NFS_FABLKSIZE, we can represent this value despite using a signed type, but we have to know that it really is large unsigned and not negative. sf_ffiles might be 2**64-1, but this is physically impossible.)
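The difference between masking and signed scaling described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: FABLKSIZE is a stand-in for NFS_FABLKSIZE and is assumed to be 512, the helper names are made up for this example, and converting an out-of-range uint64_t to int64_t is implementation-defined, so the sketch assumes the usual two's complement result.

```c
#include <assert.h>
#include <stdint.h>

#define	FABLKSIZE	512	/* stand-in for NFS_FABLKSIZE; 512 is an assumption */

/*
 * Hypothetical helper: reinterpret the wire value as signed, then scale.
 * The uint64_t -> int64_t conversion is implementation-defined; two's
 * complement wraparound is assumed here.
 */
static int64_t
scale_signed(uint64_t wire_bytes)
{
	return ((int64_t)wire_bytes / FABLKSIZE);
}

/* Hypothetical helper: the masking variant, shown for comparison only. */
static int64_t
scale_masked(uint64_t wire_bytes)
{
	return ((int64_t)((wire_bytes & INT64_MAX) / FABLKSIZE));
}

/* Usage: -512 bytes on the server, as it would appear on the wire. */
static void
scale_demo(void)
{
	uint64_t wire = (uint64_t)-512;		/* 0xfffffffffffffe00 */

	assert(scale_signed(wire) == -1);
	/* Masking only drops the top bit, leaving 2**54 - 1 blocks. */
	assert(scale_masked(wire) == (INT64_C(1) << 54) - 1);
}
```

The masked result is the "about 2**54" garbage mentioned earlier in the thread, while the signed division yields the -1 that the server meant.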
>> % + sbp->f_ffree = (sfp->sf_ffiles & OFF_MAX); >> >> Any masking here is logically wrong, and in practice just destroys the >> sign bit, as described above for the 0x7fffffff mask with old 32 bit >> systems. Masking with OFF_MAX has additional logic errors. OFF_MAX >> is the maximum value for an off_t, but none of the types here has >> anything to do with off_t. > > Ok, sf_ffiles is defined as uint64_t on the wire. Therefore there is > no sign bit. The problem is that it could be a larger positive value > than FreeBSD supports. All I wanted this code to do is make it the Everything is 64 bits, so there is no problem in practice. The signed type for f_ffiles gives a problem in theory -- it can only represent 63-bit unsigned value, but the wire has 64. But more than 2**63-1 files is physically impossible, so there is no problem in practice. You can either assume this, or write a maze of code to handle various combinations of type sizes. I prefer to not write actual code for this. A couple of assertions that the sizes are still 64 bits should be enough. > largest positive value that will fit in int64_t. (I used OFF_MAX > because you suggested in a previous email that that was preferable > to 0x7fffffffffffffffLLU for nm_maxfilesize. I don't see anything > like INT64_MAX, UINT64_MAX in FreeBSD's limits.h) These are in via standard namespace pollution -- see another reply. > Would > > if (sfp->sf_ffiles > UINT64_MAX) > sbp->f_ffree = INT64_MAX; > else > sbp->f_ffree = sfp->sf_ffiles; s/ffree/ffiles/, and a few other fixes from a later reply (s/UINT64_MAX/ INT64_MAX). INT64_MAX is currently the correct limit, but breaks automatically if someone changes the type of f_ffree. I have some fancy macros (never fully implemented in actual code) to determine the limits from the types (sizeof(sbp->f_ffree) gives the number of bits provided it is a fixed-width type...). > - except there isn't a UINT64_MAX, INT64_MAX defined in sys/*.h as > far as I can see. 
How do I express these constants? Do I have to > convert 0x7ffffffffffffff to decimal and use that? Avoid them if possible. You should only need them if you clamp the values. >> % } else if ((nmp->nm_flag & NFSMNT_NFSV4) == 0) { >> % sbp->f_bsize = (int32_t)sfp->sf_bsize; >> % sbp->f_blocks = (int32_t)sfp->sf_blocks; >> >> I think this is just the v2 case. The old nfs client uses essentially >> the same bogus casts. No casts should be used (clamping should be >> used), but if we use casts it may be possible to use non-bogus ones. >> I think these are just no casts for the unsigned fields but int32_t >> for the signed ones. The v2 protocol is limited to 32 bits, and we >> can easily represent any 32-bit value since we have 64-bit fields for >> the lvalues. We just need to be careful with the sign bit (in the >> 31th bit of an unsigned value in the sfp fields), but can keep the >> 31th bit as an unsigned bit without problems now that the statfs >> fields >> are 64 bits. Casting for the unsigned fields now just breaks the value >> unnecessarily if the protocol manages to pass the 31th bit as a value >> bit for such fields. >> > Ok, I could take the casts off. I think the effect would be that, for the > case where sf_bavail has its high order (bit 31) set, it will be seen as > a larger positive value. (sf_bavail is u_int32_t) This would be correct > per the RFCs, since RFC1094 defines the fields as uint32_t. Now, if > servers were "cheating" and putting the negative values in the field on > the wire, it will change the semantics a bit. The v2 case is much closer to hitting the limits, since we can now easily have a server with >= 2**32 512-blocks and someone might want to use the v2 protocol for it. ino_t is still 32 bits so file counts can't exceed the v2 protocol limits yet, but that will change soon. 
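To make the cast question for the v2 case concrete, here is a small sketch of what the two choices do to a 32-bit wire field whose bit 31 is set. The helper names are hypothetical, the destination is modelled as a plain int64_t rather than a real struct statfs field, and the int32_t conversion assumes two's complement.

```c
#include <assert.h>
#include <stdint.h>

/* With the (int32_t) cast: bit 31 of the wire value becomes the sign bit. */
static int64_t
widen_cast(uint32_t wire)
{
	return ((int32_t)wire);
}

/* Without the cast: bit 31 stays a value bit in the 64-bit result. */
static int64_t
widen_nocast(uint32_t wire)
{
	return (wire);
}

/* Usage: a wire value of 2**31, the first one where the choices differ. */
static void
widen_demo(void)
{
	uint32_t wire = UINT32_C(0x80000000);

	assert(widen_cast(wire) == INT64_C(-2147483648));
	assert(widen_nocast(wire) == INT64_C(2147483648));
}
```

Which result is wanted depends on whether the server is assumed to put negative values on the wire, which is exactly the open question in this thread.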
> I'll admit I tend to feel that the safest thing is to just leave it > the way it is, since no one is complaining about the semantics and I'd > rather not "break" anything by fixing the semantics to agree with thr RFC. > >> Servers should pay even more attention to unrepresentable bits than to >> sign bis, but pay considerably less. Both the old and the new nfs >> server >> blindly truncate f_bfree, etc., to 32 bits in the v2 case (except the >> old >> nfs server corrupts negative f_bavail to 0). > > As above, I have to disagree with this. If the RFCs say it can't be > negative, then sending negative values as 0 is all that can be done, > as far as I can see. (I think the old server got this case correct > and the new server needs to be fixed.) The corruption also involves sending positive values as 0. E.g., 4G on the server becomes 0 on the client after blind truncation to 32 bits. Clamping to UINT32_MAX or INT32_MAX would reduce problems from this. Only the server can do the clamping, since the client has no way to tell whether 0 really means 0. 
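A rough sketch of the server-side difference between blind truncation and clamping for the 32-bit v2 fields follows. The function names are invented for this illustration and are not taken from either NFS server.

```c
#include <assert.h>
#include <stdint.h>

/* Blind truncation: only the low 32 bits of the count survive. */
static uint32_t
truncate_u32(uint64_t nblocks)
{
	return ((uint32_t)nblocks);
}

/* Clamping: saturate at UINT32_MAX instead of wrapping around. */
static uint32_t
clamp_u32(uint64_t nblocks)
{
	return (nblocks > UINT32_MAX ? UINT32_MAX : (uint32_t)nblocks);
}

/* Usage: the 4G example, i.e. a count of exactly 2**32. */
static void
clamp_demo(void)
{
	uint64_t four_g = UINT64_C(1) << 32;

	assert(truncate_u32(four_g) == 0);		/* looks empty */
	assert(clamp_u32(four_g) == UINT32_MAX);	/* looks full */
	assert(clamp_u32(1000) == 1000);		/* small values pass */
}
```

A client receiving 0 cannot distinguish the truncated case from a genuinely empty count, which is why the clamp has to happen before the value goes on the wire.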
Bruce From owner-freebsd-fs@FreeBSD.ORG Sun May 1 18:12:51 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A18D2106564A; Sun, 1 May 2011 18:12:51 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail04.syd.optusnet.com.au (mail04.syd.optusnet.com.au [211.29.132.185]) by mx1.freebsd.org (Postfix) with ESMTP id 4098B8FC14; Sun, 1 May 2011 18:12:50 +0000 (UTC) Received: from c122-106-155-58.carlnfd1.nsw.optusnet.com.au (c122-106-155-58.carlnfd1.nsw.optusnet.com.au [122.106.155.58]) by mail04.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p41IClBc016305 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 2 May 2011 04:12:49 +1000 Date: Mon, 2 May 2011 04:12:47 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Rick Macklem In-Reply-To: <1211771823.830180.1304268292625.JavaMail.root@erie.cs.uoguelph.ca> Message-ID: <20110502035720.F2645@besplex.bde.org> References: <1211771823.830180.1304268292625.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: rmacklem@freebsd.org, fs@freebsd.org Subject: Re: newnfs client and statfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 01 May 2011 18:12:51 -0000 On Sun, 1 May 2011, Rick Macklem wrote: >> UINT64_MAX, etc., are defined in , which doesn't even >> need >> to be included explicitly, since it is (bogusly) standard namespace >> pollution in . This namespace pollution gives the bizarre >> ... > Ok, now I see them (in machine/include/_stdint.h). Appologies for the > noise. I grep'd sys/sys and couldn't find anything called (U)INT64_MAX. > > Now, remembering that sf_abytes is uint64_t per the RFCs, what do people > think of either of these? 
> > if (sfp->sf_abytes > INT64_MAX) > sbp->f_bavail = INT64_MAX; > else > sbp->f_bavail = sfp->sf_abytes / NFS_FABLKSIZE; You don't need to do anything at runtime, since everything is 64 bits and f_bavail is a block count while sf_abytes is a byte count. 1 bit is lost to the sign bit in f_bavail, but 9 bits are gained by scaling by NFS_FABLKSIZE, leaving 8 bits to spare. Calculating the limit at runtime would give INT64_MAX / NFS_FABLKSIZE, or perhaps 1 more than that (to round up instead of down). You might still want to use an out-of-band limit like INT64_MAX for technical reasons, but that risks more bugs (for example, anything converting INT64_MAX / NFS_FABLKSIZE + 1 "back" to a byte count would overflow and anything converting INT64_MAX "back" to a byte count would overflow even uint64_t). > Or should I try and do the division to see if the large > value in sf_abytes will fit in INT64_MAX after the division? Something > like: Runtime tests have the advantage of continuing to work if someone changes the types, provided they are robust, but making them robust is too hard here. Robust tests can't simply use INT64_MAX, since INT64_MAX is only the max if the type is int64_t...
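The bit-counting argument above can be checked at compile time instead of at run time. The following is a minimal sketch, assuming NFS_FABLKSIZE is 512 (represented here by the stand-in FABLKSIZE) and a C11 compiler for _Static_assert; the macro and function names are hypothetical. It treats the wire value as unsigned and does not address negative values.

```c
#include <assert.h>
#include <stdint.h>

#define	FABLKSIZE	512	/* stand-in for NFS_FABLKSIZE; 512 is an assumption */

/* Largest block count any 64-bit byte count can produce: 2**55 - 1. */
#define	MAX_BLOCKS	(UINT64_MAX / FABLKSIZE)

/* Verified by the compiler, so no clamp is needed when the code runs. */
_Static_assert(MAX_BLOCKS == (UINT64_C(1) << 55) - 1,
    "scaling by 512 frees 9 bits");
_Static_assert(MAX_BLOCKS <= INT64_MAX,
    "scaled block count must fit in int64_t");

/* Hypothetical conversion: divide first, then store in the signed type. */
static int64_t
bytes_to_blocks(uint64_t wire_bytes)
{
	return ((int64_t)(wire_bytes / FABLKSIZE));
}

/* Usage: even the largest wire value stays positive after scaling. */
static void
blocks_demo(void)
{
	assert(bytes_to_blocks(UINT64_MAX) == (INT64_C(1) << 55) - 1);
	assert(bytes_to_blocks(1024) == 2);
}
```

If someone later changed the block size to 1 or the destination to a narrower type, the second assertion would have to be revisited, which is the fragility the message warns about.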
Bruce From owner-freebsd-fs@FreeBSD.ORG Sun May 1 20:27:25 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1E1DC106566B; Sun, 1 May 2011 20:27:25 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id A28378FC0A; Sun, 1 May 2011 20:27:24 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApwEALjBvU2DaFvO/2dsb2JhbACEUaJCiHGpBI9JhH+BAQSOeY4+ X-IronPort-AV: E=Sophos;i="4.64,299,1301889600"; d="scan'208";a="120083406" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 01 May 2011 16:27:23 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 953DCB3FB5; Sun, 1 May 2011 16:27:23 -0400 (EDT) Date: Sun, 1 May 2011 16:27:23 -0400 (EDT) From: Rick Macklem To: Bruce Evans Message-ID: <733531363.835298.1304281643548.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20110502035720.F2645@besplex.bde.org> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_Part_835297_766810430.1304281643545" X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Cc: rmacklem@freebsd.org, kib@freebsd.org, fs@freebsd.org Subject: Re: newnfs client and statfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 01 May 2011 20:27:25 -0000 ------=_Part_835297_766810430.1304281643545 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit > On Sun, 1 May 2011, Rick Macklem wrote: > > >> UINT64_MAX, etc., are defined in , which doesn't even > >> need > >> to be 
included explicitly, since it is (bogusly) standard namespace > >> pollution in . This namespace pollution gives the > >> bizarre > >> ... > > > Ok, now I see them (in machine/include/_stdint.h). Appologies for > > the > > noise. I grep'd sys/sys and couldn't find anything called > > (U)INT64_MAX. > > > > Now, remembering that sf_abytes is uint64_t per the RFCs, what do > > people > > think of either of these? > > > > if (sfp->sf_abytes > INT64_MAX) > > sbp->f_bavail = INT64_MAX; > > else > > sbp->f_bavail = sfp->sf_abytes / NFS_FABLKSIZE; > > You don't need to do anything at runtime, since everything is 64 bits > and f_bavail is a block count while sf_abytes is a byte count. 1 bit > is lost to the sign bit in f_bavail, but 9 bits are gained by scaling > by NFS_FABLKSIZE, leaving 8 bits to spare. > > Calculating the limit at runtime would give INT64_MAX / > NFS_FABSBLKSIZE, > or perhaps 1 more than that (to round up instead of down). You might > still want to use an out-of-band limit like INT64_MAX for technical > reasons, but that risks more bugs (for example, anything converting > INT64_MAX / NFS_FABSBLKSIZE + 1 "back" to a byte count would overflow > and anything converting INT64_MAX "back" to a byte count would > overflow > even uint64_t. > > > Or should I try and do the division to see if the large > > value in sf_abytes will fit in INT64_MAX after the division? > > Something > > like: > > Runtime tests have the advantage of continuing to work if someone > changes > the types, provided they are robust, but making them robust is too > hard > here. Robust test's can't simply use INT64_MAX, since INT64_MAX is > only > the max if the type is int64_t... > Ok, I realized the code in the last post was pretty bogus:-) My only excuse was that I typed it as I was running out the door... So, I played with it a bit and the attached patch seems to work for i386. For the fields that are uint64_t in struct statfs, it just divides/assigns. 
For the int64_t field that takes the divided value (f_bavail) I did the division/assignment to a uint64_t tmp and then assigned that to f_bavail. (Since any value that fits in uint64_t is a positive value for int64_t after being divided by 2 or more, it will always be positive.) For the other int64_t one, I just check for "> INT64_MAX" and set it to INT64_MAX for that case, so it doesn't go negative. Anyhow, the updated patch is attached and maybe kib@ can test it? Thanks for the help with this. I realize I got rather confused during the discussion, rick ------=_Part_835297_766810430.1304281643545 Content-Type: text/x-patch; name=statfs.patch Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename=statfs.patch LS0tIGZzL25mc2NsaWVudC9uZnNfY2xwb3J0LmMuc2F2CTIwMTEtMDQtMzAgMjA6MTY6MzkuMDAw MDAwMDAwIC0wNDAwCisrKyBmcy9uZnNjbGllbnQvbmZzX2NscG9ydC5jCTIwMTEtMDUtMDEgMTY6 MTE6MTguMDAwMDAwMDAwIC0wNDAwCkBAIC04MzgsMjAgKzgzOCwxOSBAQCB2b2lkCiBuZnNjbF9s b2Fkc2JpbmZvKHN0cnVjdCBuZnNtb3VudCAqbm1wLCBzdHJ1Y3QgbmZzc3RhdGZzICpzZnAsIHZv aWQgKnN0YXRmcykKIHsKIAlzdHJ1Y3Qgc3RhdGZzICpzYnAgPSAoc3RydWN0IHN0YXRmcyAqKXN0 YXRmczsKLQluZnNxdWFkX3QgdHF1YWQ7CisJdWludDY0X3QgdG1wOwogCiAJaWYgKG5tcC0+bm1f ZmxhZyAmIChORlNNTlRfTkZTVjMgfCBORlNNTlRfTkZTVjQpKSB7CiAJCXNicC0+Zl9ic2l6ZSA9 IE5GU19GQUJMS1NJWkU7Ci0JCXRxdWFkLnF2YWwgPSBzZnAtPnNmX3RieXRlczsKLQkJc2JwLT5m X2Jsb2NrcyA9IChsb25nKSh0cXVhZC5xdmFsIC8gKCh1X3F1YWRfdClORlNfRkFCTEtTSVpFKSk7 Ci0JCXRxdWFkLnF2YWwgPSBzZnAtPnNmX2ZieXRlczsKLQkJc2JwLT5mX2JmcmVlID0gKGxvbmcp KHRxdWFkLnF2YWwgLyAoKHVfcXVhZF90KU5GU19GQUJMS1NJWkUpKTsKLQkJdHF1YWQucXZhbCA9 IHNmcC0+c2ZfYWJ5dGVzOwotCQlzYnAtPmZfYmF2YWlsID0gKGxvbmcpKHRxdWFkLnF2YWwgLyAo KHVfcXVhZF90KU5GU19GQUJMS1NJWkUpKTsKLQkJdHF1YWQucXZhbCA9IHNmcC0+c2ZfdGZpbGVz OwotCQlzYnAtPmZfZmlsZXMgPSAodHF1YWQubHZhbFswXSAmIDB4N2ZmZmZmZmYpOwotCQl0cXVh ZC5xdmFsID0gc2ZwLT5zZl9mZmlsZXM7Ci0JCXNicC0+Zl9mZnJlZSA9ICh0cXVhZC5sdmFsWzBd ICYgMHg3ZmZmZmZmZik7CisJCXNicC0+Zl9ibG9ja3MgPSBzZnAtPnNmX3RieXRlcyAvIE5GU19G 
QUJMS1NJWkU7CisJCXNicC0+Zl9iZnJlZSA9IHNmcC0+c2ZfZmJ5dGVzIC8gTkZTX0ZBQkxLU0la RTsKKwkJdG1wID0gc2ZwLT5zZl9hYnl0ZXMgLyBORlNfRkFCTEtTSVpFOworCQlzYnAtPmZfYmF2 YWlsID0gdG1wOworCQlzYnAtPmZfZmlsZXMgPSBzZnAtPnNmX3RmaWxlczsKKwkJaWYgKHNmcC0+ c2ZfZmZpbGVzID4gSU5UNjRfTUFYKQorCQkJc2JwLT5mX2ZmcmVlID0gSU5UNjRfTUFYOworCQll bHNlCisJCQlzYnAtPmZfZmZyZWUgPSBzZnAtPnNmX2ZmaWxlczsKIAl9IGVsc2UgaWYgKChubXAt Pm5tX2ZsYWcgJiBORlNNTlRfTkZTVjQpID09IDApIHsKIAkJc2JwLT5mX2JzaXplID0gKGludDMy X3Qpc2ZwLT5zZl9ic2l6ZTsKIAkJc2JwLT5mX2Jsb2NrcyA9IChpbnQzMl90KXNmcC0+c2ZfYmxv Y2tzOwo= ------=_Part_835297_766810430.1304281643545-- From owner-freebsd-fs@FreeBSD.ORG Sun May 1 20:43:30 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 54182106564A; Sun, 1 May 2011 20:43:30 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id E9DE18FC0A; Sun, 1 May 2011 20:43:29 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApwEADjFvU2DaFvO/2dsb2JhbACEUaJDiHGoe49JgSqDVYEBBI55jj4 X-IronPort-AV: E=Sophos;i="4.64,299,1301889600"; d="scan'208";a="120084270" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 01 May 2011 16:43:29 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 3007BB3F36; Sun, 1 May 2011 16:43:29 -0400 (EDT) Date: Sun, 1 May 2011 16:43:29 -0400 (EDT) From: Rick Macklem To: Bruce Evans Message-ID: <135141673.835577.1304282609097.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20110502022441.H2013@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 
(Win)/6.0.10_GA_2692) Cc: rmacklem@freebsd.org, fs@freebsd.org Subject: Re: newnfs client and statfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 01 May 2011 20:43:30 -0000 > On Sun, 1 May 2011, Rick Macklem wrote: > > >> > >> % + sbp->f_blocks = sfp->sf_tbytes / NFS_FABLKSIZE; > >> % + sbp->f_bfree = sfp->sf_fbytes / NFS_FABLKSIZE; > >> % + sbp->f_bavail = sfp->sf_abytes / NFS_FABLKSIZE; > >> > >> The conversion for f_bavail still has sign extension bugs. f_bavail > >> can be negative on the server. A non-broken (FreeBSD) server passes > >> us this negative value as a uint64_t value with the top bit set. > > > > Well, both RFC1813 (NFSv3) and RFC3530 (NFSv4) specify the value on > > the wire (sf_abytes) as uint64_t. Therefore a negative value can't > > be represented safely and non-FreeBSD clients/servers would be > > confused by cheating and putting the negative value on the wire. > > (I see you mention this further down.) > > But it can be represented. FreeBSD servers always put it on the wire > (if the server file system has a negative value) until the old nfs > server broke it. I can only find a few FreeBSD clients that aren't > confused by this: > - most or all clients work for the v2 case, because v2 doesn't need > to scale for f_bavail, and copying the 31st (unsigned) bit to > the 31st (signed) bit mostly works (except everthing breaks once > the absolute values exceed 2**31-1 or 2**32.1). > - most clients are broken for the v3 case. For negative f_bavail, > sign-extension/overflow bugs in the scaling give a value of about > 2**54. Assigning this to a 32-bit f_bavail gives unobvious garbage; > assigning this to a 64-bit f_bavail gives obvious garbage. > - my FreeBSD-~5.2 v3 client handles negative f_bavail correctly (by > scaling a signed value). It doesn't fix f_ffree. 
(I see negative > f_bavail quite often but never run into the reserve for f_ffree.) > Well my concern isn't w.r.t. FreeBSD clients, but other ones. I'll start a discussion on freebsd-fs@ about whether a FreeBSD server should "cheat" and put negative values (which other clients will think are large positive values) on the wire or try and conform strictly to the RFC. > BTW, how does scaling of block counts by NFS_FABLKSIZE in the v3 (and > v4?) cases work? I can only see it in clients. Servers seem to start > with block counts and never convert to byte counts. > It must be somewhere, since they are uint64_t byte counts on the wire, except for NFSv2, which used block counts of the block size provided in the same response. > > The new server is broken in that it does not > > check for a negative value. It seems that the best approach for the > > server would be to send a 0 when f_bavail < 0. What else can you do > > Hrmph. It is servers that check and send a 0 when f_bavail < 0 that > are broken. > > > without "cheating" and representing the value in a way that would be > > non-interoperable with non-BSD NFS clients? > > I don't know. See the NetBSD client for some ideas. Note that for > blocks > there are 2 "free" fields, f_bfree and f_bavail, while for files there > is > only 1 (f_ffree). You would think that the redundancy for blocks would > allow passing a negative value as the difference of 2 nonnegative > ones, > but I couldn't make this work. For FreeBSD clients that can handle > this, > is it possible to negotiate the handling with the server? Not that I know of. The spec writers got pretty irate when someone suggested that, for NFSv4, there should be a "vendorId", so clients could use that to handle things differently. Their outlook was that everyone should play by the same rules. I'll try and make my Solaris10 box get to -ve frees and then see what it puts on the wire. 
After that, I'll start a discussion on freebsd-fs@ about how they think a
FreeBSD server should behave when f_bavail and/or f_ffree are negative.

rick

From owner-freebsd-fs@FreeBSD.ORG Sun May 1 20:47:04 2011
Date: Sun, 1 May 2011 16:47:02 -0400 (EDT)
From: Rick Macklem
To: Bob Friesenhahn
Message-ID: <956418604.835643.1304282822974.JavaMail.root@erie.cs.uoguelph.ca>
Cc: rmacklem@freebsd.org, fs@freebsd.org
Subject: Re: newnfs client and statfs

> On Sun, 1 May 2011, Rick Macklem wrote:
>
> >
> > Or should I try and do the division to see if the large
> > value in sf_abytes will fit in INT64_MAX after the
> > division? Something like:
> >     int64_t tmp;
> >
> >     tmp = sfp->sf_abytes;
> >     tmp /= NFS_FABLKSIZE;
> >     if (tmp < 0)
> >         sbp->f_bavail = INT64_MAX;
> >     else
> >         sbp->f_bavail = tmp;
>
> That one seems better because it preserves more of the value, but
> perhaps this is better because it does not depend on
> undocumented/undefined behavior (also untested):
>
>     uint64_t tmp;
>     tmp = sfp->sf_abytes / NFS_FABLKSIZE;
>     if (tmp > (uint64_t) INT64_MAX)
>         sbp->f_bavail = INT64_MAX;
>     else
>         sbp->f_bavail = tmp;
>
That's basically what I went with for the updated patch, except I didn't
put in the "if (tmp > (uint64_t) INT64_MAX)" since once you divide
sf_abytes by 2 or more it is guaranteed to be less than or equal to
INT64_MAX.

rick

From owner-freebsd-fs@FreeBSD.ORG Sun May 1 21:25:12 2011
Date: Sun, 1 May 2011 17:25:11 -0400 (EDT)
From: Rick Macklem
To: FreeBSD FS
Message-ID: <1404795089.836227.1304285111779.JavaMail.root@erie.cs.uoguelph.ca>
Subject: RFC: NFS server handling of negative f_bavail?

Hi,

I recently discovered that there seems to be an issue w.r.t. the f_bavail
and f_ffree fields of "struct statfs" since they are signed values that
can be negative. The RFCs for NFSv3 and NFSv4 define these fields as
unsigned byte counts when they go on the wire. I read that as implying
that negative values can't be represented for them?

I tried a quick test on Solaris10, but I couldn't get the fields to go
negative (they appear to be unsigned in their "struct statvfs"), so I
couldn't find out what it would have done for negative values.

I can think of 2 ways to go:
1 - Have the server reply 0 for these fields when VFS_STATFS() passes
    negative values up. This would seem to conform to the RFCs and seems
    least likely to confuse non-BSD clients.
OR
2 - Put the signed value in the uint64_t on the wire. The risk here is
    that some clients will assume it's a large positive value.

I admit I don't see the client knowing that the value is negative instead
of 0 as being a big issue for an NFS client mount and am leaning towards
#1, but I'm not familiar with what utilities might care about the value
being negative?

Anyhow, any comments?
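
The two options above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the helper names wire_option1_clamp() and wire_option2_raw() are hypothetical and are not taken from the FreeBSD NFS server source, and the input is assumed to be a signed block count plus a block size, from which the unsigned byte count for the wire is computed.

```c
#include <stdint.h>

/*
 * Hypothetical helpers (not FreeBSD kernel functions): turn a signed
 * "blocks available" count into the unsigned 64-bit byte count that an
 * NFSv3/NFSv4 reply carries on the wire.  Overflow of very large
 * positive counts is not handled in this sketch.
 */

/* Option 1: report 0 whenever the file system's value is negative. */
static uint64_t
wire_option1_clamp(int64_t bavail_blocks, uint64_t bsize)
{
	if (bavail_blocks < 0)
		return (0);
	return ((uint64_t)bavail_blocks * bsize);
}

/*
 * Option 2: put the signed value on the wire unchanged.  Converting a
 * negative int64_t to uint64_t is well defined in C (reduction modulo
 * 2**64), so the top bit ends up set and a client that treats the field
 * as unsigned sees a huge positive byte count.
 */
static uint64_t
wire_option2_raw(int64_t bavail_blocks, uint64_t bsize)
{
	return ((uint64_t)bavail_blocks * bsize);
}
```

With -100 blocks available and a block size of 512, the first helper yields 0 and the second yields 2**64 - 51200, which has the top bit set.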

rick

From owner-freebsd-fs@FreeBSD.ORG Mon May 2 00:47:51 2011
Date: Mon, 2 May 2011 08:47:49 +0800
From: ambrosehuang ambrose
To: Alexander Leidinger
Cc: freebsd-fs@freebsd.org, dfr@freebsd.org, Emil Smolenski
In-Reply-To: <20110501133627.00006616@unknown>
References: <20110501133627.00006616@unknown>
Subject: Re: [ZFS] Booting from zpool created on 4k-sector drive

Here is my trick:
1. Download the ZFS v28 patch for 8-stable.
2. Patch 8-stable.
3. make buildkernel
4. You will then get gptzfsboot, zfsloader and pmbr.
5. Install pmbr according to wiki/GPTboot.
6. Replace your old gptzfsboot and zfsloader with the new ones.

Then you can work around this. It works for me (3 WD10EARS + ZFS v15 +
8-stable).

2011/5/1 Alexander Leidinger:
> On Tue, 21 Dec 2010 15:29:01 +0100 "Emil Smolenski" wrote:
>
>> Hello,
>>
>> There is a hack to force zpool creation with minimum sector size
>> equal to 4k:
>>
>> # gnop create -S 4096 ${DEV0}
>> # zpool create tank ${DEV0}.nop
>> # zpool export tank
>> # gnop destroy ${DEV0}.nop
>> # zpool import tank
>>
>> Zpool created this way is much faster on problematic 4k sector
>> drives which lie about their sector size (like WD EARS). This hack
>> works perfectly fine when system is running. Gnop layer is created
>> only for "zpool create" command -- ZFS stores information about
>> sector size in its metadata. After zpool creation one can export the
>> pool, remove gnop layer and reimport the pool. Difference can be seen
>> in the output from the zdb command:
>>
>> - on 512 sector device (2**9 = 512):
>> % zdb tank |grep ashift
>> ashift=9
>>
>> - on 4096 sector device (2**12 = 4096):
>> % zdb tank |grep ashift
>> ashift=12
>>
>> This change is permanent. The only possibility to change the value
>> of ashift is: zpool destroy/create and restoring pool from backup.
>>
>> But there is one problem: I cannot boot from such pool. Error message:
>>
>> ZFS: i/o error - all block copies unavailable
>> ZFS: can't read MOS
>> ZFS: unexpected object set type 0
>
> FYI: I can boot successfully from a ZFS v28 pool which was created like
> this in a GPT partition (tested with 9-current).
>
> Bye,
> Alexander.
>
> --
> http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
> http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137

From owner-freebsd-fs@FreeBSD.ORG Mon May 2 11:06:58 2011
Date: Mon, 2 May 2011 11:06:57 GMT
From: FreeBSD bugmaster
To: freebsd-fs@FreeBSD.org
Message-Id: <201105021106.p42B6v47064073@freefall.freebsd.org>
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).
The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. Description -------------------------------------------------------------------------------- p kern/156545 fs [ufs] mv could break UFS on SMP systems o kern/156193 fs [ufs] [hang] UFS snapshot hangs && deadlocks processes o kern/156168 fs [nfs] [panic] Kernel panic under concurrent access ove o kern/156039 fs [nullfs] [unionfs] nullfs + unionfs do not compose, re o kern/155615 fs [zfs] zfs v28 broken on sparc64 -current o kern/155587 fs [zfs] [panic] kernel panic with zfs o kern/155484 fs [ufs] GPT + UFS boot don't work well together o kern/155411 fs [regression] [8.2-release] [tmpfs]: mount: tmpfs : No o kern/155199 fs [ext2fs] ext3fs mounted as ext2fs gives I/O errors o bin/155104 fs [zfs][patch] use /dev prefix by default when importing o kern/154930 fs [zfs] cannot delete/unlink file from full volume -> EN o kern/154828 fs [msdosfs] Unable to create directories on external USB o kern/154491 fs [smbfs] smb_co_lock: recursive lock for object 1 o kern/154447 fs [zfs] [panic] Occasional panics - solaris assert somew f kern/154228 fs [md] md getting stuck in wdrain state o kern/153996 fs [zfs] zfs root mount error while kernel is not located o kern/153847 fs [nfs] [panic] Kernel panic from incorrect m_free in nf o kern/153753 fs [zfs] ZFS v15 - grammatical error when attempting to u o kern/153716 fs [zfs] zpool scrub time remaining is incorrect o kern/153695 fs [patch] [zfs] Booting from zpool created on 4k-sector o kern/153680 fs [xfs] 8.1 failing to mount XFS partitions o kern/153520 fs [zfs] Boot from GPT ZFS root on HP BL460c G1 unstable o kern/153418 fs [zfs] [panic] Kernel Panic occurred writing to zfs vol o kern/153351 fs [zfs] locking directories/files in ZFS o bin/153258 fs [patch][zfs] creating ZVOLs requires `refreservation' s kern/153173 fs [zfs] 
booting from a gzip-compressed dataset doesn't w o kern/153126 fs [zfs] vdev failure, zpool=peegel type=vdev.too_small p kern/152488 fs [tmpfs] [patch] mtime of file updated when only inode o kern/152022 fs [nfs] nfs service hangs with linux client [regression] o kern/151942 fs [zfs] panic during ls(1) zfs snapshot directory o kern/151905 fs [zfs] page fault under load in /sbin/zfs o kern/151845 fs [smbfs] [patch] smbfs should be upgraded to support Un o bin/151713 fs [patch] Bug in growfs(8) with respect to 32-bit overfl o kern/151648 fs [zfs] disk wait bug o kern/151629 fs [fs] [patch] Skip empty directory entries during name o kern/151330 fs [zfs] will unshare all zfs filesystem after execute a o kern/151326 fs [nfs] nfs exports fail if netgroups contain duplicate o kern/151251 fs [ufs] Can not create files on filesystem with heavy us o kern/151226 fs [zfs] can't delete zfs snapshot o kern/151111 fs [zfs] vnodes leakage during zfs unmount o kern/150503 fs [zfs] ZFS disks are UNAVAIL and corrupted after reboot o kern/150501 fs [zfs] ZFS vdev failure vdev.bad_label on amd64 o kern/150390 fs [zfs] zfs deadlock when arcmsr reports drive faulted o kern/150336 fs [nfs] mountd/nfsd became confused; refused to reload n o kern/150207 fs zpool(1): zpool import -d /dev tries to open weird dev o kern/149208 fs mksnap_ffs(8) hang/deadlock o kern/149173 fs [patch] [zfs] make OpenSolaris installa f kern/149022 fs [hang] File system operations hangs with suspfs state o kern/149015 fs [zfs] [patch] misc fixes for ZFS code to build on Glib o kern/149014 fs [zfs] [patch] declarations in ZFS libraries/utilities o kern/149013 fs [zfs] [patch] make ZFS makefiles use the libraries fro o kern/148504 fs [zfs] ZFS' zpool does not allow replacing drives to be o kern/148490 fs [zfs]: zpool attach - resilver bidirectionally, and re o kern/148368 fs [zfs] ZFS hanging forever on 8.1-PRERELEASE o bin/148296 fs [zfs] [loader] [patch] Very slow probe in /usr/src/sys o kern/148204 fs [nfs] UDP 
NFS causes overload o kern/148138 fs [zfs] zfs raidz pool commands freeze o kern/147903 fs [zfs] [panic] Kernel panics on faulty zfs device o kern/147881 fs [zfs] [patch] ZFS "sharenfs" doesn't allow different " o kern/147790 fs [zfs] zfs set acl(mode|inherit) fails on existing zfs o kern/147560 fs [zfs] [boot] Booting 8.1-PRERELEASE raidz system take o kern/147420 fs [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt o kern/146941 fs [zfs] [panic] Kernel Double Fault - Happens constantly o kern/146786 fs [zfs] zpool import hangs with checksum errors o kern/146708 fs [ufs] [panic] Kernel panic in softdep_disk_write_compl o kern/146528 fs [zfs] Severe memory leak in ZFS on i386 o kern/146502 fs [nfs] FreeBSD 8 NFS Client Connection to Server s kern/145712 fs [zfs] cannot offline two drives in a raidz2 configurat o kern/145411 fs [xfs] [panic] Kernel panics shortly after mounting an o bin/145309 fs bsdlabel: Editing disk label invalidates the whole dev o kern/145272 fs [zfs] [panic] Panic during boot when accessing zfs on o kern/145246 fs [ufs] dirhash in 7.3 gratuitously frees hashes when it o kern/145238 fs [zfs] [panic] kernel panic on zpool clear tank o kern/145229 fs [zfs] Vast differences in ZFS ARC behavior between 8.0 o kern/145189 fs [nfs] nfsd performs abysmally under load o kern/144929 fs [ufs] [lor] vfs_bio.c + ufs_dirhash.c p kern/144447 fs [zfs] sharenfs fsunshare() & fsshare_main() non functi o kern/144416 fs [panic] Kernel panic on online filesystem optimization s kern/144415 fs [zfs] [panic] kernel panics on boot after zfs crash o kern/144234 fs [zfs] Cannot boot machine with recent gptzfsboot code o kern/143825 fs [nfs] [panic] Kernel panic on NFS client o bin/143572 fs [zfs] zpool(1): [patch] The verbose output from iostat o kern/143212 fs [nfs] NFSv4 client strange work ... 
o kern/143184 fs [zfs] [lor] zfs/bufwait LOR o kern/142914 fs [zfs] ZFS performance degradation over time o kern/142878 fs [zfs] [vfs] lock order reversal o kern/142597 fs [ext2fs] ext2fs does not work on filesystems with real o kern/142489 fs [zfs] [lor] allproc/zfs LOR o kern/142466 fs Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re o kern/142306 fs [zfs] [panic] ZFS drive (from OSX Leopard) causes two o kern/142068 fs [ufs] BSD labels are got deleted spontaneously o kern/141897 fs [msdosfs] [panic] Kernel panic. msdofs: file name leng o kern/141463 fs [nfs] [panic] Frequent kernel panics after upgrade fro o kern/141305 fs [zfs] FreeBSD ZFS+sendfile severe performance issues ( o kern/141091 fs [patch] [nullfs] fix panics with DIAGNOSTIC enabled o kern/141086 fs [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS o kern/141010 fs [zfs] "zfs scrub" fails when backed by files in UFS2 o kern/140888 fs [zfs] boot fail from zfs root while the pool resilveri o kern/140661 fs [zfs] [patch] /boot/loader fails to work on a GPT/ZFS- o kern/140640 fs [zfs] snapshot crash o kern/140134 fs [msdosfs] write and fsck destroy filesystem integrity o kern/140068 fs [smbfs] [patch] smbfs does not allow semicolon in file o kern/139725 fs [zfs] zdb(1) dumps core on i386 when examining zpool c o kern/139715 fs [zfs] vfs.numvnodes leak on busy zfs p bin/139651 fs [nfs] mount(8): read-only remount of NFS volume does n o kern/139597 fs [patch] [tmpfs] tmpfs initializes va_gen but doesn't u o kern/139564 fs [zfs] [panic] 8.0-RC1 - Fatal trap 12 at end of shutdo o kern/139407 fs [smbfs] [panic] smb mount causes system crash if remot o kern/138662 fs [panic] ffs_blkfree: freeing free block o kern/138421 fs [ufs] [patch] remove UFS label limitations o kern/138202 fs mount_msdosfs(1) see only 2Gb o kern/136968 fs [ufs] [lor] ufs/bufwait/ufs (open) o kern/136945 fs [ufs] [lor] filedesc structure/ufs (poll) o kern/136944 fs [ffs] [lor] bufwait/snaplk (fsync) o kern/136873 fs [ntfs] 
Missing directories/files on NTFS volume o kern/136865 fs [nfs] [patch] NFS exports atomic and on-the-fly atomic p kern/136470 fs [nfs] Cannot mount / in read-only, over NFS o kern/135546 fs [zfs] zfs.ko module doesn't ignore zpool.cache filenam o kern/135469 fs [ufs] [panic] kernel crash on md operation in ufs_dirb o kern/135050 fs [zfs] ZFS clears/hides disk errors on reboot o kern/134491 fs [zfs] Hot spares are rather cold... o kern/133676 fs [smbfs] [panic] umount -f'ing a vnode-based memory dis o kern/133174 fs [msdosfs] [patch] msdosfs must support utf-encoded int o kern/132960 fs [ufs] [panic] panic:ffs_blkfree: freeing free frag o kern/132397 fs reboot causes filesystem corruption (failure to sync b o kern/132331 fs [ufs] [lor] LOR ufs and syncer o kern/132237 fs [msdosfs] msdosfs has problems to read MSDOS Floppy o kern/132145 fs [panic] File System Hard Crashes o kern/131441 fs [unionfs] [nullfs] unionfs and/or nullfs not combineab o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail o bin/131341 fs makefs: error "Bad file descriptor" on the mount poin o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file o kern/130210 fs [nullfs] Error by check nullfs o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l o kern/129488 fs [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8) o kern/127787 fs [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs o bin/127270 fs fsck_msdosfs(8) may crash if BytesPerSec is zero o kern/127029 fs [panic] mount(8): trying to mount a write protected zi o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file o kern/125895 fs [ffs] [panic] kernel: panic: ffs_blkfree: freeing free s kern/125738 fs [zfs] [request] SHA256 acceleration in ZFS o 
kern/123939 fs [msdosfs] corrupts new files o kern/122380 fs [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o bin/121898 fs [nullfs] pwd(1)/getcwd(2) fails with Permission denied o bin/121366 fs [zfs] [patch] Automatic disk scrubbing from periodic(8 o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha f kern/120991 fs [panic] [ffs] [snapshot] System crashes when manipulat o kern/120483 fs [ntfs] [patch] NTFS filesystem locking changes o kern/120482 fs [ntfs] [patch] Sync style changes between NetBSD and F o kern/118912 fs [2tb] disk sizing/geometry problem with large array o kern/118713 fs [minidump] [patch] Display media size required for a k o bin/118249 fs [ufs] mv(1): moving a directory changes its mtime o kern/118107 fs [ntfs] [panic] Kernel panic when accessing a file at N o kern/117954 fs [ufs] dirhash on very large directories blocks the mac o bin/117315 fs [smbfs] mount_smbfs(8) and related options can't mount o kern/117314 fs [ntfs] Long-filename only NTFS fs'es cause kernel pani o kern/117158 fs [zfs] zpool scrub causes panic if geli vdevs detach on o bin/116980 fs [msdosfs] [patch] mount_msdosfs(8) resets some flags f o conf/116931 fs lack of fsck_cd9660 prevents mounting iso images with o kern/116583 fs [ffs] [hang] System freezes for short time when using o kern/116170 fs [panic] Kernel panic when mounting /tmp o bin/115361 fs [zfs] mount(8) gets into a state where it won't set/un o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o kern/113852 fs [smbfs] smbfs does not properly implement DFS referral o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/113049 fs [patch] [request] make quot(8) use 
getopt(3) and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/111843 fs [msdosfs] Long Names of files are incorrectly created o kern/111782 fs [ufs] dump(8) fails horribly for large filesystems s bin/111146 fs [2tb] fsck(8) fails on 6T filesystem o kern/109024 fs [msdosfs] [iconv] mount_msdosfs: msdosfs_iconv: Operat o kern/109010 fs [msdosfs] can't mv directory within fat32 file system o bin/107829 fs [2TB] fdisk(8): invalid boundary checking in fdisk / w o kern/106107 fs [ufs] left-over fsck_snapshot after unfinished backgro o kern/106030 fs [ufs] [panic] panic in ufs from geom when a dead disk o kern/104406 fs [ufs] Processes get stuck in "ufs" state under persist o kern/104133 fs [ext2fs] EXT2FS module corrupts EXT2/3 filesystems o kern/103035 fs [ntfs] Directories in NTFS mounted disc images appear o kern/101324 fs [smbfs] smbfs sometimes not case sensitive when it's s o kern/99290 fs [ntfs] mount_ntfs ignorant of cluster sizes s bin/97498 fs [request] newfs(8) has no option to clear the first 12 o kern/97377 fs [ntfs] [patch] syntax cleanup for ntfs_ihash.c o kern/95222 fs [cd9660] File sections on ISO9660 level 3 CDs ignored o kern/94849 fs [ufs] rename on UFS filesystem is not atomic o bin/94810 fs fsck(8) incorrectly reports 'file system marked clean' o kern/94769 fs [ufs] Multiple file deletions on multi-snapshotted fil o kern/94733 fs [smbfs] smbfs may cause double unlock o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/92272 fs [ffs] [hang] Filling a filesystem while creating a sna o kern/91134 fs [smbfs] [patch] Preserve access and modification time a kern/90815 fs [smbfs] [patch] SMBFS with character conversions somet o kern/88657 fs [smbfs] windows client hang when browsing a samba shar o kern/88555 fs [panic] ffs_blkfree: freeing free frag on AMD 64 o kern/88266 fs [smbfs] smbfs does not implement UIO_NOCOPY and sendfi o bin/87966 fs [patch] newfs(8): introduce -A flag for newfs to 
enabl o kern/87859 fs [smbfs] System reboot while umount smbfs. o kern/86587 fs [msdosfs] rm -r /PATH fails with lots of small files o bin/85494 fs fsck_ffs: unchecked use of cg_inosused macro etc. o kern/80088 fs [smbfs] Incorrect file time setting on NTFS mounted vi o bin/74779 fs Background-fsck checks one filesystem twice and omits o kern/73484 fs [ntfs] Kernel panic when doing `ls` from the client si o bin/73019 fs [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino o kern/71774 fs [ntfs] NTFS cannot "see" files on a WinXP filesystem o bin/70600 fs fsck(8) throws files away when it can't grow lost+foun o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po o kern/65920 fs [nwfs] Mounted Netware filesystem behaves strange o kern/65901 fs [smbfs] [patch] smbfs fails fsx write/truncate-down/tr o kern/61503 fs [smbfs] mount_smbfs does not work as non-root o kern/55617 fs [smbfs] Accessing an nsmb-mounted drive via a smb expo o kern/51685 fs [hang] Unbounded inode allocation causes kernel to loc o kern/51583 fs [nullfs] [patch] allow to work with devices and socket o kern/36566 fs [smbfs] System reboot with dead smb mount and umount o kern/33464 fs [ufs] soft update inconsistencies after system crash o bin/27687 fs fsck(8) wrapper is not properly passing options to fsc o kern/18874 fs [2TB] 32bit NFS servers export wrong negative values t 222 problems total. 

From owner-freebsd-fs@FreeBSD.ORG Mon May 2 14:02:39 2011
Date: Mon, 2 May 2011 14:02:39 GMT
Message-Id: <201105021402.p42E2dhJ033574@freefall.freebsd.org>
To: michael.reynolds@gmail.com, jh@FreeBSD.org, freebsd-fs@FreeBSD.org
From: jh@FreeBSD.org
Subject: Re: kern/116170: [panic] Kernel panic when mounting /tmp

Synopsis: [panic] Kernel panic when mounting /tmp

State-Changed-From-To: open->feedback
State-Changed-By: jh
State-Changed-When: Mon May 2 14:02:38 UTC 2011
State-Changed-Why:
Can you still reproduce this on a supported release?

http://www.freebsd.org/cgi/query-pr.cgi?pr=116170

From owner-freebsd-fs@FreeBSD.ORG Mon May 2 16:09:26 2011
Date: Tue, 3 May 2011 02:09:19 +1000 (EST)
From: Bruce Evans
To: Rick Macklem
Cc: rmacklem@FreeBSD.org, kib@FreeBSD.org, fs@FreeBSD.org
In-Reply-To: <733531363.835298.1304281643548.JavaMail.root@erie.cs.uoguelph.ca>
Message-ID: <20110503013724.I2001@besplex.bde.org>
References: <733531363.835298.1304281643548.JavaMail.root@erie.cs.uoguelph.ca>
Subject: Re: newnfs client and statfs

On Sun, 1 May 2011, Rick Macklem wrote:

> Ok, I realized the code in the last post was pretty bogus:-) My only
> excuse was that I typed it as I was running out the door...
>
> So, I played with it a bit and the attached patch seems to work for
> i386. For the fields that are uint64_t in struct statfs, it just
> divides/assigns.
> For the int64_t field that takes the divided value
> (f_bavail) I did the division/assignment to a uint64_t tmp and then
> assigned that to f_bavail. (Since any value that fits in uint64_t is
> a positive value for int64_t after being divided by 2 or more, it will
> always be positive.) For the other int64_t one, I just check for "> INT64_MAX"
> and set it to INT64_MAX for that case, so it doesn't go negative.

Sorry, I don't like this. Going through tmp makes no difference since
all values are reduced below INT64_MAX by dividing by just 2. "Negative"
values are still converted to garbage positive values.

> Anyhow, the updated patch is attached and maybe kib@ can test it?

% --- fs/nfsclient/nfs_clport.c.sav 2011-04-30 20:16:39.000000000 -0400
% +++ fs/nfsclient/nfs_clport.c 2011-05-01 16:11:18.000000000 -0400
% @@ -838,20 +838,19 @@ void
%  nfscl_loadsbinfo(struct nfsmount *nmp, struct nfsstatfs *sfp, void *statfs)
%  {
%      struct statfs *sbp = (struct statfs *)statfs;
% -    nfsquad_t tquad;
% +    uint64_t tmp;
%
%      if (nmp->nm_flag & (NFSMNT_NFSV3 | NFSMNT_NFSV4)) {
%          sbp->f_bsize = NFS_FABLKSIZE;
% -        tquad.qval = sfp->sf_tbytes;
% -        sbp->f_blocks = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
% -        tquad.qval = sfp->sf_fbytes;
% -        sbp->f_bfree = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
% -        tquad.qval = sfp->sf_abytes;
% -        sbp->f_bavail = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
% -        tquad.qval = sfp->sf_tfiles;
% -        sbp->f_files = (tquad.lval[0] & 0x7fffffff);
% -        tquad.qval = sfp->sf_ffiles;
% -        sbp->f_ffree = (tquad.lval[0] & 0x7fffffff);
% +        sbp->f_blocks = sfp->sf_tbytes / NFS_FABLKSIZE;
% +        sbp->f_bfree = sfp->sf_fbytes / NFS_FABLKSIZE;

OK.

% +        tmp = sfp->sf_abytes / NFS_FABLKSIZE;
% +        sbp->f_bavail = tmp;

The division made it less than 2**55, and kept it nonnegative. Going
through tmp doesn't change this.
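
The effect described above (an unsigned division always lands below 2**55 and is never negative, whereas scaling a signed value preserves the sign) can be checked with a small sketch. The names scale_unsigned() and scale_signed() are hypothetical, and FABLKSIZE is defined locally with an assumed value of 512 as a stand-in for NFS_FABLKSIZE. The signed variant relies on the uint64_t to int64_t conversion wrapping in two's complement, which is implementation-defined in C before C23 but is what gcc and clang do.

```c
#include <stdint.h>

#define FABLKSIZE 512	/* assumed value, standing in for NFS_FABLKSIZE */

/*
 * Unsigned scaling, as in the patch under review: divide the wire value
 * as a uint64_t.  The quotient is at most 2**55 - 1, so it is always a
 * nonnegative int64_t, even if the server meant a negative value.
 */
static int64_t
scale_unsigned(uint64_t abytes)
{
	return ((int64_t)(abytes / FABLKSIZE));
}

/*
 * Signed scaling: reinterpret the wire value as int64_t first, then
 * divide.  A value with the top bit set comes out negative (assumes a
 * two's complement conversion, see the note above).
 */
static int64_t
scale_signed(uint64_t abytes)
{
	return ((int64_t)abytes / FABLKSIZE);
}
```

For a server that put -51200 bytes on the wire as a uint64_t, the unsigned version returns 2**55 - 100 blocks while the signed version returns -100 blocks; for ordinary positive values the two agree.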
But I still want to use my code to support negative values:

        sbp->f_bavail = (int64_t)sfp->sf_abytes / NFS_FABLKSIZE;

If the 63rd bit is set, it must mean that the server is a
non-broken^non-conforming FreeBSD one trying to send a negative value,
since file systems with >= 2**63 bytes available are physically
impossible. Even if the file system is virtual and growable so that it
has no real limits, it should probably limit itself to much less than
2**63 to avoid testing whether clients can handle such large values.

% +     sbp->f_files = sfp->sf_tfiles;
% +     if (sfp->sf_ffiles > INT64_MAX)
% +         sbp->f_ffree = INT64_MAX;
% +     else
% +         sbp->f_ffree = sfp->sf_ffiles;

This gives correct-as-possible clamping for large unsigned values, but
gives a garbage large positive value for "negative" values. Again, negative
values are physically impossible, so if the 63rd bit is set then it must
mean that the server is a FreeBSD one trying to send a negative value.
So I prefer to use my (untested in this case) code to support negative
values:

Sloppy version: just assign and depend on 2's complement magic that isn't
guaranteed to be there, and on the type sizes being the same:

        sbp->f_ffree = sfp->sf_ffiles;

More careful version: first make sure that the 2's complement magic is
there:

        sbp->f_ffree = (int64_t)sfp->sf_ffiles;

% } else if ((nmp->nm_flag & NFSMNT_NFSV4) == 0) {
%     sbp->f_bsize = (int32_t)sfp->sf_bsize;
%     sbp->f_blocks = (int32_t)sfp->sf_blocks;

Bruce

From owner-freebsd-fs@FreeBSD.ORG Mon May 2 16:46:19 2011 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B8189106566B; Mon, 2 May 2011 16:46:19 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail04.syd.optusnet.com.au (mail04.syd.optusnet.com.au [211.29.132.185]) by mx1.freebsd.org (Postfix) with ESMTP id 552898FC13; Mon, 2 May 2011 16:46:18 +0000 (UTC) Received: from
c122-106-155-58.carlnfd1.nsw.optusnet.com.au (c122-106-155-58.carlnfd1.nsw.optusnet.com.au [122.106.155.58]) by mail04.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p42GkGvE021696 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 3 May 2011 02:46:16 +1000 Date: Tue, 3 May 2011 02:46:16 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Rick Macklem In-Reply-To: <135141673.835577.1304282609097.JavaMail.root@erie.cs.uoguelph.ca> Message-ID: <20110503020940.N2001@besplex.bde.org> References: <135141673.835577.1304282609097.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: rmacklem@FreeBSD.org, fs@FreeBSD.org Subject: Re: newnfs client and statfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 May 2011 16:46:19 -0000 On Sun, 1 May 2011, Rick Macklem wrote: >>[negative f_bavail and f_ffree] > Well my concern isn't w.r.t. FreeBSD clients, but other ones. I'll > start a discussion on freebsd-fs@ about whether a FreeBSD server > should "cheat" and put negative values (which other clients will > think are large positive values) on the wire or try and conform > strictly to the RFC. > >> BTW, how does scaling of block counts by NFS_FABLKSIZE in the v3 (and >> v4?) cases work? I can only see it in clients. Servers seem to start >> with block counts and never convert to byte counts. > > It must be somewhere, since they are uint64_t byte counts on the wire, > except for NFSv2, which used block counts of the block size provided > in the same response. I'm not sure how I missed this. The multiplications are there. They have the usual (potential) overflow bugs and the usual (actual) sign extension bugs. See below. 
I now see how to fix most of the overflow problems for the v2 case very
easily by scaling on the server, so that clients see only small values.
This should be portable. Here is the new nfs server code for this:

    if (nd->nd_flag & ND_NFSV2) {
        NFSM_BUILD(tl, u_int32_t *, NFSX_V2STATFS);
        *tl++ = txdr_unsigned(NFS_V2MAXDATA);
        *tl++ = txdr_unsigned(sf->f_bsize);
        *tl++ = txdr_unsigned(sf->f_blocks);
        *tl++ = txdr_unsigned(sf->f_bfree);
        *tl = txdr_unsigned(sf->f_bavail);

This just reads server fs values from struct statfs, blindly truncates
them, and puts them on the wire. With just 1 more line -- a call to
statfs_scale_blocks(sf, UINT32_MAX), it can adjust sf so that all the
values fit on the wire. Or more safely, it can get values that fit in
the 32-bit longs on old FreeBSD clients and in the bogusly-cast-to-int32_t
values in current FreeBSD clients by calling
statfs_scale_blocks(sf, INT32_MAX). This should be portable. The
adjustments may scale the block size from 16384 to a very large value, but
clients should already be able to handle the "any" value. Already, the v2
block size value is rarely NFS_FABLKSIZE and rarely what it used to be
since it is under server control. It used to be usually 4096 for ffs, but
it is now usually 16384 for ffs, and can easily be 65536 for ffs. Above
65536 there might be more problems but 65536 works up to 128 TB with an
int32_t max (2**31-1 blocks times 2**16 bsize = 2**47 - 2**16). Even
ffs's default block size of 16K works up to 32 TB. File systems of size
>= 32TB are still rare and are more rarely used with v2, so perhaps the
overflows haven't occurred for anyone yet.

    } else {
        NFSM_BUILD(tl, u_int32_t *, NFSX_V3STATFS);
        tval = (u_quad_t)sf->f_blocks;
        tval *= (u_quad_t)sf->f_bsize;

This of course does the inverse of the scaling done by the client. This
could be written as:

        tval = sf->f_blocks * sf->f_bsize;

The casts have no effect, since everything is already 64 bits.
You may as well assume this, since you have to assume lots about the types
and values for this code to work at all. For example, suppose
sf->f_blocks >= 2**63 (so that full 64-bitness including no space for a
sign bit is actually needed for a block count). Then multiplying by a
64-bit sf->f_bsize may overflow, and you need to do a uint128_t
multiplication to avoid overflow. This is difficult since uint128_t is
not supported in C on any arch in FreeBSD. Then the 128-bit values won't
fit on the wire, and they need to be scaled as in the v2 case. But the v3
case doesn't seem to pass f_bsize, so it can't do this scaling and would
need to clamp.

        txdr_hyper(tval, tl); tl += 2;
        tval = (u_quad_t)sf->f_bfree;
        tval *= (u_quad_t)sf->f_bsize;
        txdr_hyper(tval, tl); tl += 2;
        tval = (u_quad_t)sf->f_bavail;
        tval *= (u_quad_t)sf->f_bsize;

The type errors are more serious for this signed field. Suppose
sf->f_bavail is -1. Then tval is initially 0xFFFFFFFFFFFFFFFF. Suppose
sf->f_bsize is 16K. Then the multiplication overflows. IIRC, the result
is defined for unsigned types (it wraps modulo 2**64 rather than being
undefined) and is (uint64_t)-16K = 0xFFFFFFFFFFFFC000. This is the right
value for passing -16K as a large unsigned value. Careful code would
generate this value without using an overflowing multiplication.

        txdr_hyper(tval, tl); tl += 2;
        tval = (u_quad_t)sf->f_files;
        txdr_hyper(tval, tl); tl += 2;
        tval = (u_quad_t)sf->f_ffree;
        txdr_hyper(tval, tl); tl += 2;
        tval = (u_quad_t)sf->f_ffree;
        txdr_hyper(tval, tl); tl += 2;
        *tl = 0;
    }

> I'll try and make my Solaris10 box get to -ve frees and then see what
> it puts on the wire. After that, I'll start a discussion on freebsd-fs@
> about how they think a FreeBSD server should behave when f_bavail and/or
> f_ffree are negative.

The result on Solaris would be interesting. Does Solaris still support
ffs? You said later that you couldn't get it to generate negative values.
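The "careful code" for the signed f_bavail field could look roughly like the sketch below. It is only an illustration under stated assumptions, not the FreeBSD server code: encode_abytes() and encode_abytes_clamped() are hypothetical helper names, and sending 0 for a negative count is just one possible policy for a server that wants to stay within the unsigned wire format.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the FreeBSD tree): build the 64-bit
 * "available bytes" wire value from a signed block count.  A negative
 * count is encoded as its two's-complement bit pattern by multiplying
 * the magnitude and then negating in unsigned arithmetic, so small
 * negative counts never rely on a wrapping multiplication.
 */
uint64_t
encode_abytes(int64_t f_bavail, uint64_t f_bsize)
{
	uint64_t mag;

	if (f_bavail >= 0)
		return ((uint64_t)f_bavail * f_bsize);
	mag = -(uint64_t)f_bavail;	/* |f_bavail|, defined even for INT64_MIN */
	return (-(mag * f_bsize));	/* 2**64 - mag * f_bsize */
}

/* Hypothetical alternative policy: never put a "negative" value on the wire. */
uint64_t
encode_abytes_clamped(int64_t f_bavail, uint64_t f_bsize)
{
	return (f_bavail < 0 ? 0 : (uint64_t)f_bavail * f_bsize);
}
```

With f_bavail = -1 and f_bsize = 16384 the first helper yields 0xFFFFFFFFFFFFC000, the same bit pattern described above; the second yields 0.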
Bruce From owner-freebsd-fs@FreeBSD.ORG Mon May 2 19:15:19 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4B8D01065673; Mon, 2 May 2011 19:15:19 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id EC2558FC12; Mon, 2 May 2011 19:15:18 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApwEALABv02DaFvO/2dsb2JhbACEUaIyiHGoF5A6gSqDVYEBBI55jj4 X-IronPort-AV: E=Sophos;i="4.64,303,1301889600"; d="scan'208";a="119343900" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 02 May 2011 15:15:18 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 06A5EB3F7E; Mon, 2 May 2011 15:15:18 -0400 (EDT) Date: Mon, 2 May 2011 15:15:18 -0400 (EDT) From: Rick Macklem To: Bruce Evans Message-ID: <433279102.889960.1304363717963.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20110503020940.N2001@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Cc: rmacklem@FreeBSD.org, fs@FreeBSD.org Subject: Re: newnfs client and statfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 May 2011 19:15:19 -0000 > > > I'll try and make my Solaris10 box get to -ve frees and then see > > what > > it puts on the wire. After that, I'll start a discussion on > > freebsd-fs@ > > about how they think a FreeBSD server should behave when f_bavail > > and/or > > f_ffree are negative. 
>
> The result on Solaris would be interesting. Does Solaris still support
> ffs? You said later that you couldn't get it to generate negative
> values.
>
It has some variation of FFS with logging, which is what I use. Writing a
file as root fails with "no space" when "df" reports about 7000 blocks
free. (I have no idea why it stops at around 7000. Something to do with
the log, maybe?)

Anyhow, it doesn't report negative values and all the fields in what they
call "struct statvfs" are unsigned numbers, including bavail.

rick

From owner-freebsd-fs@FreeBSD.ORG Mon May 2 19:43:53 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6FF70106566B; Mon, 2 May 2011 19:43:53 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 04B018FC0A; Mon, 2 May 2011 19:43:52 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApwEAKgIv02DaFvO/2dsb2JhbACEUaIziHGoNZA/gSqDVYEBBI55jj4 X-IronPort-AV: E=Sophos;i="4.64,304,1301889600"; d="scan'208";a="119347033" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 02 May 2011 15:43:52 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 4329FB3E95; Mon, 2 May 2011 15:43:52 -0400 (EDT) Date: Mon, 2 May 2011 15:43:52 -0400 (EDT) From: Rick Macklem To: Bruce Evans Message-ID: <413547662.892660.1304365432183.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20110503013724.I2001@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Cc: rmacklem@FreeBSD.org, kib@FreeBSD.org,
fs@FreeBSD.org Subject: Re: newnfs client and statfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 May 2011 19:43:53 -0000 > On Sun, 1 May 2011, Rick Macklem wrote: > > > Ok, I realized the code in the last post was pretty bogus:-) My only > > excuse was that I typed it as I was running out the door... > > > > So, I played with it a bit and the attached patch seems to work for > > i386. For the fields that are uint64_t in struct statfs, it just > > divides/assigns. For the int64_t field that takes the divided value > > (f_bavail) I did the division/assignment to a uint64_t tmp and then > > assigned that to f_bavail. (Since any value that fits in uint64_t is > > a positive value for int64_t after being divided by 2 or more, it > > will > > always be positive.) For the other int64_t one, I just check for "> > > INT64_MAX" > > and set it to INT64_MAX for that case, so it doesn't go negative. > > Sorry, I don't like this. Going through tmp makes no difference since > all values are reduced below INT64_MAX by dividing by just 2. > "Negative" > values are still converted to garbage positive values. > > > Anyhow, the updated patch is attached and maybe kib@ can test it? 
> > % --- fs/nfsclient/nfs_clport.c.sav 2011-04-30 20:16:39.000000000 > -0400 > % +++ fs/nfsclient/nfs_clport.c 2011-05-01 16:11:18.000000000 -0400 > % @@ -838,20 +838,19 @@ void > % nfscl_loadsbinfo(struct nfsmount *nmp, struct nfsstatfs *sfp, void > *statfs) > % { > % struct statfs *sbp = (struct statfs *)statfs; > % - nfsquad_t tquad; > % + uint64_t tmp; > % > % if (nmp->nm_flag & (NFSMNT_NFSV3 | NFSMNT_NFSV4)) { > % sbp->f_bsize = NFS_FABLKSIZE; > % - tquad.qval = sfp->sf_tbytes; > % - sbp->f_blocks = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE)); > % - tquad.qval = sfp->sf_fbytes; > % - sbp->f_bfree = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE)); > % - tquad.qval = sfp->sf_abytes; > % - sbp->f_bavail = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE)); > % - tquad.qval = sfp->sf_tfiles; > % - sbp->f_files = (tquad.lval[0] & 0x7fffffff); > % - tquad.qval = sfp->sf_ffiles; > % - sbp->f_ffree = (tquad.lval[0] & 0x7fffffff); > % + sbp->f_blocks = sfp->sf_tbytes / NFS_FABLKSIZE; > % + sbp->f_bfree = sfp->sf_fbytes / NFS_FABLKSIZE; > > OK. > > % + tmp = sfp->sf_abytes / NFS_FABLKSIZE; > % + sbp->f_bavail = tmp; > > The division made it less than 2**55, and kept it nonnegative. Going > through > tmp doesn't change this. > Agreed. The "tmp" was left over from when I had "if (tmp > INT64_MAX)", which I realized I didn't need. I can just change it to: sbp->f_bavail = sfp->sf_abytes / NFS_FABLKSIZE; > But I still want to use my code to support negative values: > > sbp->f_bavail = (int64_t)sfp->sf_abytes / NFS_FABLKSIZE; > > If the 63rd bit is set, it must mean that the server is an > non-broken^non-conforming FreeBSD one trying to send a negative value, > since file systems with 2 >= 2**63 bytes available are physical > impossible > Even if the file system is virtual and growable so that it has no real > limits, it should probably limit itself to much less than 2**63 to > avoid > testing whether clients can handle such large values. 
>
Well, since the RFCs don't say that, I think it shouldn't be assumed.
(I could assume that having the 63rd bit set just means a server doesn't
know the exact answer and chooses to say "lots are free", but the truth
is, neither of us knows.)

I've asked the question over on freebsd-fs@ and I'll wait to see what
everyone thinks w.r.t. RFC conformance vs hiding negative values in the
fields. If the collective agrees with you, I don't mind the code assuming
that 63rd bit set means negative.

> % + sbp->f_files = sfp->sf_tfiles;
> % + if (sfp->sf_ffiles > INT64_MAX)
> % + sbp->f_ffree = INT64_MAX;
> % + else
> % + sbp->f_ffree = sfp->sf_ffiles;
>
> This gives correct-as-possible clamping for large unsigned values, but
> gives a garbage large positive value for "negative" values. Again,
> negative values are physically impossible, so if the 63rd bit is set
> then it must mean that the server is a FreeBSD one trying to send a
> negative value.
> So I prefer to use my (untested in this case code to support negative
> values:
>
Same as above: since the RFC says they're unsigned, I think that's what
the client should assume.
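The two client-side readings of the free-file count being debated here can be put side by side. This is a minimal sketch, not the patch itself: ffree_clamped() and ffree_signed() are illustrative names, and the signed variant relies on the implementation-defined (in practice two's-complement) conversion from uint64_t to int64_t.

```c
#include <stdint.h>

/*
 * RFC-style reading: the wire value is unsigned, so anything that does
 * not fit in the signed f_ffree field is clamped to INT64_MAX.
 */
int64_t
ffree_clamped(uint64_t sf_ffiles)
{
	return (sf_ffiles > INT64_MAX ? INT64_MAX : (int64_t)sf_ffiles);
}

/*
 * Alternative reading: a set top bit means the server was trying to send
 * a negative count.  The conversion is implementation-defined in C;
 * common compilers simply reinterpret the two's-complement bits.
 */
int64_t
ffree_signed(uint64_t sf_ffiles)
{
	return ((int64_t)sf_ffiles);
}
```

Both functions agree for any value up to INT64_MAX; they differ only when the top bit is set, which is exactly the case in dispute.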
rick From owner-freebsd-fs@FreeBSD.ORG Mon May 2 20:47:15 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AC269106566C; Mon, 2 May 2011 20:47:15 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 543468FC18; Mon, 2 May 2011 20:47:15 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApwEABEYv02DaFvO/2dsb2JhbACEUaIziHGpWZBHgSqDVYEBBI55jj4 X-IronPort-AV: E=Sophos;i="4.64,304,1301889600"; d="scan'208";a="119354736" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 02 May 2011 16:47:05 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id A24A7B3F54; Mon, 2 May 2011 16:47:05 -0400 (EDT) Date: Mon, 2 May 2011 16:47:05 -0400 (EDT) From: Rick Macklem To: Bruce Evans Message-ID: <1040257715.898126.1304369225601.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20110503020940.N2001@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: rmacklem@FreeBSD.org, fs@FreeBSD.org Subject: Re: newnfs client and statfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 May 2011 20:47:15 -0000 > > > I'll try and make my Solaris10 box get to -ve frees and then see > > what > > it puts on the wire. After that, I'll start a discussion on > > freebsd-fs@ > > about how they think a FreeBSD server should behave when f_bavail > > and/or > > f_ffree are negative. 
>
> The result on Solaris would be interesting. Does Solaris still support
> ffs? You said later that you couldn't get it to generate negative
> values.
>
Well, I just did the reverse (ran a FreeBSD FFS disk out of space so it
reported a -ve free and mounted it on Solaris10). Here are the "df"
outputs (I used "df -k" on Solaris, since that's a compatible format):

FreeBSD-current server (nfsv4-newlap):
Filesystem  1K-blocks    Used   Avail Capacity  Mounted on
/dev/ad4s3a   2026030  671492 1192456    36%    /
devfs               1       1       0   100%    /dev
/dev/ad4s3e   4697030 4544054 -222786   105%    /sub1
/dev/ad4s3d   5077038  641462 4029414    14%    /usr

Solaris10 client:
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/dsk/c0d0s0      3870110 2790938 1040471    73%    /
/devices                   0       0       0     0%    /devices
ctfs                       0       0       0     0%    /system/contract
proc                       0       0       0     0%    /proc
mnttab                     0       0       0     0%    /etc/mnttab
swap                  975736     624  975112     1%    /etc/svc/volatile
objfs                      0       0       0     0%    /system/object
/usr/lib/libc/libc_hwcap1.so.1
                     3870110 2790938 1040471    73%    /lib/libc.so.1
fd                         0       0       0     0%    /dev/fd
swap                  975112       0  975112     0%    /tmp
swap                  975140      28  975112     1%    /var/run
/dev/dsk/c0d0s7      5608190 4118091 1434018    75%    /export/home
nfsv4-newlap:/sub1   4697030 4544054 18014398509259198     1%    /mnt

As you can see, Solaris10 doesn't assume it's negative and reports lottsa
avail.
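The avail figure printed by Solaris is consistent with the server's -222786 KB travelling as an unsigned 64-bit byte count: 2**64 - 222786*1024 bytes, divided by 1024, is 2**54 - 222786 = 18014398509259198. A small sketch of that arithmetic follows, assuming the wire value is simply the two's-complement bit pattern of the negative byte count; the helper names are made up for illustration.

```c
#include <stdint.h>

/* Wire value if a negative KB count is multiplied out as unsigned 64-bit. */
uint64_t
wire_bytes_from_kb(int64_t avail_kb)
{
	return ((uint64_t)avail_kb * 1024u);	/* wraps modulo 2**64 */
}

/* What a client sees if it treats the wire value as unsigned. */
uint64_t
kb_as_unsigned(uint64_t wire_bytes)
{
	return (wire_bytes / 1024u);
}

/*
 * What a client sees if it reinterprets the wire value as signed
 * (implementation-defined conversion, two's complement in practice).
 */
int64_t
kb_as_signed(uint64_t wire_bytes)
{
	return ((int64_t)wire_bytes / 1024);
}
```

Feeding -222786 through wire_bytes_from_kb() and then kb_as_unsigned() reproduces the Solaris number, while kb_as_signed() recovers -222786.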
I don't have a Linux client handy, so I can't do the same test with Linux, rick From owner-freebsd-fs@FreeBSD.ORG Mon May 2 20:58:16 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 74B41106566B for ; Mon, 2 May 2011 20:58:16 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 34DF38FC12 for ; Mon, 2 May 2011 20:58:15 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AiwHAEQav02DaFvO/2dsb2JhbACEUZNxjkKlW40CkEeBKoNVgQEEjnmGfIdC X-IronPort-AV: E=Sophos;i="4.64,304,1301889600"; d="scan'208";a="119356546" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 02 May 2011 16:58:15 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 4D324B3F54 for ; Mon, 2 May 2011 16:58:15 -0400 (EDT) Date: Mon, 2 May 2011 16:58:15 -0400 (EDT) From: Rick Macklem To: freebsd-fs@freebsd.org Message-ID: <924130649.898737.1304369895239.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Subject: Re: RFC: NFS server handling of negative f_bavail? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 May 2011 20:58:16 -0000 I just ran a little test where I ran an FFS volume on a FreeBSD-current server out of space so that it showed negative avail and then mounted it on Solaris10. Here are the dfs for the server and client. 
FreeBSD server (nfsv4-newlap):
Filesystem  1K-blocks    Used   Avail Capacity  Mounted on
/dev/ad4s3a   2026030  671492 1192456    36%    /
devfs               1       1       0   100%    /dev
/dev/ad4s3e   4697030 4544054 -222786   105%    /sub1
/dev/ad4s3d   5077038  641462 4029414    14%    /usr

and for the Solaris10 client:
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/dsk/c0d0s0      3870110 2790938 1040471    73%    /
/devices                   0       0       0     0%    /devices
ctfs                       0       0       0     0%    /system/contract
proc                       0       0       0     0%    /proc
mnttab                     0       0       0     0%    /etc/mnttab
swap                  975736     624  975112     1%    /etc/svc/volatile
objfs                      0       0       0     0%    /system/object
/usr/lib/libc/libc_hwcap1.so.1
                     3870110 2790938 1040471    73%    /lib/libc.so.1
fd                         0       0       0     0%    /dev/fd
swap                  975112       0  975112     0%    /tmp
swap                  975140      28  975112     1%    /var/run
/dev/dsk/c0d0s7      5608190 4118091 1434018    75%    /export/home
nfsv4-newlap:/sub1   4697030 4544054 18014398509259198     1%    /mnt

You can see that the Solaris10 client thinks there is lottsa avail.

I think sending the field as 0 over the wire would provide better
interoperability.

rick

From owner-freebsd-fs@FreeBSD.ORG Mon May 2 22:51:52 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 109CE106566B for ; Mon, 2 May 2011 22:51:52 +0000 (UTC) (envelope-from jan.koum@gmail.com) Received: from mail-qy0-f175.google.com (mail-qy0-f175.google.com [209.85.216.175]) by mx1.freebsd.org (Postfix) with ESMTP id C22228FC08 for ; Mon, 2 May 2011 22:51:51 +0000 (UTC) Received: by qyk35 with SMTP id 35so1765337qyk.13 for ; Mon, 02 May 2011 15:51:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:date:x-google-sender-auth :message-id:subject:from:to:cc:content-type; bh=hMW2o1DB+Dg/hvA3CU7AvAk0y3vnmXZSGPg9JWe5YBc=; b=C5w60uORY92tiZH9fr+ouhTjACvTPzAId6sVPMSVf60ENyS+p0yB4x1EHPiIQ+SMY5 UX/DxoUv9fkANlpLvyA+GW7ozJfVr/E1v+sodY7lUse5BOgT0YDdFtdRVaRsSr4xLX4J Z1uePKWAjElzLqpyKrT1ca1uoBgnUoF7D4Mlo=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:date:x-google-sender-auth:message-id:subject :from:to:cc:content-type; b=YhjVOpbaKfp6FYC/Q4flbgjVzOYcqALeLuIDjxyue02ZhTMBQSIDOKGjQF3rSYZLck K4KMOry6pqlPWB+SRWGZMFYMHqGn1q8xo8aMYOosWEZ95pV5Siv2cr9ctyHoZjCACy4L cl4jScs2SC4bye2kuOl8nO/+JtIKpwGMiFtVw= MIME-Version: 1.0 Received: by 10.224.28.133 with SMTP id m5mr6781069qac.281.1304375303875; Mon, 02 May 2011 15:28:23 -0700 (PDT) Sender: jan.koum@gmail.com Received: by 10.229.88.73 with HTTP; Mon, 2 May 2011 15:28:23 -0700 (PDT) Date: Mon, 2 May 2011 15:28:23 -0700 X-Google-Sender-Auth: 43yF5vdY7S7ZqdHDMO8smWtaXvs Message-ID: From: Jan Koum To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: Chris Peiffer Subject: very strange IO issue with FreeBSD 8 and SSD X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 May 2011 22:51:52 -0000 hello, we are seeing some strange activity on our FreeBSD systems running 8.2-PRERELEASE snapshot from early december our system has 4 Intel SSD drives (64GB each) connected directly into motherboard through AHCI: ad4: setting UDMA100 ad4: 61057MB at ata2-master UDMA100 SATA 3Gb/s ad4: 125045424 sectors [124053C/16H/63S] 16 sectors/interrupt 1 depth queue [...] ad7: setting UDMA100 ad7: 61057MB at ata3-slave UDMA100 SATA 3Gb/s ad7: 125045424 sectors [124053C/16H/63S] 16 sectors/interrupt 1 depth queue $ df -h Filesystem Size Used Avail Capacity Mounted on /dev/ad4s1a 57G 24G 29G 45% / /dev/ad5a 58G 17G 36G 32% /d2 /dev/ad7a 58G 17G 36G 32% /d4 /dev/ad6a 58G 17G 36G 32% /d3 so far - so good, right? this is where things get very bizarre: our application receives data from network and writes to disk. on average the file size grows to about 7Kbytes while an average file append is 300-400 bytes. 
netstat shows about 700-800Kbytes of input and our application log shows we
write about 500Kbytes each second. however, when i run iostat we see
upwards of 10MB a second written to disk (if not more). for example:

$ iostat -KC -x 1
                extended device statistics                  cpu
device    r/s     w/s    kr/s     kw/s wait svc_t  %b  us ni sy in id
ad4       9.0   423.3    45.2   4410.1    0  84.3  11   5  0  5  1 89
ad5       9.0   420.7    44.9   4237.4    0  82.3  11
ad6       9.0   420.6    45.1   4254.4    0  81.1  11
ad7       9.0   420.3    44.9   4225.7    0  83.8  11
                extended device statistics                  cpu
device    r/s     w/s    kr/s     kw/s wait svc_t  %b  us ni sy in id
ad4      14.9   157.9    79.5   1108.4    0  31.7  18   8  0  5  1 86
ad5      15.9  1480.8    63.6  18886.1    0  36.4  19
ad6      20.9   154.9    93.4   1032.9    0   7.4   4
ad7      19.9   216.5    63.6   1450.0    0   9.2   4
                extended device statistics                  cpu
device    r/s     w/s    kr/s     kw/s wait svc_t  %b  us ni sy in id
ad4      20.9   169.2   115.4   1271.7    0  39.3  13   9  0  4  1 85
ad5      21.9  1179.1   129.4  11598.1    0  34.6  14
ad6      14.9   140.3    39.8    925.4    0   9.4   3
ad7      15.9   213.9    33.8   1610.0    0   7.9   3
                extended device statistics                  cpu
device    r/s     w/s    kr/s     kw/s wait svc_t  %b  us ni sy in id
ad4      15.9   403.6    53.7   3208.6    0  30.0  10   8  0  6  1 85
ad5      16.9   709.7    47.7   4691.6    0  20.2   9
ad6      23.9   321.1    97.4   2262.3    0  12.9   7
ad7      14.9   421.4    51.7   3437.2    0  13.3   7

(apologies in advance for bad formatting)

so, here we are, looking at iostat output and trying to figure out how
it can be this bad and where the discrepancy is coming from. a few things
to get out of the way: no, we do not have TRIM enabled yet, we would need to
upgrade OS for that, but we don't think TRIM would make such a big
difference. also we know that we can newfs with -b 512 -f 4096 but again, we
also don't think that it would account for such a large IO discrepancy.

any thoughts as to what this could be? has anybody seen anything similar
before? 10MB of metadata for 500K worth of disk writes? that can't be....
right?
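One rough way to put a number on the discrepancy is to sum the kw/s column of a single iostat sample and compare it with the application's own write rate. The sketch below only does that arithmetic and is not a diagnosis; sum_kb() and ratio() are illustrative names, and the 500 KB/s figure is the application-log estimate quoted in the message.

```c
#include <stddef.h>

/* Sum a column of per-device KB/s figures taken from one iostat sample. */
double
sum_kb(const double *v, size_t n)
{
	double total = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		total += v[i];
	return (total);
}

/*
 * Generic ratio: device KB/s over application KB/s gives the write
 * amplification; kw/s over w/s gives the average size of one write.
 */
double
ratio(double num, double den)
{
	return (num / den);
}
```

For the second sample shown (kw/s of 1108.4, 18886.1, 1032.9 and 1450.0) the sum is 22477.4 KB/s, about 45 times the 500 KB/s the application reports, and ad5's average write is 18886.1 / 1480.8, or a little under 13 KB per request.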
From owner-freebsd-fs@FreeBSD.ORG Mon May 2 23:36:06 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4EF611065670 for ; Mon, 2 May 2011 23:36:06 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta02.westchester.pa.mail.comcast.net (qmta02.westchester.pa.mail.comcast.net [76.96.62.24]) by mx1.freebsd.org (Postfix) with ESMTP id EF1DA8FC16 for ; Mon, 2 May 2011 23:36:05 +0000 (UTC) Received: from omta16.westchester.pa.mail.comcast.net ([76.96.62.88]) by qmta02.westchester.pa.mail.comcast.net with comcast id en811g00A1uE5Es52nc6Vl; Mon, 02 May 2011 23:36:06 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta16.westchester.pa.mail.comcast.net with comcast id enc31g00U1t3BNj3cnc4SD; Mon, 02 May 2011 23:36:05 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id B6A159B418; Mon, 2 May 2011 16:36:01 -0700 (PDT) Date: Mon, 2 May 2011 16:36:01 -0700 From: Jeremy Chadwick To: Jan Koum Message-ID: <20110502233601.GA29710@icarus.home.lan> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org, Chris Peiffer Subject: Re: very strange IO issue with FreeBSD 8 and SSD X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 May 2011 23:36:06 -0000 On Mon, May 02, 2011 at 03:28:23PM -0700, Jan Koum wrote: > hello, > > we are seeing some strange activity on our FreeBSD systems running > 8.2-PRERELEASE snapshot from early december > > our system has 4 Intel SSD drives (64GB each) connected directly into > motherboard through AHCI: > > ad4: setting UDMA100 > ad4: 61057MB at ata2-master UDMA100 SATA > 3Gb/s > ad4: 125045424 sectors [124053C/16H/63S] 16 
sectors/interrupt 1 depth queue > [...] > ad7: setting UDMA100 > ad7: 61057MB at ata3-slave UDMA100 SATA > 3Gb/s > ad7: 125045424 sectors [124053C/16H/63S] 16 sectors/interrupt 1 depth queue > > $ df -h > Filesystem Size Used Avail Capacity Mounted on > /dev/ad4s1a 57G 24G 29G 45% / > /dev/ad5a 58G 17G 36G 32% /d2 > /dev/ad7a 58G 17G 36G 32% /d4 > /dev/ad6a 58G 17G 36G 32% /d3 > > so far - so good, right? this is where things get very bizarre: our > application receives data from network and writes to disk. on average the > file size grows to about 7Kbytes while an average file append is 300-400 > bytes. > > netstat shows about 700-800Kbytes of input and our application log shows we > write about 500Kbytes each second. however, when i run iostat i we see > upwards of 10MB a second written to disk (if not more). for example: > > $ iostat -KC -x 1 > extended device statistics cpu > device r/s w/s kr/s kw/s wait svc_t %b us ni sy in id > ad4 9.0 423.3 45.2 4410.1 0 84.3 11 5 0 5 1 89 > ad5 9.0 420.7 44.9 4237.4 0 82.3 11 > ad6 9.0 420.6 45.1 4254.4 0 81.1 11 > ad7 9.0 420.3 44.9 4225.7 0 83.8 11 > extended device statistics cpu > device r/s w/s kr/s kw/s wait svc_t %b us ni sy in id > ad4 14.9 157.9 79.5 1108.4 0 31.7 18 8 0 5 1 86 > ad5 15.9 1480.8 63.6 18886.1 0 36.4 19 > ad6 20.9 154.9 93.4 1032.9 0 7.4 4 > ad7 19.9 216.5 63.6 1450.0 0 9.2 4 > extended device statistics cpu > device r/s w/s kr/s kw/s wait svc_t %b us ni sy in id > ad4 20.9 169.2 115.4 1271.7 0 39.3 13 9 0 4 1 85 > ad5 21.9 1179.1 129.4 11598.1 0 34.6 14 > ad6 14.9 140.3 39.8 925.4 0 9.4 3 > ad7 15.9 213.9 33.8 1610.0 0 7.9 3 > extended device statistics cpu > device r/s w/s kr/s kw/s wait svc_t %b us ni sy in id > ad4 15.9 403.6 53.7 3208.6 0 30.0 10 8 0 6 1 85 > ad5 16.9 709.7 47.7 4691.6 0 20.2 9 > ad6 23.9 321.1 97.4 2262.3 0 12.9 7 > ad7 14.9 421.4 51.7 3437.2 0 13.3 7 > > (apologies in advance for bad formatting) > > so, here are we are, looking at iostat output and trying to figure out how > 
it can be this bad and where the discrepancy is coming from. a few things
> to get out of the way: no, we do not have TRIM enabled yet, we would need to
> upgrade OS for that, but we don't think TRIM would make such a big
> different. also we know that we can newfs with -b 512 -f 4096 but again, we
> also dont think that it would account for such a large IO discrepancy.
>
> any thoughts to what this could be? has anybody seen anything similar
> before? 10MB of metadata for 500K worth of disk writes? that can't be....
> right?

I would recommend trying ahci.ko instead of ataahci.ko. Your device
names will change (ad4 --> ada0, ad5 --> ada1, etc.). Just add
ahci_load="yes" to /boot/loader.conf and reboot into single-user, fix
/etc/fstab and related configuration files, and that's all you should
have to do.

We use Intel SSDs (X25-M 80GB) in our servers, also backed by UFS2 with
softupdates. Controllers are Intel ICH7R (in AHCI mode) and Intel ICH9R
(also in AHCI mode). We *did not* apply any 4K alignment when making
the partitions. We use ahci.ko. I haven't tested write speeds and all
that, but the disks work fine.

You might also try comparing iostat output to gstat output, though gstat
refreshes the screen continually, making this a little difficult. I
would recommend "gstat -I500ms -f '^ad[0-9]$'" and watch closely.
Change the regex, of course, if you switch to ahci.ko.

If you want to compare benchmarks, I need to know exactly what to do to
reproduce the issue you're stating. I would prefer the traffic not come
off the network (e.g. use dd or bonnie++ or something) to rule out
problems there.

-- 
| Jeremy Chadwick jdc@parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977.
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Mon May 2 23:57:18 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1DBDD1065670 for ; Mon, 2 May 2011 23:57:18 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id CA05F8FC1C for ; Mon, 2 May 2011 23:57:17 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApwEAHBEv02DaFvO/2dsb2JhbACEUaI2tCiQWIR/gQEEjnmOPg X-IronPort-AV: E=Sophos;i="4.64,306,1301889600"; d="scan'208";a="119370570" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 02 May 2011 19:57:16 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id B78B7B3F2D; Mon, 2 May 2011 19:57:16 -0400 (EDT) Date: Mon, 2 May 2011 19:57:16 -0400 (EDT) From: Rick Macklem To: Kostik Belousov Message-ID: <2119325179.903923.1304380636687.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20110503020940.N2001@besplex.bde.org> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_Part_903922_2059190712.1304380636685" X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Cc: fs@FreeBSD.org Subject: Re: newnfs client and statfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 02 May 2011 23:57:18 -0000 ------=_Part_903922_2059190712.1304380636685 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Hi, I have attached a version of the patch that I intend to commit unless it doesn't work for Kostik's test case. 
Kostik, could you please test this one. Yes, Bruce, I realize you won't like it, but I have put some comments in it to try and clarify why it is coded the way it is. (The arithmetic seems to work the way I would expect it to for i386, which is the only arch I have for testing.) If the "collective concensus" is to "cheat" and put the negative values in the uint64_t on the wire, then I can commit a change to handle that later. If anyone has input w.r.t. this, please post it under the Subject heading "NFS server handling of negative f_bavail?" on freebsd-fs@freebsd.org. I basically need to move onto other issues, rick ------=_Part_903922_2059190712.1304380636685 Content-Type: text/x-patch; name=statfs.patch Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename=statfs.patch LS0tIGZzL25mc2NsaWVudC9uZnNfY2xwb3J0LmMuc2F2CTIwMTEtMDQtMzAgMjA6MTY6MzkuMDAw MDAwMDAwIC0wNDAwCisrKyBmcy9uZnNjbGllbnQvbmZzX2NscG9ydC5jCTIwMTEtMDUtMDIgMTk6 MzI6MzEuMDAwMDAwMDAwIC0wNDAwCkBAIC04MzgsMjEgKzgzOCwzMyBAQCB2b2lkCiBuZnNjbF9s b2Fkc2JpbmZvKHN0cnVjdCBuZnNtb3VudCAqbm1wLCBzdHJ1Y3QgbmZzc3RhdGZzICpzZnAsIHZv aWQgKnN0YXRmcykKIHsKIAlzdHJ1Y3Qgc3RhdGZzICpzYnAgPSAoc3RydWN0IHN0YXRmcyAqKXN0 YXRmczsKLQluZnNxdWFkX3QgdHF1YWQ7CiAKIAlpZiAobm1wLT5ubV9mbGFnICYgKE5GU01OVF9O RlNWMyB8IE5GU01OVF9ORlNWNCkpIHsKIAkJc2JwLT5mX2JzaXplID0gTkZTX0ZBQkxLU0laRTsK LQkJdHF1YWQucXZhbCA9IHNmcC0+c2ZfdGJ5dGVzOwotCQlzYnAtPmZfYmxvY2tzID0gKGxvbmcp KHRxdWFkLnF2YWwgLyAoKHVfcXVhZF90KU5GU19GQUJMS1NJWkUpKTsKLQkJdHF1YWQucXZhbCA9 IHNmcC0+c2ZfZmJ5dGVzOwotCQlzYnAtPmZfYmZyZWUgPSAobG9uZykodHF1YWQucXZhbCAvICgo dV9xdWFkX3QpTkZTX0ZBQkxLU0laRSkpOwotCQl0cXVhZC5xdmFsID0gc2ZwLT5zZl9hYnl0ZXM7 Ci0JCXNicC0+Zl9iYXZhaWwgPSAobG9uZykodHF1YWQucXZhbCAvICgodV9xdWFkX3QpTkZTX0ZB QkxLU0laRSkpOwotCQl0cXVhZC5xdmFsID0gc2ZwLT5zZl90ZmlsZXM7Ci0JCXNicC0+Zl9maWxl cyA9ICh0cXVhZC5sdmFsWzBdICYgMHg3ZmZmZmZmZik7Ci0JCXRxdWFkLnF2YWwgPSBzZnAtPnNm X2ZmaWxlczsKLQkJc2JwLT5mX2ZmcmVlID0gKHRxdWFkLmx2YWxbMF0gJiAweDdmZmZmZmZmKTsK 
KwkJc2JwLT5mX2Jsb2NrcyA9IHNmcC0+c2ZfdGJ5dGVzIC8gTkZTX0ZBQkxLU0laRTsKKwkJc2Jw LT5mX2JmcmVlID0gc2ZwLT5zZl9mYnl0ZXMgLyBORlNfRkFCTEtTSVpFOworCQkvKgorCQkgKiBB bHRob3VnaCBzZl9hYnl0ZXMgaXMgdWludDY0X3QgYW5kIGZfYmF2YWlsIGlzIGludDY0X3QsCisJ CSAqIHRoZSB2YWx1ZSBhZnRlciBkaXZpZGluZyBieSBORlNfRkFCTEtTSVpFIGlzIHNtYWxsCisJ CSAqIGVub3VnaCB0aGF0IGl0IHdpbGwgZml0IGluIDYzYml0cywgc28gaXQgaXMgb2sgdG8KKwkJ ICogYXNzaWduIGl0IHRvIGZfYmF2YWlsIHdpdGhvdXQgZmVhciB0aGF0IGl0IHdpbGwgYmVjb21l CisJCSAqIG5lZ2F0aXZlLgorCQkgKi8KKwkJc2JwLT5mX2JhdmFpbCA9IHNmcC0+c2ZfYWJ5dGVz IC8gTkZTX0ZBQkxLU0laRTsKKwkJc2JwLT5mX2ZpbGVzID0gc2ZwLT5zZl90ZmlsZXM7CisJCS8q IFNpbmNlIGZfZmZyZWUgaXMgaW50NjRfdCwgY2xpcCBpdCB0byA2M2JpdHMuICovCisJCWlmIChz ZnAtPnNmX2ZmaWxlcyA+ICh1aW50NjRfdClJTlQ2NF9NQVgpCisJCQlzYnAtPmZfZmZyZWUgPSBJ TlQ2NF9NQVg7CisJCWVsc2UKKwkJCXNicC0+Zl9mZnJlZSA9IHNmcC0+c2ZfZmZpbGVzOwogCX0g ZWxzZSBpZiAoKG5tcC0+bm1fZmxhZyAmIE5GU01OVF9ORlNWNCkgPT0gMCkgeworCQkvKgorCQkg KiBUaGUgdHlwZSBjYXN0cyB0byAoaW50MzJfdCkgZW5zdXJlIHRoYXQgdGhpcyBjb2RlIGlzCisJ CSAqIGNvbXBhdGlibGUgd2l0aCB0aGUgb2xkIE5GUyBjbGllbnQsIGluIHRoYXQgaXQgd2lsbAor CQkgKiBzaWduIGV4dGVuZCBhIHZhbHVlIHdpdGggYml0MzEgc2V0LiBUaGlzIG1heSBvciBtYXkK KwkJICogbm90IGJlIGNvcnJlY3QgZm9yIE5GU3YyLCBidXQgc2luY2UgaXQgaXMgYSBsZWdhY3kK KwkJICogZW52aXJvbm1lbnQsIEknZCByYXRoZXIgcmV0YWluIGJhY2t3YXJkcyBjb21wYXRpYmls aXR5LgorCQkgKi8KIAkJc2JwLT5mX2JzaXplID0gKGludDMyX3Qpc2ZwLT5zZl9ic2l6ZTsKIAkJ c2JwLT5mX2Jsb2NrcyA9IChpbnQzMl90KXNmcC0+c2ZfYmxvY2tzOwogCQlzYnAtPmZfYmZyZWUg PSAoaW50MzJfdClzZnAtPnNmX2JmcmVlOwo= ------=_Part_903922_2059190712.1304380636685-- From owner-freebsd-fs@FreeBSD.ORG Tue May 3 03:53:01 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8FA3A106566B for ; Tue, 3 May 2011 03:53:01 +0000 (UTC) (envelope-from jan.koum@gmail.com) Received: from mail-qw0-f54.google.com (mail-qw0-f54.google.com [209.85.216.54]) by mx1.freebsd.org (Postfix) with ESMTP id 442718FC0C 
for ; Tue, 3 May 2011 03:53:00 +0000 (UTC) Received: by qwc9 with SMTP id 9so3706573qwc.13 for ; Mon, 02 May 2011 20:53:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=txSRA4WIgbfpHLgc0QzARaX1P4eMOkYRbqN/jX9V+Uw=; b=Z302K1CerNseU+Sl3nzZc5uZKAebIauARiY/uwvwQGE2kmb5w66p8YMsdSDvXEFeuS P7RGKIggQhBptHWNDVOuJlzL23MdQUnnGyZDcYkT0E1nLWZ5+H3IKGfBQPmnMiOva5aZ +x0Fi98213g/h/R+rG9amnje7B14Pz9Lrr/WM= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; b=UlBb9adK0TqfbLx02MSga+Y7sGVn4sK2c96LMs3Wwtt7UBllKqCzsSqYMu8j4DHTku Nm0zaFBefk8sHWH+WFxAae2CqKnIDpJpWGqRkxIDrTqzX7Rky5+O0kjAq4+1heXxuqwl RN4jxOOjrrVTiDwyGnMkbayvIYLJHfVo6v9u0= MIME-Version: 1.0 Received: by 10.229.43.99 with SMTP id v35mr6811819qce.8.1304394780385; Mon, 02 May 2011 20:53:00 -0700 (PDT) Sender: jan.koum@gmail.com Received: by 10.229.88.73 with HTTP; Mon, 2 May 2011 20:53:00 -0700 (PDT) In-Reply-To: References: <20110502233601.GA29710@icarus.home.lan> Date: Mon, 2 May 2011 20:53:00 -0700 X-Google-Sender-Auth: WF2x4hPXiNaF4zu51BgfAtAds4w Message-ID: From: Jan Koum To: Adam Vande More Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-fs@freebsd.org, Chris Peiffer Subject: Re: very strange IO issue with FreeBSD 8 and SSD X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 May 2011 03:53:01 -0000 On Mon, May 2, 2011 at 8:26 PM, Adam Vande More wrote: > On Mon, May 2, 2011 at 6:36 PM, Jeremy Chadwick wrote: > >> You might also try comparing iostat output to gstat output, though gstat >> refreshes the screen 
continually making this a little difficult. >> > > gstat -b > sure: $ sudo gstat -b dT: 1.007s w: 1.000s L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name 0 605 33 137 0.4 572 4349 34.7 10.9 ad4 0 605 33 137 0.4 572 4349 35.8 10.9 ad4s1 0 620 25 149 1.0 595 4280 22.2 9.9 ad5 0 605 33 137 0.4 572 4349 36.5 11.0 ad4s1a 0 0 0 0 0.0 0 0 0.0 0.0 ad4s1b 0 60 18 60 1.1 42 169 2.5 3.6 ad6 0 817 30 121 0.2 787 5382 15.9 8.1 ad7 0 620 25 149 1.1 595 4280 23.1 10.0 ad5a 0 60 18 60 1.1 42 169 2.6 3.7 ad6a 0 817 30 121 0.2 787 5382 16.5 8.1 ad7a > > Also top -m io may help. > > doubt it. these server only have a single process running (our app) From owner-freebsd-fs@FreeBSD.ORG Tue May 3 03:57:07 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 95148106566B for ; Tue, 3 May 2011 03:57:07 +0000 (UTC) (envelope-from amvandemore@gmail.com) Received: from mail-fx0-f54.google.com (mail-fx0-f54.google.com [209.85.161.54]) by mx1.freebsd.org (Postfix) with ESMTP id 212748FC0A for ; Tue, 3 May 2011 03:57:06 +0000 (UTC) Received: by fxm11 with SMTP id 11so5881503fxm.13 for ; Mon, 02 May 2011 20:57:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=qTxSMlN6vXOJk3W+pwtOiqXUGkK02QLbscr6+rZDdu0=; b=YNlV7cpQz1P9ARqvmrLet3MObTPPZvbja+23VoCA7VD0GzaSiRNxRHKCNfNyw28PA8 YzfzXfO8VSQdMcj2rq8c7WLGSnC26YHDWb3ZhOP/BTXsfjEnbotkmtaq4awYC3lZRigd iK632LTc6vvIhKg3qvJ0T7LTEl7/UqNPxauBk= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=sCoHaYlsfxXVGuk2DnKeNNTphcHHxqGAbV2BhAfWVSQ7HVVWE9zsslkOnXNE0Lmc0j o1aKIxrZO4VGX7BsQrRF8POIXSAjNFCsDv7Rri4lNZa2ykfYhkycWRJaho35B5LxGz5E CfZW9ayWfkyDg/dRwCMrK5P+ilTG5LgfSfN0o= MIME-Version: 1.0 Received: 
by 10.223.127.210 with SMTP id h18mr2630278fas.73.1304393198952; Mon, 02 May 2011 20:26:38 -0700 (PDT) Received: by 10.223.20.145 with HTTP; Mon, 2 May 2011 20:26:38 -0700 (PDT) In-Reply-To: <20110502233601.GA29710@icarus.home.lan> References: <20110502233601.GA29710@icarus.home.lan> Date: Mon, 2 May 2011 22:26:38 -0500 Message-ID: From: Adam Vande More To: Jeremy Chadwick Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-fs@freebsd.org, Chris Peiffer Subject: Re: very strange IO issue with FreeBSD 8 and SSD X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 May 2011 03:57:07 -0000 On Mon, May 2, 2011 at 6:36 PM, Jeremy Chadwick wrote: > You might also try comparing iostat output to gstat output, though gstat > refreshes the screen continually making this a little difficult. > gstat -b Also top -m io may help. 
-- Adam Vande More From owner-freebsd-fs@FreeBSD.ORG Tue May 3 04:17:21 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9240A106566C for ; Tue, 3 May 2011 04:17:21 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta01.westchester.pa.mail.comcast.net (qmta01.westchester.pa.mail.comcast.net [76.96.62.16]) by mx1.freebsd.org (Postfix) with ESMTP id 3B5138FC12 for ; Tue, 3 May 2011 04:17:20 +0000 (UTC) Received: from omta03.westchester.pa.mail.comcast.net ([76.96.62.27]) by qmta01.westchester.pa.mail.comcast.net with comcast id esHA1g0020bG4ec51sHMWi; Tue, 03 May 2011 04:17:21 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta03.westchester.pa.mail.comcast.net with comcast id esHK1g00M1t3BNj3PsHLJL; Tue, 03 May 2011 04:17:21 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 668359B418; Mon, 2 May 2011 21:17:18 -0700 (PDT) Date: Mon, 2 May 2011 21:17:18 -0700 From: Jeremy Chadwick To: Jan Koum Message-ID: <20110503041718.GA34604@icarus.home.lan> References: <20110502233601.GA29710@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org, Chris Peiffer Subject: Re: very strange IO issue with FreeBSD 8 and SSD X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 May 2011 04:17:21 -0000 On Mon, May 02, 2011 at 08:53:00PM -0700, Jan Koum wrote: > On Mon, May 2, 2011 at 8:26 PM, Adam Vande More wrote: > > > On Mon, May 2, 2011 at 6:36 PM, Jeremy Chadwick wrote: > > > >> You might also try comparing iostat output to gstat output, though gstat > >> refreshes the screen continually making this a little difficult. 
> >>
> >
> > gstat -b
> >
>
>
> sure:
>
> $ sudo gstat -b
> dT: 1.007s  w: 1.000s
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>     0    605     33    137    0.4    572   4349   34.7   10.9  ad4
>     0    605     33    137    0.4    572   4349   35.8   10.9  ad4s1
>     0    620     25    149    1.0    595   4280   22.2    9.9  ad5
>     0    605     33    137    0.4    572   4349   36.5   11.0  ad4s1a
>     0      0      0      0    0.0      0      0    0.0    0.0  ad4s1b
>     0     60     18     60    1.1     42    169    2.5    3.6  ad6
>     0    817     30    121    0.2    787   5382   15.9    8.1  ad7
>     0    620     25    149    1.1    595   4280   23.1   10.0  ad5a
>     0     60     18     60    1.1     42    169    2.6    3.7  ad6a
>     0    817     30    121    0.2    787   5382   16.5    8.1  ad7a

To emulate "iostat 1", you will need to run this from inside of a while
loop via the shell. E.g. in sh or bash:

  while true; do gstat -b; sleep 1; done

I believe your concern point that started the thread was that
4MBytes/sec was considered bad performance. There are indications from
your iostat output that occasionally the writes are buffered and come in
"in a burst" at 10-11MByte/sec, but your overall average is around
4-5MByte/sec.

You can test your disk I/O by simply dd'ing directly to a file on one of
the filesystems, e.g.

  cd /place/where/ad5a/is/mounted
  dd if=/dev/zero of=test.bin bs=64k

You can change bs to whatever value you'd like (larger or smaller), but
I tend to stick to 64k (64KBytes). ^C when you're finished, and you'll
see overall I/O statistics. You can run the gstat loop or iostat at the
same time if you wish.
Here's an example:

icarus# dd if=/dev/zero of=test.bin bs=64k
^C4401+0 records in
4400+0 records out
288358400 bytes transferred in 6.575845 secs (43851155 bytes/sec)

Another window running "iostat -x ada0 1":

                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ada0       0.0   0.0     0.0     0.0    0   0.0   0
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ada0       0.0   0.0     0.0     0.0    0   0.0   0
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ada0       0.0  61.9     0.0  7924.2    8  19.2  18
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ada0       1.0 334.8    15.9 42790.8    8  19.8 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ada0       1.0 338.5    15.9 43102.6    7  19.7 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ada0       2.0 335.2    31.8 42900.5    8  19.7 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ada0       1.0 336.3    15.9 43047.9    5  20.3 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ada0       1.0 331.7    15.8 42455.8    6  20.3 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ada0       2.0 366.2    31.8 42638.6    8  21.0 100
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ada0       0.0 125.7     0.0 15836.6    0  20.6  37
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ada0       0.0   0.0     0.0     0.0    0   0.0   0
^C

Controller and disk details:

ahci0: port 0x1c50-0x1c57,0x1c44-0x1c47,0x1c48-0x1c4f,0x1c40-0x1c43,0x18e0-0x18ff mem 0xdc000800-0xdc000fff irq 17 at device 31.2 on pci0
ahci0: [ITHREAD]
ahci0: AHCI v1.20 with 6 3Gbps ports, Port Multiplier supported
ahcich0: at channel 0 on ahci0
ahcich0: [ITHREAD]
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: ATA-7 SATA 2.x device
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 38166MB (78165360 512 byte sectors: 16H 63S/T 16383C)

# camcontrol identify ada0
pass0: ATA-7 SATA 2.x device
pass0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)

protocol              ATA/ATAPI-7 SATA 2.x
device model          INTEL SSDSA2M040G2GC
firmware revision     2CV102M3
serial number         XXX
WWN                   XXX
cylinders             16383
heads                 16
sectors/track         63
sector size           logical 512, physical 512, offset 0
LBA supported         78165360 sectors
LBA48 supported       78165360 sectors
PIO supported         PIO4
DMA supported         WDMA2 UDMA6
media RPM             non-rotating

Feature                      Support  Enabled   Value           Vendor
read ahead                     yes      yes
write cache                    yes      yes
flush cache                    yes      yes
overlap                        no
Tagged Command Queuing (TCQ)   no       no
Native Command Queuing (NCQ)   yes              32 tags
SMART                          yes      yes
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      no       no
automatic acoustic management  no       no
media status notification      no       no
power-up in Standby            no       no
write-read-verify              no       no
unload                         yes      yes
free-fall                      no       no
data set management (TRIM)     yes

I can safely say the conversation is going to immediately turn to "how
does your application work?", including people asking for full source
code and so on. Unless I misunderstand, that's effectively what you're
asking: "why does our application perform so badly on these SSDs?"

-- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977.
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Tue May 3 05:49:34 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CA268106564A for ; Tue, 3 May 2011 05:49:34 +0000 (UTC) (envelope-from jan.koum@gmail.com) Received: from mail-qw0-f54.google.com (mail-qw0-f54.google.com [209.85.216.54]) by mx1.freebsd.org (Postfix) with ESMTP id 7AA328FC16 for ; Tue, 3 May 2011 05:49:34 +0000 (UTC) Received: by qwc9 with SMTP id 9so3738454qwc.13 for ; Mon, 02 May 2011 22:49:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=r/ZIf/YbFBJ/VVKyo6rvcjQV/9L2JloX5SP+kkIj05o=; b=FTT3pRoT6NCkEfzXUIqhxfvjWvp7H3FjnMXpTaUGhaMvnXNg6pedR+5lvKQvG9qyjp ZQQ3YPz89LwnINdMEQ7PX/HOYckhnK5IOhI5p54K+TFxQ6zpQDgXhX3Q8R4RgN1gcdvO ItidufOC74V4XQ+wKRyNkTT0Dgmc0NOsE+5VE= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; b=T9idXkogme9Ysyjr0OoTHN+z/e4Q/fyQ8g1dvXCLEqbWeypIN3uLF/1WhRNbAh8MsW hFvRsm4ebxmb/SULvi8D9SQsdEggT7hAMLhu98hq8T0XTyn9D98C6afRSV9M1HgBrPWo C0ng4Qh5W6g372S5nIWToRjRc/x26kZTKZNWg= MIME-Version: 1.0 Received: by 10.229.17.11 with SMTP id q11mr6829607qca.46.1304401772514; Mon, 02 May 2011 22:49:32 -0700 (PDT) Sender: jan.koum@gmail.com Received: by 10.229.88.73 with HTTP; Mon, 2 May 2011 22:49:32 -0700 (PDT) In-Reply-To: <20110503041718.GA34604@icarus.home.lan> References: <20110502233601.GA29710@icarus.home.lan> <20110503041718.GA34604@icarus.home.lan> Date: Mon, 2 May 2011 22:49:32 -0700 X-Google-Sender-Auth: n3DPmuseieyxkLVkWuok2ntXZ_c Message-ID: From: Jan Koum To: Jeremy Chadwick Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 
Cc: freebsd-fs@freebsd.org, Chris Peiffer Subject: Re: very strange IO issue with FreeBSD 8 and SSD X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 May 2011 05:49:34 -0000 On Mon, May 2, 2011 at 9:17 PM, Jeremy Chadwick wrote: > > To emulate "iostat 1", you will need to run this from inside of a while > loop via the shell. E.g. in sh or bash: > > while true; do gstat -b; sleep 1; done > > sure: $ sudo gstat -b | head -2 ; while true; do sudo gstat -b | grep 'a$'; sleep 1; echo; done dT: 1.009s w: 1.000s L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name 258 56 16 42 0.2 40 312 2.2 1.0 ad4s1a 288 76 20 81 0.2 57 387 4.0 1.2 ad5a 255 208 28 76 0.4 180 1977 12.1 3.1 ad6a 276 83 26 139 0.5 58 499 6.2 3.1 ad7a 0 17 16 40 0.2 1 4 0.2 0.4 ad4s1a 0 30 28 95 5.4 2 20 0.2 15.1 ad5a 0 2943 30 139 17.9 2913 46257 261.6 40.5 ad6a 0 24 23 82 0.2 1 4 1.6 0.6 ad7a 0 791 30 137 0.5 762 6897 24.2 16.1 ad4s1a 0 858 18 68 0.2 840 8261 35.7 16.7 ad5a 0 1308 18 46 1.7 1290 13023 25.5 22.1 ad6a 0 791 21 113 1.5 771 7320 19.8 21.3 ad7a 0 3152 26 77 18.1 3126 46089 236.0 44.0 ad4s1a 0 385 30 109 10.6 355 2420 11.4 28.1 ad5a 0 1263 25 107 11.5 1239 7172 37.3 27.8 ad6a 696 761 32 159 12.2 730 4510 22.5 31.1 ad7a 0 456 26 76 0.4 430 1892 19.0 9.4 ad4s1a 0 616 14 36 0.2 602 4971 20.3 8.6 ad5a 0 811 14 46 0.3 797 6186 27.0 10.4 ad6a 0 207 19 58 2.1 188 2982 25.2 10.3 ad7a 313 467 20 76 0.2 447 3834 19.2 4.6 ad4s1a 10 33 17 96 0.2 16 123 82.7 8.8 ad5a 3 32 16 62 0.2 16 98 0.3 0.6 ad6a 1 40 20 52 0.2 20 223 0.3 0.7 ad7a 151 1624 18 77 51.6 1606 10039 106.3 69.1 ad4s1a 25 232 8 22 95.1 224 3565 94.4 64.5 ad5a 0 868 15 48 0.2 854 7438 20.7 17.7 ad6a 0 821 11 73 1.2 810 8846 26.3 17.1 ad7a > I believe your concern point that started the thread was that > 4MBytes/sec was considered bad performance. sorry, not quite... 
i am not judging "performance" - what i am trying to get to the bottom of is
why in the world 500KB of file updates (write/append) per second would
generate so much IO

> There are indications from
> your iostat output that occasionally the writes are buffered and come in
> "in a burst" at 10-11MByte/sec, but your overall average is around
> 4-5MByte/sec.
>
>
we see higher averages, but OK -- don't you think 4-5MB/sec is still way too
high for the little IO the application is doing? (dd doesn't really reproduce
the real life usage of a filesystem with multiple directories and threads
using the underlying fs)

> I can safely say the conversation is going to immediately turn to "how
> does your application work?", including people asking for full source
> code and so on.

it is a very very very simple app built on top of the erlang file module:
http://www.erlang.org/doc/man/file.html

> Unless I misunderstand, that's effectively what you're
> asking: "why does our application perform so badly on these SSDs?"
>
>
not really. what i am asking is: why is there so much IO overhead? where is
it coming from?
From owner-freebsd-fs@FreeBSD.ORG Tue May 3 06:22:06 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A29D6106566C; Tue, 3 May 2011 06:22:06 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 79AEA8FC13; Tue, 3 May 2011 06:22:06 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p436M6GP093746; Tue, 3 May 2011 06:22:06 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p436M66m093742; Tue, 3 May 2011 06:22:06 GMT (envelope-from linimon) Date: Tue, 3 May 2011 06:22:06 GMT Message-Id: <201105030622.p436M66m093742@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/156781: [zfs] zfs is losing the snapshot directory, X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 May 2011 06:22:06 -0000 Old Synopsis: zfs is loosing the snapshot directory, New Synopsis: [zfs] zfs is losing the snapshot directory, Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Tue May 3 06:21:27 UTC 2011 Responsible-Changed-Why: reclassify. 
http://www.freebsd.org/cgi/query-pr.cgi?pr=156781 From owner-freebsd-fs@FreeBSD.ORG Tue May 3 08:30:56 2011 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 56A621065674 for ; Tue, 3 May 2011 08:30:56 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail03.syd.optusnet.com.au (mail03.syd.optusnet.com.au [211.29.132.184]) by mx1.freebsd.org (Postfix) with ESMTP id E77168FC08 for ; Tue, 3 May 2011 08:30:55 +0000 (UTC) Received: from c122-106-155-58.carlnfd1.nsw.optusnet.com.au (c122-106-155-58.carlnfd1.nsw.optusnet.com.au [122.106.155.58]) by mail03.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p438UpOZ008619 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 3 May 2011 18:30:52 +1000 Date: Tue, 3 May 2011 18:30:51 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Rick Macklem In-Reply-To: <2119325179.903923.1304380636687.JavaMail.root@erie.cs.uoguelph.ca> Message-ID: <20110503174200.V1050@besplex.bde.org> References: <2119325179.903923.1304380636687.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: fs@FreeBSD.org Subject: Re: newnfs client and statfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 May 2011 08:30:56 -0000 On Mon, 2 May 2011, Rick Macklem wrote: > I have attached a version of the patch that I intend to commit > unless it doesn't work for Kostik's test case. Kostik, could > you please test this one. > > Yes, Bruce, I realize you won't like it, but I > have put some comments in it > to try and clarify why it is coded the way it is. > (The arithmetic seems to work the way I would expect it to for > i386, which is the only arch I have for testing.) Sigh. 
% --- fs/nfsclient/nfs_clport.c.sav 2011-04-30 20:16:39.000000000 -0400
% +++ fs/nfsclient/nfs_clport.c 2011-05-02 19:32:31.000000000 -0400
% @@ -838,21 +838,33 @@ void
%  nfscl_loadsbinfo(struct nfsmount *nmp, struct nfsstatfs *sfp, void *statfs)
%  {
%      struct statfs *sbp = (struct statfs *)statfs;
% -    nfsquad_t tquad;
%
%      if (nmp->nm_flag & (NFSMNT_NFSV3 | NFSMNT_NFSV4)) {
%          sbp->f_bsize = NFS_FABLKSIZE;
% -        tquad.qval = sfp->sf_tbytes;
% -        sbp->f_blocks = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
% -        tquad.qval = sfp->sf_fbytes;
% -        sbp->f_bfree = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
% -        tquad.qval = sfp->sf_abytes;
% -        sbp->f_bavail = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
% -        tquad.qval = sfp->sf_tfiles;
% -        sbp->f_files = (tquad.lval[0] & 0x7fffffff);
% -        tquad.qval = sfp->sf_ffiles;
% -        sbp->f_ffree = (tquad.lval[0] & 0x7fffffff);
% +        sbp->f_blocks = sfp->sf_tbytes / NFS_FABLKSIZE;
% +        sbp->f_bfree = sfp->sf_fbytes / NFS_FABLKSIZE;
% +        /*
% +         * Although sf_abytes is uint64_t and f_bavail is int64_t,
% +         * the value after dividing by NFS_FABLKSIZE is small
% +         * enough that it will fit in 63bits, so it is ok to
% +         * assign it to f_bavail without fear that it will become
% +         * negative.
% +         */
% +        sbp->f_bavail = sfp->sf_abytes / NFS_FABLKSIZE;
% +        sbp->f_files = sfp->sf_tfiles;
% +        /* Since f_ffree is int64_t, clip it to 63bits. */
% +        if (sfp->sf_ffiles > (uint64_t)INT64_MAX)

This cast has no effect. INT64_MAX has type int64_t. sf_ffiles has type
uint64_t. The default binary promotions cause both types to be promoted
to the minimally larger common type. This type is uint64_t. Thus
INT64_MAX is converted automatically to the correct type.

% +            sbp->f_ffree = INT64_MAX;
% +        else
% +            sbp->f_ffree = sfp->sf_ffiles;
%      } else if ((nmp->nm_flag & NFSMNT_NFSV4) == 0) {
% +        /*
% +         * The type casts to (int32_t) ensure that this code is
% +         * compatible with the old NFS client, in that it will
% +         * sign extend a value with bit31 set. This may or may
% +         * not be correct for NFSv2, but since it is a legacy
% +         * environment, I'd rather retain backwards compatibility.
% +         */
%          sbp->f_bsize = (int32_t)sfp->sf_bsize;
%          sbp->f_blocks = (int32_t)sfp->sf_blocks;
%          sbp->f_bfree = (int32_t)sfp->sf_bfree;

It won't sign extend, but will propagate bit31 as an unsigned bit. For
example, sfp->sf_blocks = 0x80000000 becomes sbp->f_blocks =
0xFFFFFFFF80000000, which is massively different. Again, omitting the
cast gives the correct result if the wire insists on its values being
unsigned.

The result is only backwards compatible with relatively recent FreeBSD
nfs clients. All FreeBSD clients are completely broken if bit31 is set,
and compatibility with this brokenness is not useful (but as I pointed
out in another reply, we would never have seen the broken case when the
old clients weren't old, since it takes a server file system size of
about 32TB for bit 31 to be set). The details of the brokenness vary:

Net/2, FreeBSD-1, 4.4BSD-Lite, FreeBSD-[2-4]:
    f_blocks was plain long:
    if long is 32 bits, then sfp->sf_blocks = 0x80000000 becomes
        sbp->f_blocks = -0x7fffffff - 1 (LONG_MIN)
    if long is 64 bits, then sfp->sf_blocks = 0x80000000 becomes
        sbp->f_blocks = -0x80000000L (INT32_MIN (same as 32-bit LONG_MIN))

FreeBSD-current after 2003/11/12, FreeBSD-[5-9]:
    f_blocks is now uint64_t: changing it (and others) from a signed type
    to an unsigned type mainly gave lots of sign extension bugs, including
    here. The bugs remain mostly unfixed.
    sfp->sf_blocks = 0x80000000 becomes
        sbp->f_blocks = 0xFFFFFFFF80000000 ((uint64_t)INT32_MIN)
    on all arches.

Neither of the garbage values INT32_MIN, ((uint64_t)INT_MIN) gives
useful behaviour. The former is negative, though the wire value cannot
be negative (not sure about this for v2). Applications that are naive
enough to believe this value should assume that the file system has
a negative size and never try to write anything. The latter is enormous
and positive.
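
The two conversions discussed above can be reproduced outside the kernel.
The following is only a minimal sketch, not code from the patch: the
helper names are hypothetical, and it assumes the NFSv2 wire fields are
32-bit unsigned and that the destination field is uint64_t (as f_blocks
is in the newer statfs layout described above). It also assumes a 2's
complement machine, where converting 0x80000000 to int32_t yields
INT32_MIN.

```c
#include <stdint.h>

/*
 * Hypothetical helper: widen a 32-bit wire value into a 64-bit unsigned
 * field WITH the (int32_t) cast, as the NFSv2 branch of the patch does.
 * A value with bit31 set first becomes negative and is then converted
 * to uint64_t, which sets the upper 32 bits as well.
 */
uint64_t
widen_with_cast(uint32_t wire)
{
	return ((int32_t)wire);
}

/*
 * Hypothetical helper: the same widening WITHOUT the cast. The value is
 * zero extended, so the wire value is preserved.
 */
uint64_t
widen_without_cast(uint32_t wire)
{
	return (wire);
}

/*
 * Hypothetical helper: clip an unsigned 64-bit count so that it fits a
 * signed 64-bit field. No cast is needed on INT64_MAX, because the
 * comparison already converts it to uint64_t.
 */
int64_t
clip_to_int64(uint64_t count)
{
	if (count > INT64_MAX)
		return (INT64_MAX);
	return ((int64_t)count);
}
```

With these helpers, widen_with_cast(0x80000000) gives 0xFFFFFFFF80000000,
while widen_without_cast(0x80000000) gives 0x80000000.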
If the wire count really is 0x80000000, then that is already very large, so believing that the value is 0xFFFFFFFF80000000 should make little difference. The bugs are a little different for signed fields like f_bavail. Now there are no sign extension bugs or version-dependent misbehaviours. There are just overflow bugs in the bogus casts. (int32_t)0x80000000 overflows to INT32_MIN (only on 2's complement machines, but no others are supported), and assignment to sbp->f_bavail doesn't change this garbage value. Now the bugs are even further off, since it takes about a 400 TB ffs server file system to reach them. (400 TB with 8% minfree gives a 32TB reserve for root. After using all 32TB of this reserve, there would be -32TB available for non-root. -32TB is INT32_MIN in 16K-blocks.) Bruce From owner-freebsd-fs@FreeBSD.ORG Tue May 3 08:34:55 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BC5641065674 for ; Tue, 3 May 2011 08:34:55 +0000 (UTC) (envelope-from kraduk@gmail.com) Received: from mail-ww0-f50.google.com (mail-ww0-f50.google.com [74.125.82.50]) by mx1.freebsd.org (Postfix) with ESMTP id 0A6898FC26 for ; Tue, 3 May 2011 08:34:54 +0000 (UTC) Received: by wwc33 with SMTP id 33so6503618wwc.31 for ; Tue, 03 May 2011 01:34:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=WdgJSuhSTDqLei4B/39H4xwb9Wl7nLruSYT8vuzO0Is=; b=KjCRLTVIF+4c5L9vdgl6eUxpX2Q2T7C8ln2fCBbDlWUO/tbzHlk7BtFGUM0RJhtuZk mb2W/BbqNRy7u7f0lQESPdKzIuv6hJgVjTaVuYnzK/3GFl6yw0OQQYPtrvNTQLFdebOM Ofzdt23Ig0oYHPYHSaAUkypBYuvUJZugUD1iM= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; 
b=vInYMIjqEhle9IYzx5GRHQDJulkmor+faFpXo/rwaCn4G8EfYiTAobJWT90VBzusrZ PpjFakkK54hBKsFe+ACDDFDnvTv0imxnvQ45Nu00FrVtm/l63rKD9MZdwYIvwiu9Wm4K O85wkEKJ//Eu/wUZ7jJM328paZ4iCr+f2Ce0Q= MIME-Version: 1.0 Received: by 10.216.143.74 with SMTP id k52mr8655756wej.0.1304411693679; Tue, 03 May 2011 01:34:53 -0700 (PDT) Received: by 10.216.15.73 with HTTP; Tue, 3 May 2011 01:34:53 -0700 (PDT) In-Reply-To: References: <20110501133627.00006616@unknown> Date: Tue, 3 May 2011 09:34:53 +0100 Message-ID: From: krad To: ambrosehuang ambrose Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-fs@freebsd.org, Alexander Leidinger , dfr@freebsd.org, Emil Smolenski Subject: Re: [ZFS] Booting from zpool created on 4k-sector drive X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 May 2011 08:34:55 -0000 On 2 May 2011 01:47, ambrosehuang ambrose wrote: > Here is my trick: > 1 Download the ZFS V28 patch for 8-stable, > 2 patch the 8-stable , > 3 make buildkernel, > 4 then you will get gptzfsboot, zfsloader, pmbr > 5 install pmbr according to wiki/GPTboot > 6 replace your old gptzfsboot, zfsloader with new ones; > then you can work around this. It works for me( 3 WD10ears + > ZFS V15 + 8-stable) > > 2011/5/1 Alexander Leidinger : > > On Tue, 21 Dec 2010 15:29:01 +0100 "Emil Smolenski" > > wrote: > > > >> Hello, > >> > >> There is a hack to force zpool creation with minimum sector size > >> equal to 4k: > >> > >> # gnop create -S 4096 ${DEV0} > >> # zpool create tank ${DEV0}.nop > >> # zpool export tank > >> # gnop destroy ${DEV0}.nop > >> # zpool import tank > >> > >> Zpool created this way is much faster on problematic 4k sector > >> drives which lies about its sector size (like WD EARS). This hack > >> works perfectly fine when system is running. 
Gnop layer is created > >> only for "zpool create" command -- ZFS stores information about > >> sector size in its metadata. After zpool creation one can export the > >> pool, remove gnop layer and reimport the pool. Difference can be seen > >> in the output from the zdb command: > >> > >> - on 512 sector device (2**9 = 512): > >> % zdb tank |grep ashift > >> ashift=9 > >> > >> - on 4096 sector device (2**12 = 4096): > >> % zdb tank |grep ashift > >> ashift=12 > >> > >> This change is permanent. The only possibility to change the value > >> of ashift is: zpool destroy/create and restoring pool from backup. > >> > >> But there is one problem: I cannot boot from such pool. Error message: > >> > >> ZFS: i/o error - all block copies unavailable > >> ZFS: can't read MOS > >> ZFS: unexpected object set type 0 > > > > FYI: I can boot successfully from a ZFS v28 pool which was created like > > this in a GPT partition (tested with 9-current). > > > > Bye, > > Alexander. > > > > -- > > http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 > > http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137 > > _______________________________________________ > > freebsd-fs@freebsd.org mailing list > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > or grab these prebuilt boot blocks and install them http://people.freebsd.org/~pjd/zfsboot/ worked for me a treat with exactly the problem you have From owner-freebsd-fs@FreeBSD.ORG Tue May 3 09:18:10 2011 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DA1CF106566C; Tue, 3 May 2011 09:18:10 +0000 (UTC) 
(envelope-from brde@optusnet.com.au) Received: from mail07.syd.optusnet.com.au (mail07.syd.optusnet.com.au [211.29.132.188]) by mx1.freebsd.org (Postfix) with ESMTP id 5DEF18FC15; Tue, 3 May 2011 09:18:09 +0000 (UTC) Received: from c122-106-155-58.carlnfd1.nsw.optusnet.com.au (c122-106-155-58.carlnfd1.nsw.optusnet.com.au [122.106.155.58]) by mail07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p439I404014489 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 3 May 2011 19:18:06 +1000 Date: Tue, 3 May 2011 19:18:04 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Rick Macklem In-Reply-To: <1040257715.898126.1304369225601.JavaMail.root@erie.cs.uoguelph.ca> Message-ID: <20110503183651.L1224@besplex.bde.org> References: <1040257715.898126.1304369225601.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: rmacklem@FreeBSD.org, fs@FreeBSD.org Subject: Re: newnfs client and statfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 May 2011 09:18:10 -0000 On Mon, 2 May 2011, Rick Macklem wrote: >>> I'll try and make my Solaris10 box get to -ve frees and then see >>> what >>> it puts on the wire. After that, I'll start a discussion on >>> freebsd-fs@ >>> about how they think a FreeBSD server should behave when f_bavail >>> and/or >>> f_ffree are negative. >> >> The result on Solaris would be interesting. Does Solaris still support >> ffs? You said later that you couldn't get it to generate negative >> values. >> > Well, I just did the reverse (ran a FreeBSD FFS disk out of space so > it reported a -ve free and mounted in on Solaris10). Here are the > "df" outputs (I used "df -k" on Solaris, since that's a compatible format): That is almost as good a test. 
> FreeBSD-current server (nfsv4-newlap):
> Filesystem  1K-blocks    Used   Avail Capacity  Mounted on
> /dev/ad4s3a   2026030  671492 1192456    36%    /
> devfs               1       1       0   100%    /dev
> /dev/ad4s3e   4697030 4544054 -222786   105%    /sub1
> /dev/ad4s3d   5077038  641462 4029414    14%    /usr
>
> Solaris10 client:
> Filesystem             kbytes    used   avail capacity  Mounted on
> /dev/dsk/c0d0s0       3870110 2790938 1040471    73%    /
> /devices                    0       0       0     0%    /devices
> ctfs                        0       0       0     0%    /system/contract
> proc                        0       0       0     0%    /proc
> mnttab                      0       0       0     0%    /etc/mnttab
> swap                   975736     624  975112     1%    /etc/svc/volatile
> objfs                       0       0       0     0%    /system/object
> /usr/lib/libc/libc_hwcap1.so.1
>                       3870110 2790938 1040471    73%    /lib/libc.so.1
> fd                          0       0       0     0%    /dev/fd
> swap                   975112       0  975112     0%    /tmp
> swap                   975140      28  975112     1%    /var/run
> /dev/dsk/c0d0s7       5608190 4118091 1434018    75%    /export/home
> nfsv4-newlap:/sub1    4697030 4544054 18014398509259198     1%    /mnt
>
> as you can see, Solaris10 doesn't assume it's negative and
> reports lottsa avail.
>
> I don't have a Linux client handy, so I can't do the same test
> with Linux, rick

I looked at linux-2.6.10 code. It doesn't do anything good for signed
counts, and declares f_bavail with a bad mixture of arch-dependent
types -- int, s32, u32, __u32, long, u64, __u64 (but no s64 :-). It
does 1 nearby thing better: instead of a fixed blocksize of
NFS_FABLKSIZE = 512 for nfs, the blocksize is a parameter, and in
scaling by this it is careful to round up.

NetBSD is best. Its statvfs at least has full support for handling this
problem.
From a 2004 version of NetBSD statvfs.h:

% struct statvfs {
%     unsigned long   f_flag;     /* copy of mount exported flags */
%     unsigned long   f_bsize;    /* file system block size */
%     unsigned long   f_frsize;   /* fundamental file system block size */
%     unsigned long   f_iosize;   /* optimal file system block size */
%
%     fsblkcnt_t      f_blocks;   /* number of blocks in file system, */
%                                 /*   (in units of f_frsize) */
%     fsblkcnt_t      f_bfree;    /* free blocks avail in file system */
%     fsblkcnt_t      f_bavail;   /* free blocks avail to non-root */
%     fsblkcnt_t      f_bresvd;   /* blocks reserved for root */

statvfs is specified by POSIX, and I previously mentioned that POSIX is
quite broken in this area. One of the bugs is that all the POSIX block
count types like fsblkcnt_t in the above are specified to be unsigned.
Thus negative block counts cannot be supported directly using these
types, even if the OS has negative block counts. In the above, NetBSD
works around this by having an extension giving a nonnegative block
count for the blocks reserved for root. statfs should have used this
instead of a hack involving negative counts, but presumably didn't to
avoid changing the ABI. Even NetBSD doesn't have this extension for
statfs, at least in 2004. statfs(2) was apparently deprecated in NetBSD
before 2004, with newer features only going into statvfs(2).

%
%     fsfilcnt_t      f_files;    /* total file nodes in file system */
%     fsfilcnt_t      f_ffree;    /* free file nodes in file system */
%     fsfilcnt_t      f_favail;   /* free file nodes avail to non-root */
%     fsfilcnt_t      f_fresvd;   /* file nodes reserved for root */

Similarly.
%
%     uint64_t        f_syncreads;    /* count of sync reads since mount */
%     uint64_t        f_syncwrites;   /* count of sync writes since mount */
%
%     uint64_t        f_asyncreads;   /* count of async reads since mount */
%     uint64_t        f_asyncwrites;  /* count of async writes since mount */
%
%     fsid_t          f_fsidx;        /* NetBSD compatible fsid */
%     unsigned long   f_fsid;         /* Posix compatible fsid */
%     unsigned long   f_namemax;      /* maximum filename length */
%     uid_t           f_owner;        /* user that mounted the file system */
%
%     uint32_t        f_spare[4];     /* spare space */
%
%     char    f_fstypename[_VFS_NAMELEN];   /* fs type name */
%     char    f_mntonname[_VFS_MNAMELEN];   /* directory on which mounted */
%     char    f_mntfromname[_VFS_MNAMELEN]; /* mounted file system */
%
% };

As I said before, NetBSD's nfs tries to make this work for nfs, but I
couldn't see how this worked in NetBSD, or in any way I could think of,
since the extension is not in the nfs protocol. Now I think it does
work, but still can't see how. Details:

NetBSD puts f_bavail on the wire without clamping it (it just scales
it). Now I think f_bavail is never negative in NetBSD, so this scaling
doesn't involve the usual sign extension and overflow bugs, or abuse
of the top bit.

The client zaps negative values for v3 f_bavail but not for other
things, and initializes f_bresvd: from a 2005 version of nfs_vfsops.c:

%     if (v3) {
%         sbp->f_frsize = sbp->f_bsize = NFS_FABLKSIZE;
%         tquad = fxdr_hyper(&sfp->sf_tbytes);
%         sbp->f_blocks = ((quad_t)tquad / (quad_t)NFS_FABLKSIZE);
%         tquad = fxdr_hyper(&sfp->sf_fbytes);
%         sbp->f_bfree = ((quad_t)tquad / (quad_t)NFS_FABLKSIZE);
%         tquad = fxdr_hyper(&sfp->sf_abytes);
%         tquad = ((quad_t)tquad / (quad_t)NFS_FABLKSIZE);
%         sbp->f_bresvd = sbp->f_bfree - tquad;

I still can't see how this initialization works. f_bresvd has to end up
as nonzero if root has a reserve, and drop to zero as the reserve is
used up. sf_fbytes - sf_abytes must give this reserve.
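For what it's worth, the subtraction in the last quoted line can be tried in isolation. The following is only a sketch under the assumption that the server never reports more available bytes than free bytes and clamps the available count at zero once the reserve is in use. reserve_blocks() and BLKSIZE are made-up names for illustration (BLKSIZE stands in for NFS_FABLKSIZE); neither is in the NetBSD tree.

```c
#include <stdint.h>

#define BLKSIZE 512	/* hypothetical stand-in for NFS_FABLKSIZE */

/*
 * Illustration only: derive a "reserved for root" block count from the
 * two byte counts that NFSv3 puts on the wire (free bytes and bytes
 * available to non-root), the same way the quoted client code does
 * with f_bfree and tquad.  Assumes abytes <= fbytes; otherwise the
 * unsigned subtraction wraps.
 */
static uint64_t
reserve_blocks(uint64_t fbytes, uint64_t abytes)
{
	uint64_t bfree = fbytes / BLKSIZE;	/* like f_bfree */
	uint64_t bavail = abytes / BLKSIZE;	/* like f_bavail */

	return (bfree - bavail);		/* like f_bresvd */
}
```

With 100 free blocks of which 20 are available to non-root, this gives a reserve of 80. If the server clamps the available count to 0 while root eats into the reserve, the result simply tracks the free count and reaches 0 when the file system is completely full, which would be one way for the value to "drop to zero as the reserve is used up".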
%         sbp->f_bavail = tquad;
% #ifdef COMPAT_20
%         /* Handle older NFS servers returning negative values */
%         if ((quad_t)sbp->f_bavail < 0)
%             sbp->f_bavail = 0;
% #endif

NetBSD's own server puts f_bavail on the wire unchanged except for
scaling, so it is now clear that f_bavail is never negative in NetBSD.

%         tquad = fxdr_hyper(&sfp->sf_tfiles);
%         sbp->f_files = tquad;
%         tquad = fxdr_hyper(&sfp->sf_ffiles);
%         sbp->f_ffree = tquad;
%         sbp->f_favail = tquad;

"Negative" values for this are not zapped.

%         sbp->f_fresvd = 0;

This reserve is not really supported. Supporting it is impossible since
there is not as much redundancy in the wire values for the file counts
as for the block counts.

%         sbp->f_namemax = MAXNAMLEN;
%     } else {
%         sbp->f_bsize = NFS_FABLKSIZE;
%         sbp->f_frsize = fxdr_unsigned(int32_t, sfp->sf_bsize);
%         sbp->f_blocks = fxdr_unsigned(int32_t, sfp->sf_blocks);
%         sbp->f_bfree = fxdr_unsigned(int32_t, sfp->sf_bfree);
%         sbp->f_bavail = fxdr_unsigned(int32_t, sfp->sf_bavail);

Still has old bugs.

%         sbp->f_fresvd = 0;
%         sbp->f_files = 0;
%         sbp->f_ffree = 0;
%         sbp->f_favail = 0;
%         sbp->f_fresvd = 0;
%         sbp->f_namemax = MAXNAMLEN;
%     }

Next steps: someone should look at why there are 3 nfsv3 protocol
fields for the block counts when only 2 are strictly needed.
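Returning to the Solaris10 df output quoted earlier in this message: the avail figure of 18014398509259198 for /sub1 equals 2^54 - 222786, which is what you would get if the -222786 KB were put on the wire as a 64-bit byte count and the client then treated that count as unsigned before scaling back to KB. That is an inference from the numbers, not something checked against the Solaris source. A minimal sketch of the arithmetic, with avail_kb_as_unsigned() being a made-up name:

```c
#include <stdint.h>

/*
 * Illustration only: what an unsigned-only client would display (in KB)
 * if a negative "available" count were sent as a 64-bit byte count.
 * The multiplication is done in uint64_t, so it wraps modulo 2^64
 * instead of overflowing a signed type.
 */
static uint64_t
avail_kb_as_unsigned(int64_t avail_kb)
{
	uint64_t wire_bytes = (uint64_t)avail_kb * 1024;

	return (wire_bytes / 1024);
}
```

For avail_kb = -222786 the byte count wraps to 2^64 - 228132864, and dividing by 1024 gives 2^54 - 222786, matching the figure in the df output.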
Bruce From owner-freebsd-fs@FreeBSD.ORG Tue May 3 11:48:44 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5180F106566C; Tue, 3 May 2011 11:48:44 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de [217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id 01CAB8FC21; Tue, 3 May 2011 11:48:43 +0000 (UTC) Received: from outgoing.leidinger.net (p5B155A42.dip.t-dialin.net [91.21.90.66]) by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id D183F844017; Tue, 3 May 2011 13:48:29 +0200 (CEST) Received: from webmail.leidinger.net (webmail.Leidinger.net [IPv6:fd73:10c7:2053:1::2:102]) by outgoing.leidinger.net (Postfix) with ESMTP id 15D5311C5; Tue, 3 May 2011 13:48:27 +0200 (CEST) Received: (from www@localhost) by webmail.leidinger.net (8.14.4/8.13.8/Submit) id p43BmQpR006371; Tue, 3 May 2011 13:48:26 +0200 (CEST) (envelope-from Alexander@Leidinger.net) Received: from pslux.ec.europa.eu (pslux.ec.europa.eu [158.169.9.14]) by webmail.leidinger.net (Horde Framework) with HTTP; Tue, 03 May 2011 13:48:26 +0200 Message-ID: <20110503134826.712070yt2urhxp8g@webmail.leidinger.net> Date: Tue, 03 May 2011 13:48:26 +0200 From: Alexander Leidinger To: Pawel Jakub Dawidek References: <4DBBB20A.5050102@FreeBSD.org> <20110430072831.GA65598@icarus.home.lan> <20110501000656.00007ea1@unknown> <20110501133752.GC3245@garage.freebsd.pl> In-Reply-To: <20110501133752.GC3245@garage.freebsd.pl> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; DelSp="Yes"; format="flowed" Content-Disposition: inline Content-Transfer-Encoding: 7bit User-Agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.6) X-EBL-MailScanner-Information: Please contact the ISP for more information X-EBL-MailScanner-ID: D183F844017.AF0FB X-EBL-MailScanner: Found to be clean X-EBL-MailScanner-SpamCheck: not spam, 
spamhaus-ZEN, SpamAssassin (not cached, score=0, required 6, autolearn=disabled) X-EBL-MailScanner-From: alexander@leidinger.net X-EBL-MailScanner-Watermark: 1305028111.44812@DGdeTBgXehAN7f2t7b6JVg X-EBL-Spam-Status: No Cc: freebsd-fs@FreeBSD.org, Alexander Motin Subject: Re: TRIM clustering X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 May 2011 11:48:44 -0000 Quoting Pawel Jakub Dawidek (from Sun, 1 May 2011 15:37:52 +0200): > On Sun, May 01, 2011 at 12:06:56AM +0200, Alexander Leidinger wrote: >> On Sat, 30 Apr 2011 00:28:31 -0700 Jeremy Chadwick >> wrote: >> >> > On Sat, Apr 30, 2011 at 09:54:02AM +0300, Alexander Motin wrote: >> >> > Other notes: TRIM needs to be supported on swap as well, and in my >> > opinion this is just as important as it being in UFS. I'm not sure >> > how one would implement that. >> >> This brings up the question if a ZFS cache (where the contents do not >> survive a reboot) is completely TRIMmed before used (and normally >> trimmed during use)... > > It is not trimmed at all. This does not sound like the optimal solution... is there a way to know the first access after boot/attach to a cache device? If yes, would it be possible to TRIM the complete provider (except for some static data which needs to be there) from this place? This would not solve the not TRIMmed during use part, put at least a reboot/reattach could provide a sane state. Bye, Alexander. 
-- BOFH excuse #189: SCSI's too wide http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137 From owner-freebsd-fs@FreeBSD.ORG Tue May 3 14:07:47 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3B468106564A for ; Tue, 3 May 2011 14:07:47 +0000 (UTC) (envelope-from ticso@cicely7.cicely.de) Received: from raven.bwct.de (raven.bwct.de [85.159.14.73]) by mx1.freebsd.org (Postfix) with ESMTP id 96DAC8FC0A for ; Tue, 3 May 2011 14:07:46 +0000 (UTC) Received: from mail.cicely.de ([10.1.1.37]) by raven.bwct.de (8.13.4/8.13.4) with ESMTP id p43Db59L039034 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Tue, 3 May 2011 15:37:06 +0200 (CEST) (envelope-from ticso@cicely7.cicely.de) Received: from cicely7.cicely.de (cicely7.cicely.de [10.1.1.9]) by mail.cicely.de (8.14.4/8.14.4) with ESMTP id p43DapMk069143 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 3 May 2011 15:36:51 +0200 (CEST) (envelope-from ticso@cicely7.cicely.de) Received: from cicely7.cicely.de (localhost [127.0.0.1]) by cicely7.cicely.de (8.14.2/8.14.2) with ESMTP id p43DPHG8002183; Tue, 3 May 2011 15:25:17 +0200 (CEST) (envelope-from ticso@cicely7.cicely.de) Received: (from ticso@localhost) by cicely7.cicely.de (8.14.2/8.14.2/Submit) id p43DPHgm002182; Tue, 3 May 2011 15:25:17 +0200 (CEST) (envelope-from ticso) Date: Tue, 3 May 2011 15:25:17 +0200 From: Bernd Walter To: Alexander Leidinger Message-ID: <20110503132517.GF1549@cicely7.cicely.de> References: <4DBBB20A.5050102@FreeBSD.org> <20110430072831.GA65598@icarus.home.lan> <20110501000656.00007ea1@unknown> <20110501133752.GC3245@garage.freebsd.pl> <20110503134826.712070yt2urhxp8g@webmail.leidinger.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: 
<20110503134826.712070yt2urhxp8g@webmail.leidinger.net> X-Operating-System: FreeBSD cicely7.cicely.de 7.0-STABLE i386 User-Agent: Mutt/1.5.11 X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED=-1, BAYES_00=-1.9, T_RP_MATCHES_RCVD=-0.01 autolearn=unavailable version=3.3.0 X-Spam-Checker-Version: SpamAssassin 3.3.0 (2010-01-18) on spamd.cicely.de Cc: freebsd-fs@freebsd.org, Alexander Motin , Pawel Jakub Dawidek Subject: Re: TRIM clustering X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: ticso@cicely.de List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 May 2011 14:07:47 -0000 On Tue, May 03, 2011 at 01:48:26PM +0200, Alexander Leidinger wrote: > Quoting Pawel Jakub Dawidek (from Sun, 1 May 2011 > 15:37:52 +0200): > > >On Sun, May 01, 2011 at 12:06:56AM +0200, Alexander Leidinger wrote: > >>On Sat, 30 Apr 2011 00:28:31 -0700 Jeremy Chadwick > >> wrote: > >> > >>> On Sat, Apr 30, 2011 at 09:54:02AM +0300, Alexander Motin wrote: > >> > >>> Other notes: TRIM needs to be supported on swap as well, and in my > >>> opinion this is just as important as it being in UFS. I'm not sure > >>> how one would implement that. > >> > >>This brings up the question if a ZFS cache (where the contents do not > >>survive a reboot) is completely TRIMmed before used (and normally > >>trimmed during use)... > > > >It is not trimmed at all. > > This does not sound like the optimal solution... is there a way to > know the first access after boot/attach to a cache device? If yes, > would it be possible to TRIM the complete provider (except for some > static data which needs to be there) from this place? This would not > solve the not TRIMmed during use part, put at least a reboot/reattach > could provide a sane state. What would be the possible benefit? 
I mean it's just until the device is filled, which won't happen that regular in environments where cache devices make sense. More interesting would be to have the cached data reboot persistent one day instead of TRIMing it. -- B.Walter http://www.bwct.de Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm. From owner-freebsd-fs@FreeBSD.ORG Tue May 3 14:33:43 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9CD8C106566C; Tue, 3 May 2011 14:33:43 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de [217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id 2BDB48FC19; Tue, 3 May 2011 14:33:42 +0000 (UTC) Received: from outgoing.leidinger.net (p5B155A42.dip.t-dialin.net [91.21.90.66]) by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id E7DF3844017; Tue, 3 May 2011 16:33:27 +0200 (CEST) Received: from webmail.leidinger.net (webmail.Leidinger.net [IPv6:fd73:10c7:2053:1::2:102]) by outgoing.leidinger.net (Postfix) with ESMTP id 0179011C6; Tue, 3 May 2011 16:33:24 +0200 (CEST) Received: (from www@localhost) by webmail.leidinger.net (8.14.4/8.13.8/Submit) id p43EXOnR046041; Tue, 3 May 2011 16:33:24 +0200 (CEST) (envelope-from Alexander@Leidinger.net) Received: from pslux.ec.europa.eu (pslux.ec.europa.eu [158.169.9.14]) by webmail.leidinger.net (Horde Framework) with HTTP; Tue, 03 May 2011 16:33:24 +0200 Message-ID: <20110503163324.11285rolq1oyrnlc@webmail.leidinger.net> Date: Tue, 03 May 2011 16:33:24 +0200 From: Alexander Leidinger To: ticso@cicely.de, Bernd Walter References: <4DBBB20A.5050102@FreeBSD.org> <20110430072831.GA65598@icarus.home.lan> <20110501000656.00007ea1@unknown> <20110501133752.GC3245@garage.freebsd.pl> <20110503134826.712070yt2urhxp8g@webmail.leidinger.net> <20110503132517.GF1549@cicely7.cicely.de> In-Reply-To: 
<20110503132517.GF1549@cicely7.cicely.de> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; DelSp="Yes"; format="flowed" Content-Disposition: inline Content-Transfer-Encoding: 7bit User-Agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.6) X-EBL-MailScanner-Information: Please contact the ISP for more information X-EBL-MailScanner-ID: E7DF3844017.A04FF X-EBL-MailScanner: Found to be clean X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN, SpamAssassin (not cached, score=0, required 6, autolearn=disabled) X-EBL-MailScanner-From: alexander@leidinger.net X-EBL-MailScanner-Watermark: 1305038008.30478@0FRKaNgB0n/VcxxUkpbs+g X-EBL-Spam-Status: No Cc: freebsd-fs@freebsd.org, Alexander Motin , Pawel Jakub Dawidek Subject: Re: TRIM clustering X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 May 2011 14:33:43 -0000 Quoting Bernd Walter (from Tue, 3 May 2011 15:25:17 +0200): > On Tue, May 03, 2011 at 01:48:26PM +0200, Alexander Leidinger wrote: >> Quoting Pawel Jakub Dawidek (from Sun, 1 May 2011 >> 15:37:52 +0200): >> >> >On Sun, May 01, 2011 at 12:06:56AM +0200, Alexander Leidinger wrote: >> >>On Sat, 30 Apr 2011 00:28:31 -0700 Jeremy Chadwick >> >> wrote: >> >> >> >>> On Sat, Apr 30, 2011 at 09:54:02AM +0300, Alexander Motin wrote: >> >> >> >>> Other notes: TRIM needs to be supported on swap as well, and in my >> >>> opinion this is just as important as it being in UFS. I'm not sure >> >>> how one would implement that. >> >> >> >>This brings up the question if a ZFS cache (where the contents do not >> >>survive a reboot) is completely TRIMmed before used (and normally >> >>trimmed during use)... >> > >> >It is not trimmed at all. >> >> This does not sound like the optimal solution... is there a way to >> know the first access after boot/attach to a cache device? 
If yes,
>> would it be possible to TRIM the complete provider (except for some
>> static data which needs to be there) from this place? This would not
>> solve the not TRIMmed during use part, put at least a reboot/reattach
>> could provide a sane state.
>
> What would be the possible benefit?
> I mean it's just until the device is filled, which won't happen that
> regular in environments where cache devices make sense.

If a cache is not full, it is not used well (or you do not access that
much data, but in this case you do not have to worry). The benefit of
the initial TRIM should be a lower cache-fill latency (whether it
matters depends on your use-case/drive-channel-usage).

Regarding the in-use-TRIMming... I agree that it is subject to
discussion (and the use-case), but at least it looks like a more
correct solution. If large objects are removed from the cache, the
following cache fills could have lower write latency. Again, whether
this matters depends on your use-case/drive-channel-usage.

> More interesting would be to have the cached data reboot persistent
> one day instead of TRIMing it.

I assume this would be more work than teaching it to TRIM (looking for
low-hanging fruit), but in general I agree.

Bye,
Alexander.

--
Fools rush in -- and get the best seats.
http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137 From owner-freebsd-fs@FreeBSD.ORG Tue May 3 22:05:50 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 66D971065672 for ; Tue, 3 May 2011 22:05:50 +0000 (UTC) (envelope-from wmn@siberianet.ru) Received: from mail.siberianet.ru (mail.siberianet.ru [89.105.136.7]) by mx1.freebsd.org (Postfix) with ESMTP id C0DF08FC15 for ; Tue, 3 May 2011 22:05:49 +0000 (UTC) Received: from wmn.localnet (wmn.siberianet.ru [89.105.137.12]) by mail.siberianet.ru (Postfix) with ESMTPA id 612695028AE for ; Wed, 4 May 2011 05:47:58 +0800 (KRAST) From: Sergey Lobanov Organization: ISP "SiberiaNet" Date: Wed, 4 May 2011 05:47:52 +0800 User-Agent: KMail/1.13.7 (Linux/2.6.38-ARCH; KDE/4.6.2; i686; ; ) MIME-Version: 1.0 X-Length: 2409 X-UID: 18 To: freebsd-fs@freebsd.org Content-Type: Text/Plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Message-Id: <201105040547.52216.wmn@siberianet.ru> Subject: fsck_ufs only in preen mode terminates with non-zero exit status trying to check absent device X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 May 2011 22:05:50 -0000 Hello, I am trying to workaround problem in setup with md(4) file-backed images mounted in jails(8). I could not find how to automatically check file systems on md images during system boot or jail start. That is, after hard system reset (for example, because of power loss) file systems on md file-backed images are all dirty and is not auto-repaired out of the box. May be I've missed something, feel free to point me out to the documentation describing the case. 
Here is the workaround I am trying to make:

1) The md images are all added to the jail fstabs so the system can boot
normally (if the images are in the host fstab, the boot stops when it
tries to check these devices, which are obviously absent at boot time).
2) I use ezjail, so I add a hack to its rc-NG script which executes an
external script to check the file systems on the corresponding md images
before jail start; the ezjail script relies on the exit status of this
external script, so we can skip the jail if the check has failed.
3) The script that checks the md images gets the jail name as a
parameter, greps the /dev/md* rows from the corresponding fstab file and
tries fsck in preen mode first and then in normal mode if the first
fails.

And here we get a problem with fsck: in preen mode it exits with
non-zero status if the device is not present, but if we then launch it
in normal mode for the same device, it prints errors and terminates
with status 0.

Example script (test-fsck-ufs.sh):
--------------------
#!/bin/sh

rc_info="YES"
. /etc/rc.subr

/sbin/fsck_ufs -p /dev/md-non-existent
if [ $? -ne 0 ]; then
    warn "Could not check in preen mode, trying normal..."
    /sbin/fsck_ufs -y /dev/md-non-existent || err $? "Could not check in normal mode, XXX IMAGE FILE IS CORRUPT XXX"
else
    info "Consistent"
fi
--------------------

Result of executing the above script on 8.2-stable r220968 and
7.3-stable r215651:

Can't stat /dev/md-non-existent: No such file or directory
./test-fsck-ufs.sh: WARNING: Could not check in preen mode, trying normal...
Can't stat /dev/md-non-existent: No such file or directory
Can't stat /dev/md-non-existent: No such file or directory

This is incorrect from my point of view; fsck_ffs(8) clearly states at
the very end:
"EXIT STATUS
     The fsck_ffs utility exits 0 on success, and >0 if an error occurs."
I can definitely hack fsck_ffs so that it returns an error under such
conditions, something like this (fixes my case but was not checked in
normal operation, patch for releng8):

---patch start---
--- sbin/fsck_ffs/main.c.orig	2011-05-04 04:11:18.000000000 +0800
+++ sbin/fsck_ffs/main.c	2011-05-04 04:29:23.000000000 +0800
@@ -70,6 +70,7 @@
 static int checkfilesys(char *filesys);
 static int chkdoreload(struct statfs *mntp);
 static struct statfs *getmntpt(const char *);
+char fails = 0;
 
 int
 main(int argc, char *argv[])
@@ -179,6 +180,8 @@
 
 	if (returntosingle)
 		ret = 2;
+	else
+		if (fails) ret = EEXIT;
 	exit(ret);
 }
 
@@ -373,6 +376,7 @@
 	case 0:
 		if (preen)
 			pfatal("CAN'T CHECK FILE SYSTEM.");
+		fails = 1;
 		return (0);
 	case -1:
 	clean:
---patch end---

but maybe there is some other, saner way. Or I've just missed
something, and there is a strong reason for this behaviour and it is
actually a feature :}

I am subscribed to the list so there is no need to add me to CC.

--
ISP SiberiaNet
System and Network Administrator

From owner-freebsd-fs@FreeBSD.ORG Tue May 3 23:23:56 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7780F106564A for ; Tue, 3 May 2011 23:23:56 +0000 (UTC) (envelope-from roberto@keltia.freenix.fr) Received: from keltia.net (unknown [IPv6:2a01:240:fe5c::41]) by mx1.freebsd.org (Postfix) with ESMTP id 2DA798FC13 for ; Tue, 3 May 2011 23:23:56 +0000 (UTC) Received: from lonrach.keltia.net (lonrach.keltia.net [193.56.58.71]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: roberto) by keltia.net (Postfix/TLS) with ESMTPSA id 09FB5E15B for ; Wed, 4 May 2011 01:23:54 +0200 (CEST) Date: Wed, 4 May 2011 01:23:52 +0200 From: Ollivier Robert To: freebsd-fs@freebsd.org Message-ID: <20110503232352.GB29092@lonrach.keltia.net> References: <4DB8EF02.8060406@bk.ru>
<20110430001524.GA58845@icarus.home.lan> <4DBC2E46.9060404@userid.org> <4DBCA4AE.3090506@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4DBCA4AE.3090506@FreeBSD.org> X-Operating-System: MacOS X / MBP 4,1 - FreeBSD 8.0 / T3500-E5520 Nehalem User-Agent: Mutt/1.5.20 (2009-06-14) Subject: Re: ZFS v28 for 8.2-STABLE X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 May 2011 23:23:56 -0000 According to Martin Matuska: > I have updated patch to reflect latest changes (grab latest one): > http://people.freebsd.org/~mm/patches/zfs/v28/ My 8.2-STABLE machine (r221058) is running with the 20110317 patch applied, I put back my partition on the 3rd drive as a cache (was taking a full CPU in v15 due to a overflow bug) and it has been working fine for a few days now, doing www/uucp/dns/dnssec/ssh and sending away dozens of spammers. Handle these all fine, I even enabled deduplication on some filesets: 643 [1:19] root@centre:munin/plugins# zpool list NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT data 284G 210G 73.8G 74% 1.00x ONLINE - 1x 320 GB tank 294G 69.2G 225G 23% 1.06x ONLINE - 2x 320 GB mirrorred > As to your setup, have you tried using a partition as a log device? cache yes, log no. -- Ollivier ROBERT -=- FreeBSD: The Power to Serve! 
-=- roberto@keltia.freenix.fr In memoriam to Ondine : http://ondine.keltia.net/ From owner-freebsd-fs@FreeBSD.ORG Wed May 4 00:15:43 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D60AD1065670 for ; Wed, 4 May 2011 00:15:42 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 7D8D08FC14 for ; Wed, 4 May 2011 00:15:42 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApwEAH6ZwE2DaFvO/2dsb2JhbACEUaJGiHKreJEdgSqDV4EBBI8Yjk4 X-IronPort-AV: E=Sophos;i="4.64,312,1301889600"; d="scan'208";a="120327109" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 03 May 2011 20:15:41 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 8AABCB3F24; Tue, 3 May 2011 20:15:41 -0400 (EDT) Date: Tue, 3 May 2011 20:15:41 -0400 (EDT) From: Rick Macklem To: Bruce Evans Message-ID: <2143699515.968680.1304468141505.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20110503174200.V1050@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Cc: fs@FreeBSD.org Subject: Re: newnfs client and statfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 04 May 2011 00:15:43 -0000 > On Mon, 2 May 2011, Rick Macklem wrote: > > > I have attached a version of the patch that I intend to commit > > unless it doesn't work for Kostik's test case. Kostik, could > > you please test this one. 
> > > > Yes, Bruce, I realize you won't like it, but I > > have put some comments in it > > to try and clarify why it is coded the way it is. > > (The arithmetic seems to work the way I would expect it to for > > i386, which is the only arch I have for testing.) > > Sigh. > > % --- fs/nfsclient/nfs_clport.c.sav 2011-04-30 20:16:39.000000000 > -0400 > % +++ fs/nfsclient/nfs_clport.c 2011-05-02 19:32:31.000000000 -0400 > % @@ -838,21 +838,33 @@ void > % nfscl_loadsbinfo(struct nfsmount *nmp, struct nfsstatfs *sfp, void > *statfs) > % { > % struct statfs *sbp = (struct statfs *)statfs; > % - nfsquad_t tquad; > % > % if (nmp->nm_flag & (NFSMNT_NFSV3 | NFSMNT_NFSV4)) { > % sbp->f_bsize = NFS_FABLKSIZE; > % - tquad.qval = sfp->sf_tbytes; > % - sbp->f_blocks = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE)); > % - tquad.qval = sfp->sf_fbytes; > % - sbp->f_bfree = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE)); > % - tquad.qval = sfp->sf_abytes; > % - sbp->f_bavail = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE)); > % - tquad.qval = sfp->sf_tfiles; > % - sbp->f_files = (tquad.lval[0] & 0x7fffffff); > % - tquad.qval = sfp->sf_ffiles; > % - sbp->f_ffree = (tquad.lval[0] & 0x7fffffff); > % + sbp->f_blocks = sfp->sf_tbytes / NFS_FABLKSIZE; > % + sbp->f_bfree = sfp->sf_fbytes / NFS_FABLKSIZE; > % + /* > % + * Although sf_abytes is uint64_t and f_bavail is int64_t, > % + * the value after dividing by NFS_FABLKSIZE is small > % + * enough that it will fit in 63bits, so it is ok to > % + * assign it to f_bavail without fear that it will become > % + * negative. > % + */ > % + sbp->f_bavail = sfp->sf_abytes / NFS_FABLKSIZE; > % + sbp->f_files = sfp->sf_tfiles; > % + /* Since f_ffree is int64_t, clip it to 63bits. */ > % + if (sfp->sf_ffiles > (uint64_t)INT64_MAX) > > This cast has no effect. INT64_MAX has type int64_t. sf_ffiles has > uint64_t. The default binary promotions cause both types to be > promoted > to the minimally larger common type. This type is uint64_t. 
Thus
> INT64_MAX is converted automatically to the correct type.
>
Yea, I didn't think the cast mattered and it didn't affect the outcome
for my little userland test program, so I'll take it out. (I was trying
to "play it safe", but if you say it doesn't matter, I believe you.)

> % + sbp->f_ffree = INT64_MAX;
> % + else
> % + sbp->f_ffree = sfp->sf_ffiles;
> % } else if ((nmp->nm_flag & NFSMNT_NFSV4) == 0) {
> % + /*
> % + * The type casts to (int32_t) ensure that this code is
> % + * compatible with the old NFS client, in that it will
> % + * sign extend a value with bit31 set. This may or may
> % + * not be correct for NFSv2, but since it is a legacy
> % + * environment, I'd rather retain backwards compatibility.
> % + */
> % sbp->f_bsize = (int32_t)sfp->sf_bsize;
> % sbp->f_blocks = (int32_t)sfp->sf_blocks;
> % sbp->f_bfree = (int32_t)sfp->sf_bfree;
>
> It won't sign extend, but will propagate bit31 as an unsigned bit. For
> example, sfp->sf_blocks = 0x80000000 becomes sbp->f_blocks =
> 0xFFFFFFFF80000000, which is massively different. Again, omitting the
> cast gives the correct result if the wire insists on its values being
> unsigned.
>
Ok, I'll change the comment.

> The result is only backwards compatible with relatively recent FreeBSD
> nfs clients. All FreeBSD clients are completely broken if bit31 is
> set, and compatibility with this brokenness is not useful (but as I
> pointed out in another reply, we would never have seen the broken case
> when the old clients weren't old, since it takes a server file system
> size of about 32TB for bit 31 to be set).
Well, the last legitimate use of the FreeBSD NFSv2 client was a diskless
root fs stored on a non-FreeBSD NFS server (because pxeboot didn't know the
correct file handle size). Since this is now fixed, there really isn't
any use for the NFSv2 client, as far as I know. Given that and the fact
that no one is complaining about it being broken, I feel it should just
be left alone.
(Or remain "bug compatible" with the regular NFS client,
if you prefer.)

I'm afraid I have other things to work on and just don't see changing
NFSv2 (a 1985 protocol superseded by NFSv3 in 1994) as a priority, rick.

> The details of the brokenness vary:
>
> Net/2, FreeBSD-1, 4.4BSD-Lite, FreeBSD-[2-4]:
>     f_blocks was plain long:
>     if long is 32 bits, then sfp->sf_blocks = 0x80000000 becomes
>         sbp->f_blocks = -0x7fffffff - 1 (LONG_MIN)
>     if long is 64 bits, then sfp->sf_blocks = 0x80000000 becomes
>         sbp->f_blocks = -0x80000000L (INT32_MIN, same as 32-bit LONG_MIN)
>
> FreeBSD-current after 2003/11/12, FreeBSD-[5-9]:
>     f_blocks is now uint64_t:
>     changing it (and others) from a signed type to an unsigned type mainly
>     gave lots of sign extension bugs, including here. The bugs remain
>     mostly unfixed.
>     sfp->sf_blocks = 0x80000000 becomes
>         sbp->f_blocks = 0xFFFFFFFF80000000 ((uint64_t)INT32_MIN) on all
>         arches.
>
> Neither of the garbage values INT32_MIN, ((uint64_t)INT32_MIN) gives useful
> behaviour. The former is negative, though the wire value cannot be negative
> (not sure about this for v2). Applications that are naive enough to believe
> this value should assume that the file system has a negative size and
> never try to write anything. The latter is enormous and positive. If the
> wire count really is 0x80000000, then that is already very large, so
> believing that the value is 0xFFFFFFFF80000000 should make little
> difference.
>
> The bugs are a little different for signed fields like f_bavail. Now there
> are no sign extension bugs or version-dependent misbehaviours. There are
> just overflow bugs in the bogus casts. (int32_t)0x80000000 overflows to
> INT32_MIN (only on 2's complement machines, but no others are supported),
> and assignment to sbp->f_bavail doesn't change this garbage value. Now
> the bugs are even further off, since it takes about a 400 TB ffs server file
> system to reach them.
(400 TB with 8% minfree gives a 32TB reserve for > root. After using all 32TB of this reserve, there would be -32TB > available > for non-root. -32TB is INT32_MIN in 16K-blocks.) > > Bruce From owner-freebsd-fs@FreeBSD.ORG Wed May 4 00:27:54 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1A088106564A; Wed, 4 May 2011 00:27:54 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id A908F8FC13; Wed, 4 May 2011 00:27:53 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApwEAP+cwE2DaFvO/2dsb2JhbACEUaJGiHKrX5EcgSqBX4F4gQEEjxiOTg X-IronPort-AV: E=Sophos;i="4.64,312,1301889600"; d="scan'208";a="119503788" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 03 May 2011 20:27:53 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id C4F95B3F34; Tue, 3 May 2011 20:27:52 -0400 (EDT) Date: Tue, 3 May 2011 20:27:52 -0400 (EDT) From: Rick Macklem To: Bruce Evans Message-ID: <308871799.968962.1304468872744.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20110503183651.L1224@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: rmacklem@FreeBSD.org, fs@FreeBSD.org Subject: Re: newnfs client and statfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 04 May 2011 00:27:54 -0000 > On Mon, 2 May 2011, Rick Macklem wrote: > > >>> I'll try and make my Solaris10 box 
get to -ve frees and then see > >>> what > >>> it puts on the wire. After that, I'll start a discussion on > >>> freebsd-fs@ > >>> about how they think a FreeBSD server should behave when f_bavail > >>> and/or > >>> f_ffree are negative. > >> > >> The result on Solaris would be interesting. Does Solaris still > >> support > >> ffs? You said later that you couldn't get it to generate negative > >> values. > >> > > Well, I just did the reverse (ran a FreeBSD FFS disk out of space so > > it reported a -ve free and mounted in on Solaris10). Here are the > > "df" outputs (I used "df -k" on Solaris, since that's a compatible > > format): > > That is almost as good a test. > > > FreeBSD-current server (nfsv4-newlap): > > Filesystem 1K-blocks Used Avail Capacity Mounted on > > /dev/ad4s3a 2026030 671492 1192456 36% / > > devfs 1 1 0 100% /dev > > /dev/ad4s3e 4697030 4544054 -222786 105% /sub1 > > /dev/ad4s3d 5077038 641462 4029414 14% /usr > > > > Solaris10 client: > > Filesystem kbytes used avail capacity Mounted on > > /dev/dsk/c0d0s0 3870110 2790938 1040471 73% / > > /devices 0 0 0 0% /devices > > ctfs 0 0 0 0% /system/contract > > proc 0 0 0 0% /proc > > mnttab 0 0 0 0% /etc/mnttab > > swap 975736 624 975112 1% /etc/svc/volatile > > objfs 0 0 0 0% /system/object > > /usr/lib/libc/libc_hwcap1.so.1 3870110 2790938 1040471 73% > > /lib/libc.so.1 > > fd 0 0 0 0% /dev/fd > > swap 975112 0 975112 0% /tmp > > swap 975140 28 975112 1% /var/run > > /dev/dsk/c0d0s7 5608190 4118091 1434018 75% /export/home > > nfsv4-newlap:/sub1 4697030 4544054 18014398509259198 1% /mnt > > > > as you can see, Solaris10 doesn't assume it's negative and > > reports lottsa avail. > > > > I don't have a Linux client handy, so I can't do the same test > > with Linux, rick > > I looked at linux-2.6.10 code. It doesn't do anything good for signed > counts, and declares f_bavail with a bad mixture of arch-dependent > types > -- int, s32, u32, __u32, long, u64, __u64 (but no s64 :-). 
It does 1 > nearby thing better: instead of a fixed blocksize of NFS_FABLKSIZE = > 512 > for nfs, the blocksize is a parameter, and in scaling by this it is > careful to round up. > > NetBSD is best. Its statvfs at least has full support for handling > this > problem. From a 2004 version of NetBSD statvfs.h: > > % struct statvfs { > % unsigned long f_flag; /* copy of mount exported flags */ > % unsigned long f_bsize; /* file system block size */ > % unsigned long f_frsize; /* fundamental file system block size */ > % unsigned long f_iosize; /* optimal file system block size */ > % > % fsblkcnt_t f_blocks; /* number of blocks in file system, */ > % /* (in units of f_frsize) */ > % fsblkcnt_t f_bfree; /* free blocks avail in file system */ > % fsblkcnt_t f_bavail; /* free blocks avail to non-root */ > % fsblkcnt_t f_bresvd; /* blocks reserved for root */ > > statvfs is specified by POSIX, and I previously mentioned that POSIX > is > quite broken in this area. One of the bugs is that all the POSIX block > count types like fsblkcnt_t in the above are specified to be unsigned. > Thus negative block counts cannot be supported directly using these > types, > even if the OS has negative block counts. In the above, NetBSD works > around this by having an extension giving a nonnegative block count > for > the blocks reserved for root. statfs should have used this instead of > a hack involving negative counts, but presumably didn't to avoid > changing > the ABI. Even NetBSD doesn't have this extension for statfs, at least > in 2004. statfs(2) was apparently deprecated in NetBSD before 2004, > with > newer features only going into statvfs(2). > > % > % fsfilcnt_t f_files; /* total file nodes in file system */ > % fsfilcnt_t f_ffree; /* free file nodes in file system */ > % fsfilcnt_t f_favail; /* free file nodes avail to non-root */ > % fsfilcnt_t f_fresvd; /* file nodes reserved for root */ > > Similarly. 
> > % > % uint64_t f_syncreads; /* count of sync reads since mount */ > % uint64_t f_syncwrites; /* count of sync writes since mount */ > % > % uint64_t f_asyncreads; /* count of async reads since mount */ > % uint64_t f_asyncwrites; /* count of async writes since mount */ > % > % fsid_t f_fsidx; /* NetBSD compatible fsid */ > % unsigned long f_fsid; /* Posix compatible fsid */ > % unsigned long f_namemax; /* maximum filename length */ > % uid_t f_owner; /* user that mounted the file system */ > % > % uint32_t f_spare[4]; /* spare space */ > % > % char f_fstypename[_VFS_NAMELEN]; /* fs type name */ > % char f_mntonname[_VFS_MNAMELEN]; /* directory on which mounted */ > % char f_mntfromname[_VFS_MNAMELEN]; /* mounted file system */ > % > % }; > > As I said before, NetBSD's nfs tries to make this work for nfs, but I > couldn't this worked in NetBSD or anything I could think of, since the > extension is not in the nfs protocol. Now I think it does work, but > still can't see how. Details: NetBSD puts f_bavail on the wire without > clamping it (it just scales it). Now I think f_bavail is never > negative > in NetBSD, so this scaling doesn't involves the usual sign extension > and overflow bugs, or abuse of the top bit. The client zaps negative > values for v3 f_bavail but not for other things, and initializes > f_bresvd: > from a 2005 version ofs nfs_vfsops.c: > > % if (v3) { > % sbp->f_frsize = sbp->f_bsize = NFS_FABLKSIZE; > % tquad = fxdr_hyper(&sfp->sf_tbytes); > % sbp->f_blocks = ((quad_t)tquad / (quad_t)NFS_FABLKSIZE); > % tquad = fxdr_hyper(&sfp->sf_fbytes); > % sbp->f_bfree = ((quad_t)tquad / (quad_t)NFS_FABLKSIZE); > % tquad = fxdr_hyper(&sfp->sf_abytes); > % tquad = ((quad_t)tquad / (quad_t)NFS_FABLKSIZE); > % sbp->f_bresvd = sbp->f_bfree - tquad; > > I still can't see how this initialization works. f_bresvd has to end > up as nonzero if root has a reserve, and drop to zero as the reserve > is used up. sf_fbytes - sf_abytes must give this reserve. 
> > % sbp->f_bavail = tquad; > % #ifdef COMPAT_20 > % /* Handle older NFS servers returning negative values */ > % if ((quad_t)sbp->f_bavail < 0) > % sbp->f_bavail = 0; > % #endif > > NetBSD's own server puts f_bavail on the wire unchanged except for > scaling, > so it is now clear that f_bavail is never negative in NetBSD. > > % tquad = fxdr_hyper(&sfp->sf_tfiles); > % sbp->f_files = tquad; > % tquad = fxdr_hyper(&sfp->sf_ffiles); > % sbp->f_ffree = tquad; > % sbp->f_favail = tquad; > > "Negative" values for this are not zapped. > > % sbp->f_fresvd = 0; > > This reserv is not really supported. Supporting it is impossible since > there is not as much redundancy in the wire values for the file counts > as for the block counts. > > % sbp->f_namemax = MAXNAMLEN; > % } else { > % sbp->f_bsize = NFS_FABLKSIZE; > % sbp->f_frsize = fxdr_unsigned(int32_t, sfp->sf_bsize); > % sbp->f_blocks = fxdr_unsigned(int32_t, sfp->sf_blocks); > % sbp->f_bfree = fxdr_unsigned(int32_t, sfp->sf_bfree); > % sbp->f_bavail = fxdr_unsigned(int32_t, sfp->sf_bavail); > > Still has old bugs. > > % sbp->f_fresvd = 0; > % sbp->f_files = 0; > % sbp->f_ffree = 0; > % sbp->f_favail = 0; > % sbp->f_fresvd = 0; > % sbp->f_namemax = MAXNAMLEN; > % } > > Next steps: someone should look at why there are 3 nfsv3 protocol > fields for the block counts when only 2 are strictly needed. > > Bruce Here is the RFCs definition of the 3 fields: tbytes The total size, in bytes, of the file system. fbytes The amount of free space, in bytes, in the file system. abytes The amount of free space, in bytes, available to the user identified by the authentication information in the RPC. (This reflects space that is reserved by the file system; it does not reflect any quota system implemented by the server.) I suspect that most systems running FFS (mis)use abytes to represent the non-root value, even when "root" does the RPC. 
If they didn't do that, then abytes would be different when root did statfs and that would be confusing to a typical client. Since you don't know if the server's file system is one like FFS that has a "minfree" (and you don't know what "minfree" is), you can't reliably calculate a negative f_bavail from the above, from what I can see. rick From owner-freebsd-fs@FreeBSD.ORG Wed May 4 02:51:21 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 39A8F106566B; Wed, 4 May 2011 02:51:21 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail06.syd.optusnet.com.au (mail06.syd.optusnet.com.au [211.29.132.187]) by mx1.freebsd.org (Postfix) with ESMTP id B4F898FC18; Wed, 4 May 2011 02:51:20 +0000 (UTC) Received: from c122-106-155-58.carlnfd1.nsw.optusnet.com.au (c122-106-155-58.carlnfd1.nsw.optusnet.com.au [122.106.155.58]) by mail06.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p442pGqb017742 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 4 May 2011 12:51:17 +1000 Date: Wed, 4 May 2011 12:51:16 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Rick Macklem In-Reply-To: <308871799.968962.1304468872744.JavaMail.root@erie.cs.uoguelph.ca> Message-ID: <20110504120552.P956@besplex.bde.org> References: <308871799.968962.1304468872744.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: rmacklem@freebsd.org, fs@freebsd.org Subject: Re: newnfs client and statfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 04 May 2011 02:51:21 -0000 On Tue, 3 May 2011, Rick Macklem wrote: [attributions lost] >> ... 
>> As I said before, NetBSD's nfs tries to make this work for nfs, but I >> couldn't this worked in NetBSD or anything I could think of, since the >> extension is not in the nfs protocol. Now I think it does work, but >> still can't see how. Details: NetBSD puts f_bavail on the wire without Nah, it cannot work. >> clamping it (it just scales it). Now I think f_bavail is never >> negative >> in NetBSD, so this scaling doesn't involves the usual sign extension >> and overflow bugs, or abuse of the top bit. The client zaps negative >> values for v3 f_bavail but not for other things, and initializes >> f_bresvd: >> from a 2005 version ofs nfs_vfsops.c: >> >> % if (v3) { >> % sbp->f_frsize = sbp->f_bsize = NFS_FABLKSIZE; >> % tquad = fxdr_hyper(&sfp->sf_tbytes); >> % sbp->f_blocks = ((quad_t)tquad / (quad_t)NFS_FABLKSIZE); >> % tquad = fxdr_hyper(&sfp->sf_fbytes); >> % sbp->f_bfree = ((quad_t)tquad / (quad_t)NFS_FABLKSIZE); >> % tquad = fxdr_hyper(&sfp->sf_abytes); >> % tquad = ((quad_t)tquad / (quad_t)NFS_FABLKSIZE); >> % sbp->f_bresvd = sbp->f_bfree - tquad; Hmm, the tabs are more mangled than usual. >> I still can't see how this initialization works. f_bresvd has to end >> up as nonzero if root has a reserve, and drop to zero as the reserve >> is used up. sf_fbytes - sf_abytes must give this reserve. >> >> % sbp->f_bavail = tquad; >> % #ifdef COMPAT_20 >> % /* Handle older NFS servers returning negative values */ >> % if ((quad_t)sbp->f_bavail < 0) >> % sbp->f_bavail = 0; >> % #endif >> >> NetBSD's own server puts f_bavail on the wire unchanged except for >> scaling, >> so it is now clear that f_bavail is never negative in NetBSD. >> >> % tquad = fxdr_hyper(&sfp->sf_tfiles); >> % sbp->f_files = tquad; >> % tquad = fxdr_hyper(&sfp->sf_ffiles); >> % sbp->f_ffree = tquad; >> % sbp->f_favail = tquad; >> >> "Negative" values for this are not zapped. >> >> % sbp->f_fresvd = 0; >> >> This reserv is not really supported. 
Supporting it is impossible since
>> there is not as much redundancy in the wire values for the file counts
>> as for the block counts.
>>
>> % sbp->f_namemax = MAXNAMLEN;
>> % } else {
>> % sbp->f_bsize = NFS_FABLKSIZE;
>> % sbp->f_frsize = fxdr_unsigned(int32_t, sfp->sf_bsize);
>> % sbp->f_blocks = fxdr_unsigned(int32_t, sfp->sf_blocks);
>> % sbp->f_bfree = fxdr_unsigned(int32_t, sfp->sf_bfree);
>> % sbp->f_bavail = fxdr_unsigned(int32_t, sfp->sf_bavail);
>>
>> Still has old bugs.
>>
>> % sbp->f_fresvd = 0;
>> % sbp->f_files = 0;
>> % sbp->f_ffree = 0;
>> % sbp->f_favail = 0;
>> % sbp->f_fresvd = 0;
>> % sbp->f_namemax = MAXNAMLEN;
>> % }
>>
>> Next steps: someone should look at why there are 3 nfsv3 protocol
>> fields for the block counts when only 2 are strictly needed.

> Here is the RFC's definition of the 3 fields:
> tbytes
> The total size, in bytes, of the file system.
>
> fbytes
> The amount of free space, in bytes, in the file
> system.
>
> abytes
> The amount of free space, in bytes, available to the
> user identified by the authentication information in
> the RPC. (This reflects space that is reserved by the
> file system; it does not reflect any quota system
> implemented by the server.)

So nfs does support a specially restricted amount of free space available to a mere user, but it doesn't support this amount being negative. BSD uses a negative amount for this to indicate how far the user is from having any space to use.

> I suspect that most systems running FFS (mis)use abytes to represent
> the non-root value, even when "root" does the RPC. If they didn't
> do that, then abytes would be different when root did statfs and that
> would be confusing to a typical client.
>
> Since you don't know if the server's file system is one like FFS that
> has a "minfree" (and you don't know what "minfree" is), you can't
> reliably calculate a negative f_bavail from the above, from what I
> can see.

Yes, it's very annoying that only 3 numbers are available, and 3 numbers are supplied, and the number corresponding to minfree can be recovered from the 3 numbers supplied, but only when abytes > 0.

(fbytes - abytes) gives the amount of free space _not_ available to the user and therefore the amount of free space reserved. Under the condition abytes > 0, for file systems like ffs, none of the original reservation (according to minfree) is used, so (fbytes - abytes) also gives the size of the original reservation. But when the reservation starts being used, abytes is clamped to 0 on broken systems, so the linear relations between the 3 numbers and the alternative more useful 3 numbers (tbytes, fbytes, origreservedbytes) are broken, so there is no way to recover the original reservation, or equivalently, the amount of the reservation that is used.

NetBSD uses:

	tquad = fxdr_hyper(&sfp->sf_abytes);
	tquad = ((quad_t)tquad / (quad_t)NFS_FABLKSIZE);
	sbp->f_bresvd = sbp->f_bfree - tquad;

This is (fbytes - abytes) in blocks, so it only works when abytes > 0.

Repeating part of the above:

> I suspect that most systems running FFS (mis)use abytes to represent
> the non-root value, even when "root" does the RPC. If they didn't
> do that, then abytes would be different when root did statfs and that
> would be confusing to a typical client.

Negative abytes is even more useful for root (or for any user privileged enough to use the reserve). It tells how much of the reserve is used. Users that can eat the reserve should try not to, and when they have they should try to release space to get back to the full reserve. Without negative abytes, there is no API in statfs(2) to tell how much has been eaten. NetBSD's f_bresvd in statvfs(2) might be able to tell, but it is unclear if it is supposed to give the original reserve or the current reserve, and it is already hard enough to decode the 3 numbers into 3 useful ones.

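The (fbytes - abytes) arithmetic above, and the way clamping defeats it, can be sketched in C. This is only an illustration under stated assumptions: struct fsstat_bytes, reserve_blocks() and clamp_abytes() are hypothetical names and not code from any NFS client or server, and the 512-byte block size simply mirrors the NFS_FABLKSIZE value mentioned in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the three NFSv3 FSSTAT byte counts. */
struct fsstat_bytes {
	uint64_t tbytes;	/* total size of the file system */
	uint64_t fbytes;	/* free space, including any reserve */
	uint64_t abytes;	/* free space available to the caller */
};

/*
 * Reserve as a client can compute it, in blocks:
 * fbytes / blksize - abytes / blksize.
 * This equals the original (minfree) reserve only while abytes > 0.
 */
static uint64_t
reserve_blocks(const struct fsstat_bytes *sf, uint64_t blksize)
{
	assert(blksize != 0);
	assert(sf->fbytes >= sf->abytes);
	return (sf->fbytes / blksize - sf->abytes / blksize);
}

/* Model of a server that clamps a negative "available" count to 0. */
static uint64_t
clamp_abytes(int64_t avail)
{
	return (avail < 0 ? 0 : (uint64_t)avail);
}
```

Take an assumed original reserve of 1000 blocks. A server reporting 5000 free and 4000 available blocks gives reserve_blocks() == 1000. Once 400 blocks of the reserve have been eaten (600 free, true available count -400) and the server clamps abytes to 0, the same formula gives 600: the remaining reserve, from which neither the original 1000 nor the 400 already used can be recovered.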
Bruce From owner-freebsd-fs@FreeBSD.ORG Wed May 4 03:58:18 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F045E1065670; Wed, 4 May 2011 03:58:18 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id C83598FC19; Wed, 4 May 2011 03:58:18 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p443wIxI001631; Wed, 4 May 2011 03:58:18 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p443wIhD001627; Wed, 4 May 2011 03:58:18 GMT (envelope-from linimon) Date: Wed, 4 May 2011 03:58:18 GMT Message-Id: <201105040358.p443wIhD001627@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/156797: [zfs] [panic] Double panic with FreeBSD 9-CURRENT and ZFS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 04 May 2011 03:58:19 -0000 Old Synopsis: Double panic with FreeBSD 9-CURRENT and ZFS New Synopsis: [zfs] [panic] Double panic with FreeBSD 9-CURRENT and ZFS Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Wed May 4 03:58:07 UTC 2011 Responsible-Changed-Why: Over to maintainer(s). 
http://www.freebsd.org/cgi/query-pr.cgi?pr=156797 From owner-freebsd-fs@FreeBSD.ORG Wed May 4 08:19:34 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7D2301065670 for ; Wed, 4 May 2011 08:19:34 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 132D28FC08 for ; Wed, 4 May 2011 08:19:33 +0000 (UTC) Received: from deviant.kiev.zoral.com.ua (root@deviant.kiev.zoral.com.ua [10.1.1.148]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id p448IJ2t070852 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 4 May 2011 11:18:19 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4) with ESMTP id p448IJRA087944; Wed, 4 May 2011 11:18:19 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.4/8.14.4/Submit) id p448IJSP087943; Wed, 4 May 2011 11:18:19 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 4 May 2011 11:18:19 +0300 From: Kostik Belousov To: Rick Macklem Message-ID: <20110504081819.GM48734@deviant.kiev.zoral.com.ua> References: <20110503174200.V1050@besplex.bde.org> <2143699515.968680.1304468141505.JavaMail.root@erie.cs.uoguelph.ca> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="ttU9AGiyjxxpUEJf" Content-Disposition: inline In-Reply-To: <2143699515.968680.1304468141505.JavaMail.root@erie.cs.uoguelph.ca> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.4 required=5.0 
tests=ALL_TRUSTED,AWL,BAYES_00, DNS_FROM_OPENWHOIS autolearn=no version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua
Cc: fs@freebsd.org
Subject: Re: newnfs client and statfs
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 04 May 2011 08:19:34 -0000

--ttU9AGiyjxxpUEJf
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, May 03, 2011 at 08:15:41PM -0400, Rick Macklem wrote:
> > On Mon, 2 May 2011, Rick Macklem wrote:
> >
> > > I have attached a version of the patch that I intend to commit
> > > unless it doesn't work for Kostik's test case. Kostik, could
> > > you please test this one.
> > >
> > > Yes, Bruce, I realize you won't like it, but I
> > > have put some comments in it
> > > to try and clarify why it is coded the way it is.
> > > (The arithmetic seems to work the way I would expect it to for
> > > i386, which is the only arch I have for testing.)
> >
> > Sigh.
> >
> > % --- fs/nfsclient/nfs_clport.c.sav 2011-04-30 20:16:39.000000000
> > -0400
> > % +++ fs/nfsclient/nfs_clport.c 2011-05-02 19:32:31.000000000 -0400
> > % @@ -838,21 +838,33 @@ void
> > % nfscl_loadsbinfo(struct nfsmount *nmp, struct nfsstatfs *sfp, void
> > *statfs)
> > % {
> > % struct statfs *sbp = (struct statfs *)statfs;
> > % - nfsquad_t tquad;
> > %
> > % if (nmp->nm_flag & (NFSMNT_NFSV3 | NFSMNT_NFSV4)) {
> > % sbp->f_bsize = NFS_FABLKSIZE;
> > % - tquad.qval = sfp->sf_tbytes;
> > % - sbp->f_blocks = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
> > % - tquad.qval = sfp->sf_fbytes;
> > % - sbp->f_bfree = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
> > % - tquad.qval = sfp->sf_abytes;
> > % - sbp->f_bavail = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE));
> > % - tquad.qval = sfp->sf_tfiles;
> > % - sbp->f_files = (tquad.lval[0] & 0x7fffffff);
> > % - tquad.qval = sfp->sf_ffiles;
> > % - sbp->f_ffree = (tquad.lval[0] & 0x7fffffff);
> > % + sbp->f_blocks = sfp->sf_tbytes / NFS_FABLKSIZE;
> > % + sbp->f_bfree = sfp->sf_fbytes / NFS_FABLKSIZE;
> > % + /*
> > % + * Although sf_abytes is uint64_t and f_bavail is int64_t,
> > % + * the value after dividing by NFS_FABLKSIZE is small
> > % + * enough that it will fit in 63bits, so it is ok to
> > % + * assign it to f_bavail without fear that it will become
> > % + * negative.
> > % + */
> > % + sbp->f_bavail = sfp->sf_abytes / NFS_FABLKSIZE;
> > % + sbp->f_files = sfp->sf_tfiles;
> > % + /* Since f_ffree is int64_t, clip it to 63bits. */
> > % + if (sfp->sf_ffiles > (uint64_t)INT64_MAX)
> >
> > This cast has no effect. INT64_MAX has type int64_t. sf_ffiles has
> > uint64_t. The default binary promotions cause both types to be
> > promoted
> > to the minimally larger common type. This type is uint64_t. Thus
> > INT64_MAX is converted automatically to the correct type.
> >
> Yea, I didn't think the cast mattered and it didn't affect the outcome
> for my little userland test program, so I'll take it out. (I was trying
> to "play it safe", but if you say it doesn't matter, I believe you.)
>
> > % + sbp->f_ffree = INT64_MAX;
> > % + else
> > % + sbp->f_ffree = sfp->sf_ffiles;
> > % } else if ((nmp->nm_flag & NFSMNT_NFSV4) == 0) {
> > % + /*
> > % + * The type casts to (int32_t) ensure that this code is
> > % + * compatible with the old NFS client, in that it will
> > % + * sign extend a value with bit31 set. This may or may
> > % + * not be correct for NFSv2, but since it is a legacy
> > % + * environment, I'd rather retain backwards compatibility.
> > % + */
> > % sbp->f_bsize = (int32_t)sfp->sf_bsize;
> > % sbp->f_blocks = (int32_t)sfp->sf_blocks;
> > % sbp->f_bfree = (int32_t)sfp->sf_bfree;
> >
> > It won't sign extend, but will propagate bit31 as an unsigned bit. For
> > example, sfp->sf_blocks = 0x80000000 becomes sbp->f_blocks =
> > 0xFFFFFFFF80000000, which is massively different. Again, omitting the
> > cast gives the correct result if the wire insists on its values being
> > unsigned.
> >
> Ok, I'll change the comment.
>
> > The result is only backwards compatible with relatively recent FreeBSD
> > nfs clients. All FreeBSD clients are completely broken if bit31 is
> > set, and compatibility with this brokenness is not useful (but as I
> > pointed out in another reply, we would never have seen the broken case
> > when the old clients weren't old, since it takes a server file system
> > size of about 32TB for bit 31 to be set).
> Well, the last legitimate use of the FreeBSD NFSv2 client was a diskless
> root fs stored on a non-FreeBSD NFS server (because pxeboot didn't know the
> correct file handle size). Since this is now fixed, there really isn't
> any use for the NFSv2 client, as far as I know.
Given that and the fact
> that no one is complaining about it being broken, I feel it should just
> be left alone. (Or remain "bug compatible" with the regular NFS client,
> if you prefer.)
>
> I'm afraid I have other things to work on and just don't see changing
> NFSv2 (a 1985 protocol superseded by NFSv3 in 1994) a priority, rick.

Rick, so any final version of the final patch to (re-)test?

--ttU9AGiyjxxpUEJf Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iEYEARECAAYFAk3BC8oACgkQC3+MBN1Mb4jluQCfdYUpvlQVu1lY+zV/KsWyr97Q QCcAnisrVqXE3UdiXE8KiGKoigmYk4zR =ld0I -----END PGP SIGNATURE----- --ttU9AGiyjxxpUEJf--

From owner-freebsd-fs@FreeBSD.ORG Wed May 4 11:29:39 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DA74C106566B for ; Wed, 4 May 2011 11:29:39 +0000 (UTC) (envelope-from rs@bytecamp.net) Received: from mail.bytecamp.net (mail.bytecamp.net [212.204.60.9]) by mx1.freebsd.org (Postfix) with ESMTP id 6CB7C8FC13 for ; Wed, 4 May 2011 11:29:39 +0000 (UTC) Received: (qmail 26889 invoked by uid 89); 4 May 2011 13:02:57 +0200 Received: from stella.bytecamp.net (HELO ?212.204.60.37?) (rs%bytecamp.net@212.204.60.37) by mail.bytecamp.net with CAMELLIA256-SHA encrypted SMTP; 4 May 2011 13:02:57 +0200 Message-ID: <4DC13260.4020905@bytecamp.net> Date: Wed, 04 May 2011 13:02:56 +0200 From: Robert Schulze Organization: bytecamp GmbH User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.14) Gecko/20110223 Thunderbird/3.1.8 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit Subject: zfs/zpool upgrade required?
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 04 May 2011 11:29:39 -0000 Hello, when upgrading from 8.0 to 8-STABLE, kernel and userland support new versions of ZFS pool and filesystem. Is it _required_ to upgrade existing pools and filesystems or can that be done anytime later? with kind regards, Robert Schulze From owner-freebsd-fs@FreeBSD.ORG Wed May 4 11:55:43 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0612B106564A for ; Wed, 4 May 2011 11:55:43 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta13.emeryville.ca.mail.comcast.net (qmta13.emeryville.ca.mail.comcast.net [76.96.27.243]) by mx1.freebsd.org (Postfix) with ESMTP id E1FE28FC14 for ; Wed, 4 May 2011 11:55:42 +0000 (UTC) Received: from omta18.emeryville.ca.mail.comcast.net ([76.96.30.74]) by qmta13.emeryville.ca.mail.comcast.net with comcast id fPqs1g0021bwxycADPviun; Wed, 04 May 2011 11:55:42 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta18.emeryville.ca.mail.comcast.net with comcast id fPvg1g00f1t3BNj8ePvhTd; Wed, 04 May 2011 11:55:41 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 82ACE102C36; Wed, 4 May 2011 04:55:40 -0700 (PDT) Date: Wed, 4 May 2011 04:55:40 -0700 From: Jeremy Chadwick To: Robert Schulze Message-ID: <20110504115540.GA88625@icarus.home.lan> References: <4DC13260.4020905@bytecamp.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4DC13260.4020905@bytecamp.net> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: zfs/zpool upgrade required? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 04 May 2011 11:55:43 -0000 On Wed, May 04, 2011 at 01:02:56PM +0200, Robert Schulze wrote: > when upgrading from 8.0 to 8-STABLE, kernel and userland support new > versions of ZFS pool and filesystem. > > Is it _required_ to upgrade existing pools and filesystems or can > that be done anytime later? - It can be done later, though by not upgrading you lose the ability to use newer features. For a list of what those are, refer to the official OpenSolaris docs. See menu on left side, near bottom: http://hub.opensolaris.org/bin/view/Community+Group+zfs/whatis - Make sure to note that the pool version and the filesystem version are separate. Some folks remember to "zpool upgrade" but not "zfs upgrade". - Remember that upgrading is one-way; you cannot roll back to an older version without destroying your pools. If you're worried, do full backups beforehand. -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Wed May 4 15:21:38 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D850A1065672 for ; Wed, 4 May 2011 15:21:38 +0000 (UTC) (envelope-from kraduk@gmail.com) Received: from mail-wy0-f182.google.com (mail-wy0-f182.google.com [74.125.82.182]) by mx1.freebsd.org (Postfix) with ESMTP id 64DB48FC14 for ; Wed, 4 May 2011 15:21:38 +0000 (UTC) Received: by wyf23 with SMTP id 23so1215889wyf.13 for ; Wed, 04 May 2011 08:21:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=XXhPHbPM5VI/aSEILoxAcJSjZAgPF3c4pY9A9/0bg2E=; b=mBGBSfyyyYRkBFql4T9Jb2rQrFPk0JyZkr57ndJVJd6QB+yQ7EjRKOcmwHak3RsjMc KkFR0AT2E8aUlny2LbTFpPsLR8qz+W0G4K71axQtlZUThcv3EzDByeITlFYN4BhQ9JvC c0hAETp7OzEhckzxq3VhJJp+IvovJ5Xi0kTVU= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=crHlczAPf64FvdHpJh63zXI6y51wPe26XTtMF+aaFy0M0RmqUGhlKXtEFKszRmTFrL YDX9YOiULUp+hUV+WdGPGToql0fs6j7xccZaDfnwSFsUa6LCXxj2J+myB6BHGkELMi9F FW2BRvCkEl9+GKSP2p26X49EKNAXyavfbfexo= MIME-Version: 1.0 Received: by 10.216.143.74 with SMTP id k52mr1250647wej.0.1304522497179; Wed, 04 May 2011 08:21:37 -0700 (PDT) Received: by 10.216.15.73 with HTTP; Wed, 4 May 2011 08:21:37 -0700 (PDT) In-Reply-To: <20110504115540.GA88625@icarus.home.lan> References: <4DC13260.4020905@bytecamp.net> <20110504115540.GA88625@icarus.home.lan> Date: Wed, 4 May 2011 16:21:37 +0100 Message-ID: From: krad To: Jeremy Chadwick Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-fs@freebsd.org Subject: Re: zfs/zpool upgrade required? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 04 May 2011 15:21:38 -0000 On 4 May 2011 12:55, Jeremy Chadwick wrote: > On Wed, May 04, 2011 at 01:02:56PM +0200, Robert Schulze wrote: > > when upgrading from 8.0 to 8-STABLE, kernel and userland support new > > versions of ZFS pool and filesystem. > > > > Is it _required_ to upgrade existing pools and filesystems or can > > that be done anytime later? > > - It can be done later, though by not upgrading you lose the ability to > use newer features. For a list of what those are, refer to the > official OpenSolaris docs. See menu on left side, near bottom: > > http://hub.opensolaris.org/bin/view/Community+Group+zfs/whatis > > - Make sure to note that the pool version and the filesystem version are > separate. Some folks remember to "zpool upgrade" but not "zfs upgrade". > > - Remember that upgrading is one-way; you cannot roll back to an older > version without destroying your pools. If you're worried, do full > backups beforehand. > > -- > | Jeremy Chadwick jdc@parodius.com | > | Parodius Networking http://www.parodius.com/ | > | UNIX Systems Administrator Mountain View, CA, USA | > | Making life hard for others since 1977. PGP 4BD6C0CB | > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > Generally in production i would leave it on the old pool version for a while, until you are confident you are not having any issues. As previously stated you can then roll back more easily. When you are happy upgrade the pool. 
From owner-freebsd-fs@FreeBSD.ORG Wed May 4 21:53:23 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 22E36106567A for ; Wed, 4 May 2011 21:53:23 +0000 (UTC) (envelope-from mad@madpilot.net) Received: from megatron.madpilot.net (megatron.madpilot.net [88.149.173.206]) by mx1.freebsd.org (Postfix) with ESMTP id B39C68FC20 for ; Wed, 4 May 2011 21:53:22 +0000 (UTC) Received: from megatron.madpilot.net (localhost [127.0.0.1]) by megatron.madpilot.net (Postfix) with ESMTP id D468D1E73 for ; Wed, 4 May 2011 23:53:21 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=madpilot.net; h= content-transfer-encoding:content-type:content-type:in-reply-to :references:subject:subject:mime-version:user-agent:from:from :date:date:message-id:received:received; s=mail; t=1304545999; x=1306360399; bh=yVNqxY5A4TpzlaJQWHCs0kK/5P/EyTV+uYWV/XcbK3I=; b= IEC80IaegZVyw6xBLoGcwyUPfSs/Nu1s8pMHJNeN9H7LY5fsstCO5I25E3PdVCwf pHYEJx9x8aPR4GcoWeBAeuLEr5gUw+6E45y2zqREeru17n0niS8AANRpoMezNuie G9s/7MxBsClXJRgdMxhu2yV/Bc5uaSDJWxDYoIGySDo= X-Virus-Scanned: amavisd-new at madpilot.net Received: from megatron.madpilot.net ([127.0.0.1]) by megatron.madpilot.net (megatron.madpilot.net [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id uoRNHTnq+exG for ; Wed, 4 May 2011 23:53:19 +0200 (CEST) Received: from marvin.madpilot.net (localhost [127.0.0.1]) by megatron.madpilot.net (Postfix) with ESMTP for ; Wed, 4 May 2011 23:53:19 +0200 (CEST) Message-ID: <4DC1CACF.8050506@madpilot.net> Date: Wed, 04 May 2011 23:53:19 +0200 From: Guido Falsi User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.17) Gecko/20110429 Lightning/1.0b2 Thunderbird/3.1.10 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <4DC13260.4020905@bytecamp.net> <20110504115540.GA88625@icarus.home.lan> In-Reply-To: <20110504115540.GA88625@icarus.home.lan> Content-Type: text/plain; 
charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: zfs/zpool upgrade required? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 04 May 2011 21:53:23 -0000 On 05/04/11 13:55, Jeremy Chadwick wrote: > On Wed, May 04, 2011 at 01:02:56PM +0200, Robert Schulze wrote: >> when upgrading from 8.0 to 8-STABLE, kernel and userland support new >> versions of ZFS pool and filesystem. >> >> Is it _required_ to upgrade existing pools and filesystems or can >> that be done anytime later? > > - It can be done later, though by not upgrading you lose the ability to > use newer features. For a list of what those are, refer to the > official OpenSolaris docs. See menu on left side, near bottom: > > http://hub.opensolaris.org/bin/view/Community+Group+zfs/whatis > > - Make sure to note that the pool version and the filesystem version are > separate. Some folks remember to "zpool upgrade" but not "zfs upgrade". > > - Remember that upgrading is one-way; you cannot roll back to an older > version without destroying your pools. If you're worried, do full > backups beforehand. > I'd add, if he's booting off of zfs, that it is very important to upgrade the boot code as soon as he upgrades the pool or he will have trouble booting the system which could be a pain to recover at that point. 
-- Guido Falsi From owner-freebsd-fs@FreeBSD.ORG Wed May 4 22:22:55 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5DAC8106566C for ; Wed, 4 May 2011 22:22:55 +0000 (UTC) (envelope-from kraduk@gmail.com) Received: from mail-wy0-f182.google.com (mail-wy0-f182.google.com [74.125.82.182]) by mx1.freebsd.org (Postfix) with ESMTP id DF48D8FC19 for ; Wed, 4 May 2011 22:22:54 +0000 (UTC) Received: by wyf23 with SMTP id 23so1577987wyf.13 for ; Wed, 04 May 2011 15:22:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=5U87xPT7kftVWpe1r83YYldlNG5wF/Q92ONSz0dtDrM=; b=QPQq4YV3VyOPckWNbFCipNmlmsTI5dFC5dQt2aPCKMj6nw3lbYu43EsGF3wQgdPGuQ lxxmqwwpqeFxWSrQGDMZCR8ECS/k1XgukHAuOgOqE3M9w4DyXoGHELcRtIYYSl8WPvPS x/2CniLSw5n7h26RHdiWvgDJyTCTL/EZUIWaw= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=QzteNB05WhABHmX2R0nR3ELc1sgBtm48SfY6zNelFjlG5glbf+978UzFuhNm68Q9K2 ESTQCAz4ckXknGnuhe8hdga5Ew5kI1hUALvNhG7D3PlSU0VQNxHfbCwKVRGopbFXOVHA ZAx/bCMO+mV3z17vk9FEA9vBRCGoFiuRgQC2g= MIME-Version: 1.0 Received: by 10.216.143.96 with SMTP id k74mr5464758wej.100.1304547773684; Wed, 04 May 2011 15:22:53 -0700 (PDT) Received: by 10.216.15.73 with HTTP; Wed, 4 May 2011 15:22:53 -0700 (PDT) In-Reply-To: <4DC1CACF.8050506@madpilot.net> References: <4DC13260.4020905@bytecamp.net> <20110504115540.GA88625@icarus.home.lan> <4DC1CACF.8050506@madpilot.net> Date: Wed, 4 May 2011 23:22:53 +0100 Message-ID: From: krad To: Guido Falsi Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-fs@freebsd.org Subject: Re: zfs/zpool upgrade required? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 04 May 2011 22:22:55 -0000 On 4 May 2011 22:53, Guido Falsi wrote: > On 05/04/11 13:55, Jeremy Chadwick wrote: > >> On Wed, May 04, 2011 at 01:02:56PM +0200, Robert Schulze wrote: >> >>> when upgrading from 8.0 to 8-STABLE, kernel and userland support new >>> versions of ZFS pool and filesystem. >>> >>> Is it _required_ to upgrade existing pools and filesystems or can >>> that be done anytime later? >>> >> >> - It can be done later, though by not upgrading you lose the ability to >> use newer features. For a list of what those are, refer to the >> official OpenSolaris docs. See menu on left side, near bottom: >> >> http://hub.opensolaris.org/bin/view/Community+Group+zfs/whatis >> >> - Make sure to note that the pool version and the filesystem version are >> separate. Some folks remember to "zpool upgrade" but not "zfs upgrade". >> >> - Remember that upgrading is one-way; you cannot roll back to an older >> version without destroying your pools. If you're worried, do full >> backups beforehand. >> >> > I'd add, if he's booting off of zfs, that it is very important to upgrade > the boot code as soon as he upgrades the pool or he will have trouble > booting the system which could be a pain to recover at that point. 
> > -- > Guido Falsi > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > Yep, it is probably worth using these for the time being, as they are the most up-to-date boot code bits: http://people.freebsd.org/~pjd/zfsboot/ From owner-freebsd-fs@FreeBSD.ORG Thu May 5 01:30:18 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C8F0A106566C for ; Thu, 5 May 2011 01:30:18 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 7902B8FC17 for ; Thu, 5 May 2011 01:30:18 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApwEAIH8wU2DaFvO/2dsb2JhbACEUKJGtlORIoEqhF0EjzWOVg X-IronPort-AV: E=Sophos;i="4.64,317,1301889600"; d="scan'208";a="120905583" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 04 May 2011 21:24:51 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id AE91AB3F24; Wed, 4 May 2011 21:24:51 -0400 (EDT) Date: Wed, 4 May 2011 21:24:51 -0400 (EDT) From: Rick Macklem To: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= Message-ID: <277230554.1031144.1304558691708.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <86iptvg9uo.fsf@ds4.des.no> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org Subject: Re: RFC: make the experimental NFS subsystem the default one X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5
Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 May 2011 01:30:18 -0000

> Rick Macklem writes:
> > "Dag-Erling Smørgrav" writes:
> > > interface oldnfs.1 already present in the KLD 'kernel'!
> > > /etc/rc: WARNING: Unable to load kernel module nfsclient
> > Ok, I'll need to look at this. At a glance, I see a load_kld,
> > but that won't get upset if it's already loaded. (It does need
> > to be fixed, though, since it refers to nfsclient as the module
> > for "nfs" instead of nfscl.)
>
> This comes from mountcritremote:
>
>     case "`mount -d -a -t nfs 2> /dev/null`" in
>     *mount_nfs*)
>         # Handle absent nfs client support
>         load_kld -m nfs nfsclient || return 1
>         ;;
>     esac
>
> mount(8) will print "mount_oldnfs" instead of "mount_nfs". Note that
> until you flipped the switch, the exact same error would occur, in
> reverse, on systems running the new stack.
>
Testing here, it seems that none of the NFS specific stuff is needed
in mountcritremote (as hinted by the comment). You can try the version
without the NFS specific stuff if you'd like. It's in:
http://people.freebsd.org/~rmacklem/rc.conf
along with the other modified/added scripts.
rick

From owner-freebsd-fs@FreeBSD.ORG Thu May 5 06:41:30 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A33A7106564A for ; Thu, 5 May 2011 06:41:30 +0000 (UTC) (envelope-from john@theusgroup.com) Received: from theusgroup.com (theusgroup.com [64.122.243.222]) by mx1.freebsd.org (Postfix) with ESMTP id 8F93F8FC16 for ; Thu, 5 May 2011 06:41:30 +0000 (UTC) To: freebsd-fs@freebsd.org Date: Wed, 04 May 2011 23:22:45 -0700 From: John Message-Id: <20110505062246.60B561C4@server.theusgroup.com> Subject: zfs v28 destory -r snapshot failure X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: john@TheUsGroup.com List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 May 2011 06:41:30 -0000

Applied this patch set stable-8-zfsv28-20110501.patch.xz to a fresh download
of 8.2-release, buildworld, buildkernel, install and rebooted. Did not upgrade
pool or filesystems.

Made a snapshot of tank/foo@today, then tried to delete with
zfs destroy -r tank@today yielded:
cannot destroy 'tank@today': dataset does not exist
no snapshots destroyed

If tank@today exists along with tank/foo@today, then the destroy works
correctly.

Rebooted with kernel.old which is 8.2-release without the v28 patch and
zfs destroy -r tank@today deleted tank/foo@today without an error.
John Theus From owner-freebsd-fs@FreeBSD.ORG Thu May 5 07:32:13 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6C282106564A for ; Thu, 5 May 2011 07:32:13 +0000 (UTC) (envelope-from gpm@hotplug.ru) Received: from gate.pikinvest.ru (gate.pikinvest.ru [87.245.155.170]) by mx1.freebsd.org (Postfix) with ESMTP id DB1A98FC1C for ; Thu, 5 May 2011 07:32:12 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by mailgate.pik.ru (Postfix) with ESMTP id BE9521C08A3 for ; Thu, 5 May 2011 11:16:48 +0400 (MSD) Received: from EX03PIK.PICompany.ru (unknown [192.168.156.51]) by mailgate.pik.ru (Postfix) with ESMTP id BA1E61C08A1 for ; Thu, 5 May 2011 11:16:48 +0400 (MSD) Received: from [192.168.148.9] ([192.168.148.9]) by EX03PIK.PICompany.ru with Microsoft SMTPSVC(6.0.3790.4675); Thu, 5 May 2011 11:16:35 +0400 Message-ID: <4DC24ED3.4040703@hotplug.ru> Date: Thu, 05 May 2011 11:16:35 +0400 From: Emil Muratov User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.14) Gecko/20110223 Thunderbird/3.1.8 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <20110505062246.60B561C4@server.theusgroup.com> In-Reply-To: <20110505062246.60B561C4@server.theusgroup.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-OriginalArrivalTime: 05 May 2011 07:16:35.0268 (UTC) FILETIME=[5E067040:01CC0AF4] Subject: Re: zfs v28 destory -r snapshot failure X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 May 2011 07:32:13 -0000 > Applied this patch set stable-8-zfsv28-20110501.patch.xz to a fresh download > of 8.2-release, buildworld, buildkernel, install and rebooted. Did not upgrade > pool or filesystems. 
> > Made a snapshot of tank/foo@today, then tried to delete with > zfs destroy -r tank@today yielded: > cannot destroy 'tank@today': dataset does not exist > no snapshots destroyed > > If tank@today exists along with tank/foo@today, then the destroy works > correctly. > > Rebooted with kernel.old which is 8.2-release without the v28 patch and > zfs destroy -r tank@today deleted tank/foo@today without an error. Same here. zfSnap utility no longer purges old snaps since upgrading to v28. Manual testing discovered that zfs destroy -r no longer works as expected for snapshots, for datasets ok. From owner-freebsd-fs@FreeBSD.ORG Thu May 5 08:19:52 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 816471065672 for ; Thu, 5 May 2011 08:19:52 +0000 (UTC) (envelope-from rs@bytecamp.net) Received: from mail.bytecamp.net (mail.bytecamp.net [212.204.60.9]) by mx1.freebsd.org (Postfix) with ESMTP id 0F05D8FC1A for ; Thu, 5 May 2011 08:19:51 +0000 (UTC) Received: (qmail 39916 invoked by uid 89); 5 May 2011 10:19:50 +0200 Received: from stella.bytecamp.net (HELO ?212.204.60.37?) 
(rs%bytecamp.net@212.204.60.37) by mail.bytecamp.net with CAMELLIA256-SHA encrypted SMTP; 5 May 2011 10:19:50 +0200 Message-ID: <4DC25DA6.3060009@bytecamp.net> Date: Thu, 05 May 2011 10:19:50 +0200 From: Robert Schulze Organization: bytecamp GmbH User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.14) Gecko/20110223 Thunderbird/3.1.8 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit Subject: zfs l2arc issue X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 May 2011 08:19:52 -0000

Hi,

we are running an NFS server with the following pool setup:

    home          ONLINE       0     0     0
      raidz2      ONLINE       0     0     0
        da1       ONLINE       0     0     0
        da2       ONLINE       0     0     0
        da3       ONLINE       0     0     0
        da4       ONLINE       0     0     0
        da5       ONLINE       0     0     0
      raidz2      ONLINE       0     0     0
        da6       ONLINE       0     0     0
        da7       ONLINE       0     0     0
        da8       ONLINE       0     0     0
        da9       ONLINE       0     0     0
        da10      ONLINE       0     0     0
    logs          ONLINE       0     0     0
      mirror      ONLINE       0     0     0
        da12      ONLINE       0     0     0
        da13      ONLINE       0     0     0
    cache
      ad4         ONLINE       0     0     0
      ad8         ONLINE       0     0     0

All drives except the caching SSDs are attached to an LSI 9690SA-8I.
The system is equipped with 32 GB RAM and runs with a load of <1.
Please note: we are still running 8.0, since there was one issue with
ZFS which blocked the upgrade to 8-STABLE.

After about 100 days of uptime, we had a sudden large increase in load of
about 5-7, nfsd had 100-400% WCPU. Also an rsync downloading files
from that machine was very slow.

We didn't really narrow down the problem, we had to reboot the
machine because performance was nearly completely absent. After
reboot, system performance became normal.

Could this problem be related to the caching SSDs being full? Cache
consists of two 76 GB SSDs, after warming up, only 8 MB are free on
each disk.
Is ZFS supposed to fill arbitrarily large caches? I am thinking of doubling
the cache and then ending up with completely filled SSDs again.
For if, could l2arc be limited somehow, so that SSDs don't get written full? Could this behaviour also appear in 8-STABLE? With kind regards, Robert Schulze From owner-freebsd-fs@FreeBSD.ORG Thu May 5 10:13:45 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 07CC91065673 for ; Thu, 5 May 2011 10:13:45 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta04.westchester.pa.mail.comcast.net (qmta04.westchester.pa.mail.comcast.net [76.96.62.40]) by mx1.freebsd.org (Postfix) with ESMTP id A9C0F8FC13 for ; Thu, 5 May 2011 10:13:44 +0000 (UTC) Received: from omta17.westchester.pa.mail.comcast.net ([76.96.62.89]) by qmta04.westchester.pa.mail.comcast.net with comcast id fm3S1g0041vXlb854mDk5V; Thu, 05 May 2011 10:13:44 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta17.westchester.pa.mail.comcast.net with comcast id fmDj1g0091t3BNj3dmDjdG; Thu, 05 May 2011 10:13:44 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id E1054102C36; Thu, 5 May 2011 03:13:41 -0700 (PDT) Date: Thu, 5 May 2011 03:13:41 -0700 From: Jeremy Chadwick To: Robert Schulze Message-ID: <20110505101341.GA10618@icarus.home.lan> References: <4DC25DA6.3060009@bytecamp.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4DC25DA6.3060009@bytecamp.net> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: zfs l2arc issue X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 May 2011 10:13:45 -0000 On Thu, May 05, 2011 at 10:19:50AM +0200, Robert Schulze wrote: > we are running an NFS server with the following pool setup: > > home ONLINE 0 0 0 > raidz2 ONLINE 0 0 0 > da1 ONLINE 0 0 0 > da2 ONLINE 0 0 0 > da3 ONLINE 0 0 0 > 
da4 ONLINE 0 0 0 > da5 ONLINE 0 0 0 > raidz2 ONLINE 0 0 0 > da6 ONLINE 0 0 0 > da7 ONLINE 0 0 0 > da8 ONLINE 0 0 0 > da9 ONLINE 0 0 0 > da10 ONLINE 0 0 0 > logs ONLINE 0 0 0 > mirror ONLINE 0 0 0 > da12 ONLINE 0 0 0 > da13 ONLINE 0 0 0 > cache > ad4 ONLINE 0 0 0 > ad8 ONLINE 0 0 0 > > > All drives except the caching SSDs are attached to a LSI 9690SA-8I. > The system is equipped with 32 GB RAM, and runs with a load of <1, > please note: we are running 8.0, yet, since there was one issue with > ZFS which blocked the upgrade to 8-STABLE. > > After about 100d uptime, we had a sudden large increase in load of > about 5-7, nfsd had 100-400% WCPU. Also an rsync downloading files > from that machine was very slow. > > We didn't really narrow down the problem, we had to reboot the > machine because performance was nearly completely absent. After > reboot, system performance became normal. > > Could this problem be related to the caching SSDs beeing full? Cache > consists of two 76 GB SSDs, after warming up, only 8 MB are free on > each disk. > Is ZFS supposed to fill arbitrary large caches? I think of doubling > the cache and then ending up with fully filled SSDs again. For if, > could l2arc be limited somehow, so that SSDs don't get written full? > > Could this behaviour also appear in 8-STABLE? To readers: make sure you note this user is running either 8.0-RELEASE or 8.0-STABLE. ZFS during that time is very different and **many** pieces to its innards and tweaking/tuning pieces are different now. - It would help if we could match disk types (SSDs, etc.) to a device string. "camcontrol devlist -v" would be useful on this machine. - nfsd taking up 100-400% CPU (that has been addressed in a later release by the way; it will show 100% total for all 4 cores; I believe "top -C" changes the behaviour) doesn't tell us much. What was nfsd actually *doing* during that time? Could you "procstat -kk PID"? Did you try using "ktrace -i -t+ -p PID" to see what syscalls it was making? 
- Have you done any system tuning on this machine for ZFS?

It's very important that you provide the following:

- uname -a (you can hide/XXX-out the machine name). This will provide
  both the exact build date (which hopefully will match what time your
  kernel sources were synced), and whether or not the machine is i386
  or amd64
- Contents of /etc/sysctl.conf
- Contents of /boot/loader.conf
- Contents of /etc/rc.conf (you can XXX out machine names, IPs, etc.)
- Output from dmesg (after a fresh reboot is fine)
- Output from "sysctl -a vfs.zfs"
- Output from "sysctl -a kstat.zfs"
- Output from "top" when the issue is occurring; interested mainly in
  the high-CPU-usage processes as well as all the system/memory
  statistics
- Output from "zpool iostat -v 1" when the issue is occurring.

I should warn you in advance: you're asking for assistance with
something that's "fairly old", and as I stated in the "To readers"
section, ZFS on 8.0 is very different than 8.2. There are all sorts of
tunings/adjustments that are required there that are not on 8.2.

I think most of us would like to know what single ZFS issue is keeping
you from upgrading the machine to RELENG_8 / 8.2-STABLE. I think
overall it might make the most sense to address or fix that problem for
you and then have you try 8.2-STABLE to see if the above issue persists.

--
| Jeremy Chadwick jdc@parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977.
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu May 5 10:39:37 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1EB571065673 for ; Thu, 5 May 2011 10:39:37 +0000 (UTC) (envelope-from rs@bytecamp.net) Received: from mail.bytecamp.net (mail.bytecamp.net [212.204.60.9]) by mx1.freebsd.org (Postfix) with ESMTP id 6797F8FC19 for ; Thu, 5 May 2011 10:39:35 +0000 (UTC) Received: (qmail 78909 invoked by uid 89); 5 May 2011 12:39:35 +0200 Received: from stella.bytecamp.net (HELO ?212.204.60.37?) (rs%bytecamp.net@212.204.60.37) by mail.bytecamp.net with CAMELLIA256-SHA encrypted SMTP; 5 May 2011 12:39:35 +0200 Message-ID: <4DC27E66.70904@bytecamp.net> Date: Thu, 05 May 2011 12:39:34 +0200 From: Robert Schulze Organization: bytecamp GmbH User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.17) Gecko/20110424 Thunderbird/3.1.10 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <4DC25DA6.3060009@bytecamp.net> <20110505101341.GA10618@icarus.home.lan> In-Reply-To: <20110505101341.GA10618@icarus.home.lan> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: zfs l2arc issue X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 May 2011 10:39:37 -0000 Hi, Am 05.05.2011 12:13, schrieb Jeremy Chadwick: > I think most of us would like to know what single ZFS issue is keeping > you from upgrading the machine to RELENG_8 / 8.2-STABLE. I think > overall it might make the most sense to address or fix that problem for > you and then have you try 8.2-STABLE to see if the above issue persists. > there _was_ a problem causing the kernel to panic with highly nested filesystems (kern/154681 thread stack size too small), which was fixed by avg@ in mid march. 
A panic every three days is not tolerable in production use, so we waited with upgrading. Of course I know, that issues with old FreeBSD releases are not very gladly seen on this list, well, we will upgrade the machine in the upcoming days and hope for the best. *sigh* with kind regards, Robert Schulze From owner-freebsd-fs@FreeBSD.ORG Thu May 5 13:32:34 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3F4631065674; Thu, 5 May 2011 13:32:34 +0000 (UTC) (envelope-from pawel@dawidek.net) Received: from mail.garage.freebsd.pl (60.wheelsystems.com [83.12.187.60]) by mx1.freebsd.org (Postfix) with ESMTP id D17588FC18; Thu, 5 May 2011 13:32:33 +0000 (UTC) Received: by mail.garage.freebsd.pl (Postfix, from userid 65534) id 226D14569A; Thu, 5 May 2011 15:32:32 +0200 (CEST) Received: from localhost (public-gprs14895.centertel.pl [87.96.58.47]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.garage.freebsd.pl (Postfix) with ESMTP id 287CD45684; Thu, 5 May 2011 15:32:21 +0200 (CEST) Date: Thu, 5 May 2011 15:31:56 +0200 From: Pawel Jakub Dawidek To: Alexander Leidinger Message-ID: <20110505133156.GE14661@garage.freebsd.pl> References: <4DBBB20A.5050102@FreeBSD.org> <20110430072831.GA65598@icarus.home.lan> <20110501000656.00007ea1@unknown> <20110501133752.GC3245@garage.freebsd.pl> <20110503134826.712070yt2urhxp8g@webmail.leidinger.net> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="imjhCm/Pyz7Rq5F2" Content-Disposition: inline In-Reply-To: <20110503134826.712070yt2urhxp8g@webmail.leidinger.net> X-OS: FreeBSD 9.0-CURRENT amd64 User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.garage.freebsd.pl X-Spam-Level: X-Spam-Status: No, score=-2.6 required=4.5 tests=BAYES_00 autolearn=ham 
version=3.0.4 Cc: freebsd-fs@FreeBSD.org, Alexander Motin Subject: Re: TRIM clustering X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 May 2011 13:32:34 -0000 --imjhCm/Pyz7Rq5F2 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Tue, May 03, 2011 at 01:48:26PM +0200, Alexander Leidinger wrote:
> Quoting Pawel Jakub Dawidek (from Sun, 1 May 2011
> 15:37:52 +0200):
>
> >On Sun, May 01, 2011 at 12:06:56AM +0200, Alexander Leidinger wrote:
> >>On Sat, 30 Apr 2011 00:28:31 -0700 Jeremy Chadwick
> >> wrote:
> >>
> >>> On Sat, Apr 30, 2011 at 09:54:02AM +0300, Alexander Motin wrote:
> >>
> >>> Other notes: TRIM needs to be supported on swap as well, and in my
> >>> opinion this is just as important as it being in UFS. I'm not sure
> >>> how one would implement that.
> >>
> >>This brings up the question if a ZFS cache (where the contents do not
> >>survive a reboot) is completely TRIMmed before used (and normally
> >>trimmed during use)...
> >
> >It is not trimmed at all.
>
> This does not sound like the optimal solution... is there a way to
> know the first access after boot/attach to a cache device? If yes,
> would it be possible to TRIM the complete provider (except for some
> static data which needs to be there) from this place? This would not
> solve the not TRIMmed during use part, but at least a
> reboot/reattach could provide a sane state.

Doing TRIM for cache devices before first use might be slightly useful, but it may make the boot time longer. L2ARC is designed to work with very slow devices - if they cannot keep up we will simply not evict cache from ARC to L2ARC. That's not a big problem. Doing TRIM for cache devices at run time seems pointless to me.
Optimal use is when cache device is 100% full, so new data replaces old data and there is no window where we could put TRIM. We would need to replace writes with trim+write, which will increase the latency. TRIM will be more useful for regular data within a pool and most useful for log devices as we do free blocks there and this is where latency is critical (log devices are there to reduce latency). --=20 Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com --imjhCm/Pyz7Rq5F2 Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (FreeBSD) iEYEARECAAYFAk3CpssACgkQForvXbEpPzT6+QCdFVDXFHUJmgrv4BqkgWeLbqn2 bAoAoLyI0fjfMP5ZLAo6WS94/jevKKGh =6roC -----END PGP SIGNATURE----- --imjhCm/Pyz7Rq5F2-- From owner-freebsd-fs@FreeBSD.ORG Thu May 5 14:10:45 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 15DD71065670 for ; Thu, 5 May 2011 14:10:45 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id B171F8FC21 for ; Thu, 5 May 2011 14:10:44 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApwEAKauwk2DaFvO/2dsb2JhbACEUKJdtEGRL4EqhF0Ej0qOaw X-IronPort-AV: E=Sophos;i="4.64,319,1301889600"; d="scan'208";a="121340463" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 05 May 2011 09:59:25 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id E5E33793A7 for ; Thu, 5 May 2011 09:59:25 -0400 (EDT) Date: Thu, 5 May 2011 09:59:25 -0400 (EDT) From: Rick Macklem To: FreeBSD FS Message-ID: <237603556.1045829.1304603965877.JavaMail.root@erie.cs.uoguelph.ca> 
MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Subject: fixing NFS related sysctl naming X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 May 2011 14:10:45 -0000 Hi, Right now there are separate name paths for sysctls used by the two NFS clients, which is awkward since scripts like to play with them (currently using "vfs.nfs" which is the old one). One thought I had was moving the SYSCTL()s and the global variables they manipulate into the "nfslock" modules, which is shared by both clients, so that changing "vfs.nfs.xxx" will affect both NFS clients concurrently. How does this idea sound? Any other suggestions on how to best deal with this? Thanks in advance for any comment, rick From owner-freebsd-fs@FreeBSD.ORG Thu May 5 14:40:19 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BF6441065674; Thu, 5 May 2011 14:40:19 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de [217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id 6D1848FC14; Thu, 5 May 2011 14:40:19 +0000 (UTC) Received: from outgoing.leidinger.net (p5B155AFC.dip.t-dialin.net [91.21.90.252]) by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id 7D59C844018; Thu, 5 May 2011 16:40:05 +0200 (CEST) Received: from webmail.leidinger.net (webmail.Leidinger.net [IPv6:fd73:10c7:2053:1::2:102]) by outgoing.leidinger.net (Postfix) with ESMTP id 3529A11FB; Thu, 5 May 2011 16:40:02 +0200 (CEST) Received: (from www@localhost) by webmail.leidinger.net (8.14.4/8.13.8/Submit) id p45Ee1ll098311; Thu, 5 May 2011 16:40:01 
+0200 (CEST) (envelope-from Alexander@Leidinger.net) Received: from pslux.ec.europa.eu (pslux.ec.europa.eu [158.169.9.14]) by webmail.leidinger.net (Horde Framework) with HTTP; Thu, 05 May 2011 16:40:01 +0200 Message-ID: <20110505164001.79532nb02isxjlxc@webmail.leidinger.net> Date: Thu, 05 May 2011 16:40:01 +0200 From: Alexander Leidinger To: Pawel Jakub Dawidek References: <4DBBB20A.5050102@FreeBSD.org> <20110430072831.GA65598@icarus.home.lan> <20110501000656.00007ea1@unknown> <20110501133752.GC3245@garage.freebsd.pl> <20110503134826.712070yt2urhxp8g@webmail.leidinger.net> <20110505133156.GE14661@garage.freebsd.pl> In-Reply-To: <20110505133156.GE14661@garage.freebsd.pl> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; DelSp="Yes"; format="flowed" Content-Disposition: inline Content-Transfer-Encoding: 7bit User-Agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.6) X-EBL-MailScanner-Information: Please contact the ISP for more information X-EBL-MailScanner-ID: 7D59C844018.AE39A X-EBL-MailScanner: Found to be clean X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN, SpamAssassin (not cached, score=0, required 6, autolearn=disabled) X-EBL-MailScanner-From: alexander@leidinger.net X-EBL-MailScanner-Watermark: 1305211206.33085@AJSavKJQhc/K66neZSmg+w X-EBL-Spam-Status: No Cc: freebsd-fs@FreeBSD.org, Alexander Motin Subject: Re: TRIM clustering X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 May 2011 14:40:19 -0000 Quoting Pawel Jakub Dawidek (from Thu, 5 May 2011 15:31:56 +0200): > On Tue, May 03, 2011 at 01:48:26PM +0200, Alexander Leidinger wrote: >> Quoting Pawel Jakub Dawidek (from Sun, 1 May 2011 >> 15:37:52 +0200): >> >> >On Sun, May 01, 2011 at 12:06:56AM +0200, Alexander Leidinger wrote: >> >>On Sat, 30 Apr 2011 00:28:31 -0700 Jeremy Chadwick >> >> wrote: >> >> >> >>> On Sat, Apr 30, 2011 at 
09:54:02AM +0300, Alexander Motin wrote: >> >> >> >>> Other notes: TRIM needs to be supported on swap as well, and in my >> >>> opinion this is just as important as it being in UFS. I'm not sure >> >>> how one would implement that. >> >> >> >>This brings up the question if a ZFS cache (where the contents do not >> >>survive a reboot) is completely TRIMmed before used (and normally >> >>trimmed during use)... >> > >> >It is not trimmed at all. >> >> This does not sound like the optimal solution... is there a way to > TRIM will be more useful for regular data within a pool and most useful > for log devices as we do free blocks there and this is where latency is > critical (log devices are there to reduce latency). Wait, does this mean that ZFS does not TRIM at all? I was understanding your first answer as the cache is not trimmed at all. Bye, Alexander. -- If *I* had a hammer, there'd be no more folk singers. http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137 From owner-freebsd-fs@FreeBSD.ORG Thu May 5 16:36:41 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8F1B11065672 for ; Thu, 5 May 2011 16:36:41 +0000 (UTC) (envelope-from fjwcash@gmail.com) Received: from mail-yx0-f182.google.com (mail-yx0-f182.google.com [209.85.213.182]) by mx1.freebsd.org (Postfix) with ESMTP id 478328FC0C for ; Thu, 5 May 2011 16:36:40 +0000 (UTC) Received: by yxl31 with SMTP id 31so1055611yxl.13 for ; Thu, 05 May 2011 09:36:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=xNUO0xrAIAVlTqZtyNrG7mYGS9y70hwYJGWHnmvzmQE=; b=pk7APgbd9vkW3TfIBz4Q+3mrZVjuUvPliybN9WIgPfERn/tVxi5gcnOcnIpQjB1S3k 
lOc2SrluqTkwt5zj3A6T3rcfULVrpDpG5/xKqeU4tyXvZmh7naRZQVltn0kfoV3rY2t/ a//uEyQrAubU7fHPrBU5sdYHjHVqVmqTM0lGs= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=Bgl3RrkVcZqmWeaOTShi4OTPtEic7WRv9E/Zs1uzU5OzK0lA7UDHx+UxWEEsn3n6sB YQkU5vfPwrCqeGUKV60EHTbM+iw3tmqrY/vZtsufAtODIVB61+4GTzkczgeq6G/D0aVr 3lcI0K6/pT+sS+rBPhxiygg2Ewh6/Gc4SXej0= MIME-Version: 1.0 Received: by 10.90.113.15 with SMTP id l15mr492864agc.32.1304613400354; Thu, 05 May 2011 09:36:40 -0700 (PDT) Received: by 10.90.52.15 with HTTP; Thu, 5 May 2011 09:36:40 -0700 (PDT) In-Reply-To: <20110505062246.60B561C4@server.theusgroup.com> References: <20110505062246.60B561C4@server.theusgroup.com> Date: Thu, 5 May 2011 09:36:40 -0700 Message-ID: From: Freddie Cash To: john@theusgroup.com Content-Type: text/plain; charset=UTF-8 Cc: freebsd-fs@freebsd.org Subject: Re: zfs v28 destory -r snapshot failure X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 May 2011 16:36:41 -0000

On Wed, May 4, 2011 at 11:22 PM, John wrote:
> Applied this patch set stable-8-zfsv28-20110501.patch.xz to a fresh download
> of 8.2-release, buildworld, buildkernel, install and rebooted. Did not upgrade
> pool or filesystems.
>
> Made a snapshot of tank/foo@today, then tried to delete with
> zfs destroy -r tank@today yielded:
> cannot destroy 'tank@today': dataset does not exist
> no snapshots destroyed

So, you have a snapshot tank/foo@today, but you don't have a snapshot tank@today, and you expect it to be able to delete the non-existent tank@today?

> If tank@today exists along with tank/foo@today, then the destroy works
> correctly.

Makes sense. tank@today exists, so you can destroy it. tank/foo@today also exists, so you can destroy it as part of the recursion.
> Rebooted with kernel.old which is 8.2-release without the v28 patch and > zfs destroy -r tank@today deleted tank/foo@today without an error. That sounds like an error, since you shouldn't be able to destroy something that doesn't exist. But, maybe my understanding of how -r works is faulty. -- Freddie Cash fjwcash@gmail.com From owner-freebsd-fs@FreeBSD.ORG Thu May 5 16:39:58 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7A82B1065678 for ; Thu, 5 May 2011 16:39:58 +0000 (UTC) (envelope-from artemb@gmail.com) Received: from mail-qw0-f54.google.com (mail-qw0-f54.google.com [209.85.216.54]) by mx1.freebsd.org (Postfix) with ESMTP id 33FBE8FC0C for ; Thu, 5 May 2011 16:39:58 +0000 (UTC) Received: by qwc9 with SMTP id 9so2018614qwc.13 for ; Thu, 05 May 2011 09:39:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=t+750fvyyi9Cd34g9H5+EaBnBv3K8VYorAF3/YK1gbI=; b=pf7Vk1Vb5L1gOzw6mXIRmdaGHF8n9YfnsXycL8MVIYBJY4BMeUM2yDjRezR5GeP49U +zKlRSCjVIT+4m6f7Hsz6uXU+4pKF9gum039/yBuQATWUvloG2r1xaCFvsiS4mYDFMLg z7Q47sRScdqvlKNR1zKsIIEF9EZZHHQG8HEeM= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; b=sAlPRdPcNibShwlNGvl9KBhMQbM6LnCF187NhoccvueMRETczhkZR9Cpv/lzRkdS7K vZQdVTznOr26Cr7ut0vuT2wzhClwgdKrZpNyQQA1/qSaAwR02pXbvOG/1J9C30m5RPiy Vtwda1pUmz1KRJ6GuHf9xfjhwHZU/OFj0WyJI= MIME-Version: 1.0 Received: by 10.229.107.38 with SMTP id z38mr1617240qco.158.1304613597545; Thu, 05 May 2011 09:39:57 -0700 (PDT) Sender: artemb@gmail.com Received: by 10.229.95.140 with HTTP; Thu, 5 May 2011 09:39:57 -0700 (PDT) In-Reply-To: <4DC25DA6.3060009@bytecamp.net> References: 
<4DC25DA6.3060009@bytecamp.net> Date: Thu, 5 May 2011 09:39:57 -0700 X-Google-Sender-Auth: zbL4nxIhdqb46cqbjXvdaxHxP_s Message-ID: From: Artem Belevich To: Robert Schulze Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-fs@freebsd.org Subject: Re: zfs l2arc issue X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 May 2011 16:39:58 -0000

On Thu, May 5, 2011 at 1:19 AM, Robert Schulze wrote:
> All drives except the caching SSDs are attached to a LSI 9690SA-8I.
> The system is equipped with 32 GB RAM, and runs with a load of <1, please
> note: we are running 8.0, yet, since there was one issue with ZFS which
> blocked the upgrade to 8-STABLE.
>
> After about 100d uptime, we had a sudden large increase in load of about
> 5-7, nfsd had 100-400% WCPU. Also an rsync downloading files from that
> machine was very slow.

There was an issue with clock_t type overflow. It was fixed in r218429 on Feb 8th in 8-stable. One of its effects was that it would cause the L2ARC feeding thread to spin endlessly after about a month of uptime. It's possible that there are other scenarios where clock_t overflow in ZFS code would cause strange things to happen.

I would suggest migrating to 8-STABLE as a number of ZFS-related fixes have been committed since 8.0.
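
[Archive note] The "about a month of uptime" figure above is consistent with a signed 32-bit tick counter wrapping at a 1000 Hz tick rate. The sketch below only does that arithmetic. The counter width and the tick rates are assumptions chosen for illustration, not values taken from this thread or from the r218429 change, and the function name is hypothetical.

```python
# Sketch: days of uptime before a signed 32-bit tick counter wraps.
# Assumptions (not from the thread): the counter is a signed 32-bit
# integer, and the kernel tick rate (hz) is one of the values below.

SECONDS_PER_DAY = 86_400
MAX_TICKS = 2**31  # first tick count a signed 32-bit integer cannot hold


def days_until_overflow(hz: int) -> float:
    """Return the uptime in days at which the tick counter overflows."""
    return MAX_TICKS / hz / SECONDS_PER_DAY


if __name__ == "__main__":
    for hz in (100, 1000):  # hypothetical tick rates
        print(f"hz={hz}: about {days_until_overflow(hz):.1f} days")
```

Under these assumptions a 1000 Hz tick rate gives a wrap after a little under 25 days, and a 100 Hz rate after roughly 250 days. Whether either case applies to the machine discussed here is not established by the thread.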
--Artem From owner-freebsd-fs@FreeBSD.ORG Thu May 5 17:01:14 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 052D6106566B; Thu, 5 May 2011 17:01:14 +0000 (UTC) (envelope-from pawel@dawidek.net) Received: from mail.garage.freebsd.pl (60.wheelsystems.com [83.12.187.60]) by mx1.freebsd.org (Postfix) with ESMTP id 9CE8F8FC1D; Thu, 5 May 2011 17:01:12 +0000 (UTC) Received: by mail.garage.freebsd.pl (Postfix, from userid 65534) id D3D5E45CAC; Thu, 5 May 2011 19:01:11 +0200 (CEST) Received: from localhost (public-gprs14895.centertel.pl [87.96.58.47]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.garage.freebsd.pl (Postfix) with ESMTP id 4E24F45684; Thu, 5 May 2011 19:01:02 +0200 (CEST) Date: Thu, 5 May 2011 19:00:37 +0200 From: Pawel Jakub Dawidek To: Alexander Leidinger Message-ID: <20110505170037.GG14661@garage.freebsd.pl> References: <4DBBB20A.5050102@FreeBSD.org> <20110430072831.GA65598@icarus.home.lan> <20110501000656.00007ea1@unknown> <20110501133752.GC3245@garage.freebsd.pl> <20110503134826.712070yt2urhxp8g@webmail.leidinger.net> <20110505133156.GE14661@garage.freebsd.pl> <20110505164001.79532nb02isxjlxc@webmail.leidinger.net> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="lHGcFxmlz1yfXmOs" Content-Disposition: inline In-Reply-To: <20110505164001.79532nb02isxjlxc@webmail.leidinger.net> X-OS: FreeBSD 9.0-CURRENT amd64 User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on mail.garage.freebsd.pl X-Spam-Level: X-Spam-Status: No, score=-2.6 required=4.5 tests=BAYES_00 autolearn=ham version=3.0.4 Cc: freebsd-fs@FreeBSD.org, Alexander Motin Subject: Re: TRIM clustering X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 May 2011 17:01:14 -0000 --lHGcFxmlz1yfXmOs Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, May 05, 2011 at 04:40:01PM +0200, Alexander Leidinger wrote: > >>>>This brings up the question if a ZFS cache (where the contents do not > >>>>survive a reboot) is completely TRIMmed before used (and normally > >>>>trimmed during use)... > >>> > >>>It is not trimmed at all. > >> > >>This does not sound like the optimal solution... is there a way to >=20 > >TRIM will be more useful for regular data within a pool and most useful > >for log devices as we do free blocks there and this is where latency is > >critical (log devices are there to reduce latency). >=20 > Wait, does this mean that ZFS does not TRIM at all? I was > understanding your first answer as the cache is not trimmed at all. You asked for cache and I answered about cache, but ZFS does not TRIM in general. --=20 Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! 
http://yomoli.com --lHGcFxmlz1yfXmOs Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (FreeBSD) iEYEARECAAYFAk3C17QACgkQForvXbEpPzSK/wCgy0KzaVqs5NDHmib8NnlBdyUl phgAoNTfMDvlX/weLtSpUz3fyPWjZorq =QJZr -----END PGP SIGNATURE----- --lHGcFxmlz1yfXmOs-- From owner-freebsd-fs@FreeBSD.ORG Thu May 5 17:53:38 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BCDC5106566B for ; Thu, 5 May 2011 17:53:38 +0000 (UTC) (envelope-from toasty@dragondata.com) Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id 81A518FC16 for ; Thu, 5 May 2011 17:53:38 +0000 (UTC) Received: by iwn33 with SMTP id 33so2836842iwn.13 for ; Thu, 05 May 2011 10:53:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dragondata.com; s=google; h=domainkey-signature:from:content-type:content-transfer-encoding :subject:date:message-id:to:mime-version:x-mailer; bh=+4yRosvY4OeW8+IRJ+ZBkW4eIlFoZpC24Zc1S17CZvU=; b=krWbJ57QBWUNbQqiPPx2+LmjBiInULnyU9Vprx7o1iSmfjxwFLE2+JEnch/nVjhJXD XCuZr6RuKg4qQjDL0OxnKJVZi6t+b56wpHH9I3DwOOZ5gavTMJDRRQEzTjT9Nr05QwtK MOM+OpiiND3zjDVOQVqka2NX35m3RXFSxItKM= DomainKey-Signature: a=rsa-sha1; c=nofws; d=dragondata.com; s=google; h=from:content-type:content-transfer-encoding:subject:date:message-id :to:mime-version:x-mailer; b=ixB7kNDNxuD+FITjij7L/9MQGKsfNZbLOKeIC0smCqZd7DfiRd9XI7zzCURN5z7pG9 Mo+f+e8kphuvEI8WdXcLck1bzbqqU4pDOdYxj67OPPfd+lmlpmzWPE3Cg7AW4Q+UTkok 3ThqZ02ogE2w1xUacXbLy9jSwdFu39M8CZVDQ= Received: by 10.43.58.148 with SMTP id wk20mr1341731icb.242.1304616329041; Thu, 05 May 2011 10:25:29 -0700 (PDT) Received: from vpn177.ord02.your.org (vpn177.ord02.your.org [204.9.55.177]) by mx.google.com with ESMTPS id g16sm974803ibb.37.2011.05.05.10.25.26 (version=TLSv1/SSLv3 cipher=OTHER); Thu, 05 May 2011 10:25:27 -0700 (PDT) From: 
Kevin Day Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Date: Thu, 5 May 2011 12:25:24 -0500 Message-Id: To: freebsd-fs@freebsd.org Mime-Version: 1.0 (Apple Message framework v1084) X-Mailer: Apple Mail (2.1084) Subject: "gpart show" stuck in loop X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 May 2011 17:53:38 -0000

We've had one of our boxes getting stuck with "gpart show" (called from rc startup scripts) consuming 100% cpu after each reboot. Manually running "gpart show" gives me:

# gpart show |more
=>        63   715571136  amrd0  MBR  (341G)
          63   715567167      1  freebsd  [active]  (341G)
   715567230        3969         - free -  (1.9M)

=>         0   715567167  amrd0s1  BSD  (341G)
           0   696254464        1  freebsd-ufs  (332G)
   696254464    19312703        2  freebsd-swap  (9.2G)

=>         63   5860573110  da0  MBR  (2.7T)
           63   2147472747    1  freebsd  [active]  (1.0T)
   2147472810   2147472810    2  freebsd  [active]  (1.0T)
   4294945620  -2729352721    3  freebsd  [active]  ()
   1565592899    581879911       - free -  (277G)
   2147472810   2147472810    2  freebsd  [active]  (1.0T)
   4294945620  -2729352721    3  freebsd  [active]  ()
   1565592899    581879911       - free -  (277G)
   2147472810   2147472810    2  freebsd  [active]  (1.0T)
   4294945620  -2729352721    3  freebsd  [active]  ()
   1565592899    581879911       - free -  (277G)
   2147472810   2147472810    2  freebsd  [active]  (1.0T)
   4294945620  -2729352721    3  freebsd  [active]  ()
   1565592899    581879911       - free -  (277G)
   2147472810   2147472810    2  freebsd  [active]  (1.0T)
   4294945620  -2729352721    3  freebsd  [active]  ()
   1565592899    581879911       - free -  (277G)
(repeating forever)

I'm guessing something is corrupt in the partition table. I'm happy to file a PR on this, but I can only leave this untouched for a day or two max before I'm going to have to wipe this and start over for a new customer who needs this storage array. Is there anything anyone could suggest looking at or preserving before I'm forced to delete this?

The storage system came to me configured like this, I don't know what the previous owner was attempting to do, or how they ended up with the partitions like this.

-- Kevin

da0 at mpt0 bus 0 scbus0 target 0 lun 0
da0: Fixed Direct Access SCSI-5 device
da0: 100.000MB/s transfers
da0: Command Queueing enabled
da0: 2861608MB (5860573184 512 byte sectors: 255H 63S/T 364803C)

# fdisk da0
******* Working on device /dev/da0 *******
parameters extracted from in-core disklabel are:
cylinders=364803 heads=255 sectors/track=63 (16065 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=364803 heads=255 sectors/track=63 (16065 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
    start 63, size 2147472747 (1048570 Meg), flag 80 (active)
        beg: cyl 0/ head 1/ sector 1;
        end: cyl 1023/ head 254/ sector 63
The data for partition 2 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
    start 2147472810, size 2147472810 (1048570 Meg), flag 80 (active)
        beg: cyl 1023/ head 255/ sector 63;
        end: cyl 1023/ head 254/ sector 63
The data for partition 3 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
    start 4294945620, size 1565614575 (764460 Meg), flag 80 (active)
        beg: cyl 1023/ head 255/ sector 63;
        end: cyl 1023/ head 165/ sector 59
The data for partition 4 is:

From owner-freebsd-fs@FreeBSD.ORG Fri May 6 01:31:02 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 118D0106566B for ; Fri, 6 May 2011 01:31:02 +0000 (UTC) (envelope-from marcel@xcllnt.net) Received: from mail.xcllnt.net (mail.xcllnt.net [70.36.220.4]) by mx1.freebsd.org (Postfix) with ESMTP
id 894458FC12 for ; Fri, 6 May 2011 01:31:01 +0000 (UTC) Received: from sa-nc-mfg-210.static.jnpr.net (natint3.juniper.net [66.129.224.36]) (authenticated bits=0) by mail.xcllnt.net (8.14.4/8.14.4) with ESMTP id p45LPd57054019 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Thu, 5 May 2011 14:25:44 -0700 (PDT) (envelope-from marcel@xcllnt.net) Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: Marcel Moolenaar In-Reply-To: Date: Thu, 5 May 2011 14:25:35 -0700 Content-Transfer-Encoding: quoted-printable Message-Id: References: To: Kevin Day X-Mailer: Apple Mail (2.1084) Cc: freebsd-fs@freebsd.org Subject: Re: "gpart show" stuck in loop X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 May 2011 01:31:02 -0000

On May 5, 2011, at 10:25 AM, Kevin Day wrote:
>
> We've had one of our boxes getting stuck with "gpart show" (called from rc startup scripts) consuming 100% cpu after each reboot. Manually running "gpart show" gives me:

Can you send me a binary image of the first sector of da0 privately and also tell me what FreeBSD version you're using.

Thanks,
--
Marcel Moolenaar marcel@xcllnt.net

From owner-freebsd-fs@FreeBSD.ORG Fri May 6 03:12:09 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 81E21106566B for ; Fri, 6 May 2011 03:12:09 +0000 (UTC) (envelope-from marcel@xcllnt.net) Received: from mail.xcllnt.net (mail.xcllnt.net [70.36.220.4]) by mx1.freebsd.org (Postfix) with ESMTP id 559F28FC12 for ; Fri, 6 May 2011 03:12:09 +0000 (UTC) Received: from dhcp-192-168-2-13.wifi.xcllnt.net (atm.xcllnt.net [70.36.220.6]) (authenticated bits=0) by mail.xcllnt.net (8.14.4/8.14.4) with ESMTP id p463BxrZ001939 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Thu, 5 May 2011 20:12:05 -0700 (PDT) (envelope-from marcel@xcllnt.net) Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: Marcel Moolenaar In-Reply-To: Date: Thu, 5 May 2011 20:11:59 -0700 Content-Transfer-Encoding: quoted-printable Message-Id: <6B7B3E48-08D5-47D1-85B4-FAA1EEE6764C@xcllnt.net> References: To: Marcel Moolenaar X-Mailer: Apple Mail (2.1084) Cc: freebsd-fs@freebsd.org Subject: Re: "gpart show" stuck in loop X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 May 2011 03:12:09 -0000

On May 5, 2011, at 2:25 PM, Marcel Moolenaar wrote:
>
> On May 5, 2011, at 10:25 AM, Kevin Day wrote:
>
>>
>> We've had one of our boxes getting stuck with "gpart show" (called from rc startup scripts) consuming 100% cpu after each reboot. Manually running "gpart show" gives me:
>
> Can you send me a binary image of the first sector of da0 privately
> and also tell me what FreeBSD version you're using.
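
[Archive note] The negative size that "gpart show" printed for slice 3 earlier in this thread can be reproduced from the fdisk figures alone if one assumes the slice's end LBA is truncated to 32 bits. The sketch below is only that arithmetic. The truncation step is an assumption made for illustration, not confirmed kernel code, and all variable names are hypothetical.

```python
# Sketch: reproduce the slice 3 figures shown by "gpart show" from the
# start/size values reported by fdisk. Assumption (not confirmed against
# the kernel source): the end LBA is truncated to 32 bits.

SECTOR_SIZE = 512
MASK32 = 0xFFFFFFFF

slice3_start = 4294945620  # from the fdisk output for partition 3
slice3_size = 1565614575   # from the fdisk output for partition 3
slice2_start = 2147472810  # from the fdisk output for partition 2

# Last sector of slice 3, wrapped to 32 bits (the assumed overflow).
slice3_end = (slice3_start + slice3_size - 1) & MASK32

# A size derived from the wrapped end goes negative, since end < start.
shown_size = slice3_end - slice3_start + 1
shown_bytes = shown_size * SECTOR_SIZE

# A scan that resumes at end + 1 lands before slice 2, so it sees a gap
# and then walks over slices 2 and 3 again.
cursor = slice3_end + 1
gap_before_slice2 = slice2_start - cursor

if __name__ == "__main__":
    print("wrapped end LBA:  ", slice3_end)
    print("size as displayed:", shown_size)
    print("size in bytes:    ", shown_bytes)
    print("scan resumes at:  ", cursor)
    print("gap before slice 2:", gap_before_slice2)
```

With these inputs the displayed size, the resume position, and the gap come out as -2729352721, 1565592899, and 581879911 sectors, the same three numbers that repeat in the "gpart show" listing for da0.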
(after receiving the dump)

Hi Kevin,

I reproduced the problem:

ns1% sudo mdconfig -a -t malloc -s 5860573173
md0
ns1% sudo gpart create -s mbr md0
md0 created
ns1% gpart show md0
=>         63   4294967229  md0  MBR  (2.7T)
           63   4294967229       - free -  (2.0T)

ns1% sudo dd if=kevin-day.mbr of=/dev/md0
8+0 records in
8+0 records out
4096 bytes transferred in 0.006988 secs (586144 bytes/sec)
ns1% gpart show md0
=>         63   5860573110  md0  MBR  (2.7T)
           63   2147472747    1  freebsd  [active]  (1.0T)
   2147472810   2147472810    2  freebsd  [active]  (1.0T)
   4294945620  -2729352721    3  freebsd  [active]  ()
   1565592899    581879911       - free -  (277G)
   2147472810   2147472810    2  freebsd  [active]  (1.0T)
   4294945620  -2729352721    3  freebsd  [active]  ()
   1565592899    581879911       - free -  (277G)
   2147472810   2147472810    2  freebsd  [active]  (1.0T)
   4294945620  -2729352721    3  freebsd  [active]  ()
   1565592899    581879911       - free -  (277G)
^C

The first problem you have is that the MBR has overflows. As you can see from my initial MBR, only 2.0TB out of the 2.7T can be addressed, whereas yours addresses the whole 2.7T. There must be an overflow condition.

The second problem is that more than 1 slice is marked active.

Now, on to the infinite recursion in gpart. The XML has the following pertaining to the slices:

r0w0e0 md0s3 -1397428593152 512 4294945620 1565592898 3 freebsd 2199012157440 18446742676280958464 165 active

Notice how mediasize is negative. This is a bug in the kernel. This is also what leads to the recursion in gpart, because gpart looks up the next partition on the disk, given the LBA of the next sector following the partition just processed. This allows gpart to detect free space (the next partition found doesn't start at the given LBA) and it allows gpart to print the partitions in order on the disk.

In any case: since the end of slice 3 is before the start of slice 3 and even before the start of slice 2, due to its negative size, gpart will continuously find the same partitions:
1. After partition 3 the "cursor" is at 1565592899,
2. The next partition found is partition 2, at 2147472810
3. Therefore, 1565592899-2147472810 is free space
4. Partition 2 is printed, and partition 3 is found next
5. Partition 3 is printed and due to the negative size: goto 1

I think we should do two things:
1. Protect the gpart tool against this,
2. Fix the kernel to simply reject partitions that fall outside of the addressable space (as determined by the limitations of the scheme).

In your case it would mean that slice 3 would be inaccessible.

Given that you've been hit by this: do you feel that such a change would be a better failure mode?

--
Marcel Moolenaar marcel@xcllnt.net

From owner-freebsd-fs@FreeBSD.ORG Fri May 6 06:30:04 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4C1091065672 for ; Fri, 6 May 2011 06:30:04 +0000 (UTC) (envelope-from toasty@dragondata.com) Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id DEEB58FC17 for ; Fri, 6 May 2011 06:30:01 +0000 (UTC) Received: by iwn33 with SMTP id 33so3398993iwn.13 for ; Thu, 05 May 2011 23:30:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dragondata.com; s=google; h=domainkey-signature:subject:mime-version:content-type:from :in-reply-to:date:cc:content-transfer-encoding:message-id:references :to:x-mailer; bh=DjXDjVWUjuAzLLchb6K/E+cQ+5M2VxkhGvA+/4pgUfw=; b=nZBMT34w2PbQStFWMugQEd9jWY0XWiz/qSBT3jdO0en3eSCuaP4XagTnSU2sxd3i9P fD6ZthMuTpqbiJ/gp4r1EfeULXxtcNCLZrQ90gvUaB7x3yZYBev1qitd6nt7UA3UJ0Or WGLn8Br0EotADn7jz3IPr32KIrbTUEHkpN2cs= DomainKey-Signature: a=rsa-sha1; c=nofws; d=dragondata.com; s=google; h=subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer; b=PBBzyf6/qxU+PJYpEMbBl0yV3WMGSSQAI/kJzMcK91trH0Rmznh0y07BdsC9qSA11e
WSzV+mnmgcM5MGqijo8Z2FBcZSREXWYUj+cZ2CqICQBpst0fNZs1Bb7JiuKwatiRT/ij tjaq8svbW/Qgzy6fWcp27hh4JxxDyZ/pRN4Do= Received: by 10.42.159.65 with SMTP id k1mr2161427icx.174.1304663401225; Thu, 05 May 2011 23:30:01 -0700 (PDT) Received: from vpn168.ord02.your.org (vpn168.ord02.your.org [204.9.55.168]) by mx.google.com with ESMTPS id u17sm1229948ibm.28.2011.05.05.23.29.59 (version=TLSv1/SSLv3 cipher=OTHER); Thu, 05 May 2011 23:29:59 -0700 (PDT) Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: Kevin Day In-Reply-To: <6B7B3E48-08D5-47D1-85B4-FAA1EEE6764C@xcllnt.net> Date: Fri, 6 May 2011 01:29:57 -0500 Content-Transfer-Encoding: quoted-printable Message-Id: <4A043649-429E-4CAC-8DB2-2275ECF552A0@dragondata.com> References: <6B7B3E48-08D5-47D1-85B4-FAA1EEE6764C@xcllnt.net> To: Marcel Moolenaar X-Mailer: Apple Mail (2.1084) Cc: freebsd-fs@freebsd.org Subject: Re: "gpart show" stuck in loop X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 May 2011 06:30:04 -0000

On May 5, 2011, at 10:11 PM, Marcel Moolenaar wrote:

> Hi Kevin,
>
> I reproduced the problem:
>

Yay!

> The first problem you have is that the MBR has overflows.
> As you can see from my initial MBR, only 2.0TB out of the
> 2.7T can be addressed, whereas yours addresses the whole
> 2.7T. There must be an overflow condition.
>
> The second problem is that more than 1 slice is marked
> active.

Yeah, I'm not exactly sure how the previous user of this storage array
ended up with this MBR. I believe he was using it in FreeBSD, but
probably something much older (6.x?). I don't know if it was actually
working or not with all the partitions, but I honestly can't see how.

> I think we should do two things:
> 1. Protect the gpart tool against this,
> 2. Fix the kernel to simply reject partitions that
>    fall outside of the addressable space (as determined
>    by the limitations of the scheme).
>
> In your case it would mean that slice 3 would be
> inaccessible.
>
> Given that you've been hit by this: do you feel that such
> a change would be a better failure mode?

Definitely. As it stands now, slice 3 isn't accessible anyway:

# dd if=/dev/da0s3 of=/dev/null
dd: /dev/da0s3: Input/output error
0+0 records in
0+0 records out
0 bytes transferred in 0.000233 secs (0 bytes/sec)

So allowing the rc startup to finish without hanging would be much
improved.

Thanks for the speedy answer. :)

-- Kevin

From owner-freebsd-fs@FreeBSD.ORG Fri May 6 08:14:10 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A1E991065672; Fri, 6 May 2011 08:14:10 +0000 (UTC) (envelope-from jh@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 79AB78FC1A; Fri, 6 May 2011 08:14:10 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p468EAqw004992; Fri, 6 May 2011 08:14:10 GMT (envelope-from jh@freefall.freebsd.org) Received: (from jh@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p468E9a6004986; Fri, 6 May 2011 08:14:09 GMT (envelope-from jh) Date: Fri, 6 May 2011 08:14:09 GMT Message-Id: <201105060814.p468E9a6004986@freefall.freebsd.org> To: vk@dss.kbb.ru, jh@FreeBSD.org, freebsd-fs@FreeBSD.org From: jh@FreeBSD.org Cc: Subject: Re: kern/149022: [hang] File system operations hangs with suspfs state X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 May 2011 08:14:10 -0000
Synopsis: [hang] File system operations hangs with suspfs state State-Changed-From-To: feedback->closed State-Changed-By: jh State-Changed-When: Fri May 6 08:14:09 UTC 2011 State-Changed-Why: Feedback timeout. http://www.freebsd.org/cgi/query-pr.cgi?pr=149022 From owner-freebsd-fs@FreeBSD.ORG Fri May 6 08:19:07 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 527C1106566C; Fri, 6 May 2011 08:19:07 +0000 (UTC) (envelope-from jh@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 2B2508FC19; Fri, 6 May 2011 08:19:07 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p468J7pC006395; Fri, 6 May 2011 08:19:07 GMT (envelope-from jh@freefall.freebsd.org) Received: (from jh@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p468J6dZ006390; Fri, 6 May 2011 08:19:06 GMT (envelope-from jh) Date: Fri, 6 May 2011 08:19:06 GMT Message-Id: <201105060819.p468J6dZ006390@freefall.freebsd.org> To: k0802647@telus.net, jh@FreeBSD.org, freebsd-fs@FreeBSD.org From: jh@FreeBSD.org Cc: Subject: Re: kern/154228: [md] md getting stuck in wdrain state X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 May 2011 08:19:07 -0000 Synopsis: [md] md getting stuck in wdrain state State-Changed-From-To: feedback->patched State-Changed-By: jh State-Changed-When: Fri May 6 08:16:03 UTC 2011 State-Changed-Why: Fixed in head (r217880) and stable/8 (r218188). 
http://www.freebsd.org/cgi/query-pr.cgi?pr=154228 From owner-freebsd-fs@FreeBSD.ORG Fri May 6 20:34:02 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2C1831065670 for ; Fri, 6 May 2011 20:34:02 +0000 (UTC) (envelope-from unix.co@gmail.com) Received: from mail-pv0-f182.google.com (mail-pv0-f182.google.com [74.125.83.182]) by mx1.freebsd.org (Postfix) with ESMTP id 041D98FC15 for ; Fri, 6 May 2011 20:34:01 +0000 (UTC) Received: by pvg11 with SMTP id 11so2080741pvg.13 for ; Fri, 06 May 2011 13:34:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:date:message-id:subject:from:to :content-type; bh=ZLZwCcnkhYjWKi6EEucEhtiswI/tQHycfl+yplfhEB8=; b=I/QXXTfocZWt7kUd3al5CeAvRDujpfLntsb+4l3+4mF+rI0Uj68Wq1L2WmE0Ei8NYf 46GzffhDcOndkX/kv1JmaK4uU9AhulbLxUDRwiG5SJfY2Q4X+TnknZjAV0QvIFlw+7EV eD+ZdN0M0Gbbg6vDHMAwz2HAkRAnyQSmiSG0M= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:date:message-id:subject:from:to:content-type; b=WjLeI/NEdVxG00NNxOuwnoTl4grocfJRmTYExRhE85NlvfQI36GIfISjV3cuUfOVAy f7v9Ekq86BggG7W/Fm5X4NVAZA3OOuRbU/CRcmnILb0o9Kmgwjq6DQJ5u5JD+KV9HpJt 4f9Qr19Dvp14tivuIOys8DF+F8jH3OxFaudH8= MIME-Version: 1.0 Received: by 10.68.60.33 with SMTP id e1mr2816406pbr.174.1304712723322; Fri, 06 May 2011 13:12:03 -0700 (PDT) Received: by 10.68.54.41 with HTTP; Fri, 6 May 2011 13:12:03 -0700 (PDT) Date: Sat, 7 May 2011 01:12:03 +0500 Message-ID: From: "Tears !" To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: Remote address not configured ?? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 06 May 2011 20:34:02 -0000

Hi

I am trying to create the hast0 pool:

hastctl create hast0

But I am getting an error:

[ERROR] Remote address not configured for resource hast0.

Here is my hast.conf. Both nodes are accessible, and both sides have
the same hast.conf:

resource hast0 {
        on s1 {
                local /dev/ad3
                remote 87.96.41.150
        }
        on s2 {
                local /dev/ad3
                remote 87.96.41.146
        }
}

How do I solve this?

Best Regards

Umar

From owner-freebsd-fs@FreeBSD.ORG Sat May 7 05:36:13 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EE1E71065676 for ; Sat, 7 May 2011 05:36:13 +0000 (UTC) (envelope-from igorz@yandex.ru) Received: from forward1.mail.yandex.net (forward1.mail.yandex.net [77.88.46.6]) by mx1.freebsd.org (Postfix) with ESMTP id A04B88FC16 for ; Sat, 7 May 2011 05:36:13 +0000 (UTC) Received: from web53.yandex.ru (web53.yandex.ru [77.88.47.159]) by forward1.mail.yandex.net (Yandex) with ESMTP id C5F4E124312A for ; Sat, 7 May 2011 09:20:58 +0400 (MSD) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex.ru; s=mail; t=1304745658; bh=Sqpf6dUA881NpTWu1h0UhcrVpLtaqsHXkB2UepiUTDk=; h=From:To:Subject:MIME-Version:Message-Id:Date: Content-Transfer-Encoding:Content-Type; b=Gxm3o8de/bf7HZLXTFtrnRiGYycbUdU/bmGn7d20MOEPRlJfAwJbcyNWpnxgVH1Sc tWzEpINnkls1R7ArXmM0NHdxfy1/uRth9CtQIwranzkKNcd6wfW+HXG5yD7U/xrGNE ChB/N5tOOJAI8dSgf6I2grRHWYSpx6xpBf3kLAnU= Received: from localhost (localhost.localdomain [127.0.0.1]) by web53.yandex.ru (Yandex) with ESMTP id BB530358331 for ; Sat, 7 May 2011 09:20:58 +0400 (MSD) X-Yandex-Spam: 1 Received: from ppp85-141-219-114.pppoe.mtu-net.ru (ppp85-141-219-114.pppoe.mtu-net.ru [85.141.219.114]) by mail.yandex.ru with HTTP; Sat, 07 May 2011 09:20:57 +0400 From:
Igor Zabelin To: freebsd-fs@freebsd.org MIME-Version: 1.0 Message-Id: <210021304745658@web53.yandex.ru> Date: Sat, 07 May 2011 09:20:57 +0400 X-Mailer: Yamail [ http://yandex.ru ] 5.0 Content-Transfer-Encoding: 7bit Content-Type: text/plain Subject: ZFS can't mount filesystem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 07 May 2011 05:36:14 -0000

Hi,

I have trouble with ZFS. One of the filesystems can't be mounted.
zpool scrub is not doing anything.
ZFS reports an error when getting the properties.
The SMART extended offline test for each disk completed without error.
Is it possible to recover the data? Mount ignoring errors?

FreeBSD 8.2-RELEASE

ZFS reports an error when getting the properties.

# zfs get all tank/var

[skip normal output]
internal error: unable to get version property
internal error: unable to get utf8only property
internal error: unable to get normalization property
internal error: unable to get casesensitivity property
[skip normal output]

# zpool status -v tank
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub stopped after 0h0m with 0 errors on Sat May  7 08:09:35 2011
config:

        NAME           STATE     READ WRITE CKSUM
        tank           ONLINE       0     0    36
          raidz1       ONLINE       0     0   144
            gpt/disk5  ONLINE       0     0     0
            gpt/disk6  ONLINE       0     0     0
            gpt/disk7  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        tank/var:<0x0>

From owner-freebsd-fs@FreeBSD.ORG Sat May 7 06:44:06 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B49A6106564A for ; Sat, 7 May 2011 06:44:06 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta06.westchester.pa.mail.comcast.net (qmta06.westchester.pa.mail.comcast.net [76.96.62.56]) by mx1.freebsd.org (Postfix) with ESMTP id 625C68FC08 for ; Sat, 7 May 2011 06:44:06 +0000 (UTC) Received: from omta23.westchester.pa.mail.comcast.net ([76.96.62.74]) by qmta06.westchester.pa.mail.comcast.net with comcast id gWir1g0031c6gX856Wk6MU; Sat, 07 May 2011 06:44:06 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta23.westchester.pa.mail.comcast.net with comcast id gWk51g00A1t3BNj3jWk5qm; Sat, 07 May 2011 06:44:06 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id DC4F7102C19; Fri, 6 May 2011 23:44:03 -0700 (PDT) Date: Fri, 6 May 2011 23:44:03 -0700 From: Jeremy Chadwick To: Igor Zabelin Message-ID: <20110507064403.GA4324@icarus.home.lan> References: <210021304745658@web53.yandex.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <210021304745658@web53.yandex.ru> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: ZFS can't mount filesystem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 07 May 2011 06:44:06 -0000

On Sat, May 07, 2011 at
09:20:57AM +0400, Igor Zabelin wrote:
> Hi,
>
> I have trouble with ZFS. One of set filesystems can't mount.
> zpool scrub is not doing anything
> ZFS reports an error when geting the properties.
> SMART extended offline test for each disk completed without error.
> It's possible to recover data? Mount ignoring errors?
>
> FreeBSD 8.2-RELEASE
>
> ZFS reports an error when geting the properties.
>
> # zfs get all tank/var
>
> [skip normal output]
> internal error: unable to get version property
> internal error: unable to get utf8only property
> internal error: unable to get normalization property
> internal error: unable to get casesensitivity property
> [skip normal output]
>
> # zpool status -v tank
>   pool: tank
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: scrub stopped after 0h0m with 0 errors on Sat May  7 08:09:35 2011
> config:
>
>         NAME           STATE     READ WRITE CKSUM
>         tank           ONLINE       0     0    36
>           raidz1       ONLINE       0     0   144
>             gpt/disk5  ONLINE       0     0     0
>             gpt/disk6  ONLINE       0     0     0
>             gpt/disk7  ONLINE       0     0     0
>
> errors: Permanent errors have been detected in the following files:
>
>         tank/var:<0x0>

Just to rule out disk problems, can you please provide "smartctl -a"
output for each of the 3 disks in the pool and be sure to state what
output matches each disk (gpt/XXX)?

A long test doesn't act as full validation of disk read integrity (it's
similar to a surface scan, but not the same thing), nor does it test
things like communication between the controller and the disk. Short
vs. long vs. conveyance vs. offline vs. select SMART tests all do
different things depending on how the vendor implements them, and it
varies per model of disk; there is no standard.
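
One way to gather the requested "smartctl -a" output with each report tagged by its pool member is sketched below. This is only an illustration under stated assumptions: the label-to-device pairs (ada1, ada2, ada3) are hypothetical and are not taken from this thread; on the real system the mapping would come from "glabel status". The report function is likewise a name made up for this sketch.

```shell
#!/bin/sh
# Sketch: print "smartctl -a" for each pool member, tagged with its GPT label.
# ASSUMPTION: the label:device pairs below are hypothetical placeholders;
# read the real mapping from "glabel status" before using this.
PAIRS="gpt/disk5:ada1 gpt/disk6:ada2 gpt/disk7:ada3"

# report LABEL:DEV
# Prints a header naming the label and device, then the SMART report
# if smartctl and the device node are both present.
report() {
    label=${1%%:*}   # text before the first ':'
    dev=${1#*:}      # text after the first ':'
    echo "=== $label (/dev/$dev) ==="
    if command -v smartctl >/dev/null 2>&1 && [ -e "/dev/$dev" ]; then
        smartctl -a "/dev/$dev"
    else
        echo "(smartctl or /dev/$dev not available here)"
    fi
}

for pair in $PAIRS; do
    report "$pair"
done
```

Redirecting the whole run to a file gives a single attachment in which every report is preceded by the gpt/XXX name it belongs to.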
Others may be able to help with pool recovery in this case, but I always tend to resort to restoration from backups. Developers may be interested in the output from "zdb tank", so you may want to include that here. -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Sat May 7 07:27:36 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 54FC61065676 for ; Sat, 7 May 2011 07:27:36 +0000 (UTC) (envelope-from to.my.trociny@gmail.com) Received: from mail-bw0-f54.google.com (mail-bw0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id CDE168FC21 for ; Sat, 7 May 2011 07:27:35 +0000 (UTC) Received: by bwz12 with SMTP id 12so4385112bwz.13 for ; Sat, 07 May 2011 00:27:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:from:to:cc:subject:references:x-comment-to :sender:date:in-reply-to:message-id:user-agent:mime-version :content-type; bh=BSronswnnxiVec78aDaD0szQlqD2vu52p8EEJsfcjVo=; b=bopXUlWwu3q0ms6skcaXGOKeeMjUetqPwRyw6tIOLoUTP64fmJXOUGZCSSqO2LojHx BJOrLtorgcIqn4zFvx0/crSgIhIasCX0/0tAAyIi/mRp01lmCWDw6rSzkN8TyL3oRKDQ 4RVORs9lo+059K7s3MJ5zu6CPXL9A71sbCKjM= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=from:to:cc:subject:references:x-comment-to:sender:date:in-reply-to :message-id:user-agent:mime-version:content-type; b=RjachKAA45IZu7p0KZC86QV2wQYNfZt1Lp6wFUNn4gs99piZvBLbkV2vdL6kJ7mKLq BIvId/z0RYtEHc2z4sbyPtNu9i5bABtUNngVirNF/Ivxpc8TcrNlxOXvJm+fkuWBiPsb pQWKjUF0KeKfqm/KAMMv0EtaEhlB9I68uHnBA= Received: by 10.205.24.9 with SMTP id rc9mr3956140bkb.92.1304753254502; Sat, 07 May 2011 00:27:34 -0700 (PDT) Received: from localhost ([95.69.172.154]) by mx.google.com with ESMTPS id 
y22sm2427796bku.8.2011.05.07.00.27.32 (version=TLSv1/SSLv3 cipher=OTHER); Sat, 07 May 2011 00:27:33 -0700 (PDT) From: Mikolaj Golub To: "Tears !" References: X-Comment-To: Tears ! Sender: Mikolaj Golub Date: Sat, 07 May 2011 10:27:30 +0300 In-Reply-To: (Tears !'s message of "Sat, 7 May 2011 01:12:03 +0500") Message-ID: <86wri3dl7h.fsf@kopusha.home.net> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: freebsd-fs@freebsd.org Subject: Re: Remote address not configured ?? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 07 May 2011 07:27:36 -0000

On Sat, 7 May 2011 01:12:03 +0500 Tears ! wrote:

 T!> Hi
 T!> I am trying to create hast0 pool

 T!> hastctl create hast0

 T!> But i am getting error

 T!> [ERROR] Remote address not configured for resource hast0.

 T!> Here is my hast.conf both nodes are accessible and both side same hast.conf

 T!> resource hast0 {
 T!>         on s1 {
 T!>                 local /dev/ad3
 T!>                 remote 87.96.41.150
 T!>         }
 T!>         on s2 {
 T!>                 local /dev/ad3
 T!>                 remote 87.96.41.146
 T!>         }
 T!> }

 T!> How to solve this ?

It looks like hastd can't find the configuration for its node. Are s1
and s2 the real hostnames of your hosts? As stated in hast.conf(5):

    The argument can be replaced either by a full hostname as obtained
    by gethostname(3), only first part of the hostname, or by node's
    UUID as found in the kern.hostuuid sysctl(8) variable.

What version of FreeBSD are you running? I suspect some release, because
in STABLE and CURRENT you would get a plainer message in such a case:
"No resource configuration for this node...".
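
As a minimal sketch of what the man page text implies, each "on" section would be keyed by a name the node itself reports. The names nodea and nodeb below are hypothetical stand-ins, not the poster's real hostnames; the device and addresses are copied from the configuration quoted above.

```
# /etc/hast.conf (sketch)
# ASSUMPTION: "nodea" and "nodeb" are hypothetical; use the value each
# machine actually reports as its hostname.
resource hast0 {
        on nodea {
                local /dev/ad3
                remote 87.96.41.150
        }
        on nodeb {
                local /dev/ad3
                remote 87.96.41.146
        }
}
```

On each machine, "hostname" prints the full hostname, "hostname -s" its first part, and "sysctl -n kern.hostuuid" the UUID, so any one of those three values can serve as the section name for that node.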
-- Mikolaj Golub From owner-freebsd-fs@FreeBSD.ORG Sat May 7 08:57:04 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 67C3A1065677; Sat, 7 May 2011 08:57:04 +0000 (UTC) (envelope-from unix.co@gmail.com) Received: from mail-pv0-f182.google.com (mail-pv0-f182.google.com [74.125.83.182]) by mx1.freebsd.org (Postfix) with ESMTP id 369288FC0A; Sat, 7 May 2011 08:57:03 +0000 (UTC) Received: by pvg11 with SMTP id 11so2291591pvg.13 for ; Sat, 07 May 2011 01:57:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=mOerKN8A7mHWnlIXnZh+QvhV8c6OLIUm2yH0w2amDvc=; b=WsTNY7vkgjzz7aYkIPInZY/iINcuAFG2dDvgUHAEOjLFHmRUN4uLtFp27v6JAqzw8q J68qZjHCYEyp7GrAR4T9cuA7C+wP7zhQ37GP6RBOzyam30CJKVRtRTN+B8qijXJZG6ys QQgBHoa3wtiRApzB2CMp6FUSbPPAwZMkQxSBA= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=PIT9j9Nf53d4B505w8bUagecbIjvY7l87DjbRqHN0l9tkerrd2a0q+EweYDPe9bAsm baAAs1eVZzuevwPwhMolPqOfJ8rSWeVgpAE1og2Hh4AeuFIepMICPkAHNryduOG75jnR pZbhFOKw4/MaGKNK8+vS9O1o3MO9pPgKYXeOo= MIME-Version: 1.0 Received: by 10.68.0.69 with SMTP id 5mr6128992pbc.241.1304758623707; Sat, 07 May 2011 01:57:03 -0700 (PDT) Received: by 10.68.54.41 with HTTP; Sat, 7 May 2011 01:57:03 -0700 (PDT) In-Reply-To: <86wri3dl7h.fsf@kopusha.home.net> References: <86wri3dl7h.fsf@kopusha.home.net> Date: Sat, 7 May 2011 13:57:03 +0500 Message-ID: From: "Tears !" To: Mikolaj Golub Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-fs@freebsd.org Subject: Re: Remote address not configured ?? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 07 May 2011 08:57:04 -0000

Hi Mikolaj,

Thanks a lot, it works after replacing *s1* with the system *hostname*.

Best Regards,

Umar

On Sat, May 7, 2011 at 12:27 PM, Mikolaj Golub wrote:

>
> On Sat, 7 May 2011 01:12:03 +0500 Tears ! wrote:
>
>  T!> Hi
>
>  T!> I am trying to create hast0 pool
>
>  T!> hastctl create hast0
>
>  T!> But i am getting error
>
>  T!> [ERROR] Remote address not configured for resource hast0.
>
>  T!> Here is my hast.conf both nodes are accessible and both side same hast.conf
>
>  T!> resource hast0 {
>  T!>         on s1 {
>  T!>                 local /dev/ad3
>  T!>                 remote 87.96.41.150
>  T!>         }
>  T!>         on s2 {
>  T!>                 local /dev/ad3
>  T!>                 remote 87.96.41.146
>  T!>         }
>  T!> }
>
>  T!> How to solve this ?
>
> It looks like hastd can't find configuration for its node. Is s1 and s2 are
> real hostnames of your hosts? As it is stated in hast.conf(5):
>
>     The argument can be replaced either by a full hostname as obtained
>     by gethostname(3), only first part of the hostname, or by node's UUID as
>     found in the kern.hostuuid sysctl(8) variable.
>
> What version of FreeBSD are you running? I suspect some release, because in
> STABLE and CURRENT you would have more plain message in such case: "No
> resource configuration for this node...".
> > -- > Mikolaj Golub > -- Umar Draz Network Administrator From owner-freebsd-fs@FreeBSD.ORG Sat May 7 19:02:39 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 30D4D106564A for ; Sat, 7 May 2011 19:02:39 +0000 (UTC) (envelope-from artemb@gmail.com) Received: from mail-qw0-f54.google.com (mail-qw0-f54.google.com [209.85.216.54]) by mx1.freebsd.org (Postfix) with ESMTP id E14508FC16 for ; Sat, 7 May 2011 19:02:37 +0000 (UTC) Received: by qwc9 with SMTP id 9so3128911qwc.13 for ; Sat, 07 May 2011 12:02:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=zGvD/qi9gu0x0PVfSz1iIekuryOsh2pEnQ71i9H6Gc8=; b=BkFDc8DFWrf4IzS0BT727nr+Lln4SoMg8INI4bQaCDqIPsrrtN4fixZK2clsX111p5 EKQzOiL/1ihmHfQz74EFKqygJOt2ena2b+KeS6plVbibQ5iiqKXyPyoX5P+jw97NQUc3 2vG4+itktI0muzhLNbI3UJDG0g94aECbYLG18= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; b=qY4AV9n+7DOifGbtbw2davmaEYvzSEV9KyWNMm20Izwe66SNQvswjP8WlYUxxAicit ggoeQkp3E81wJIDeVHX0IN4DVn/M0jXlOX+U4IkogArwrpCazRVsi6M0ou+4tVZOsl9Y r8+s7hHB5tL43t+D57+/OyIyV6wZvRvKyLaek= MIME-Version: 1.0 Received: by 10.229.46.67 with SMTP id i3mr3412290qcf.234.1304794955473; Sat, 07 May 2011 12:02:35 -0700 (PDT) Sender: artemb@gmail.com Received: by 10.229.95.140 with HTTP; Sat, 7 May 2011 12:02:35 -0700 (PDT) In-Reply-To: <210021304745658@web53.yandex.ru> References: <210021304745658@web53.yandex.ru> Date: Sat, 7 May 2011 12:02:35 -0700 X-Google-Sender-Auth: FdfRFOeY7Xw91wAtee3Fo3jOwq4 Message-ID: From: Artem Belevich To: Igor Zabelin Content-Type: text/plain; charset=ISO-8859-1 
Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org Subject: Re: ZFS can't mount filesystem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 07 May 2011 19:02:39 -0000

On Fri, May 6, 2011 at 10:20 PM, Igor Zabelin wrote:
> Hi,
>
> I have trouble with ZFS. One of set filesystems can't mount.
> zpool scrub is not doing anything
> ZFS reports an error when geting the properties.
> SMART extended offline test for each disk completed without error.
> It's possible to recover data? Mount ignoring errors?
>
> FreeBSD 8.2-RELEASE
>
> ZFS reports an error when geting the properties.
>
> # zfs get all tank/var
>
> [skip normal output]
> internal error: unable to get version property
> internal error: unable to get utf8only property
> internal error: unable to get normalization property
> internal error: unable to get casesensitivity property
> [skip normal output]
>
> # zpool status -v tank
>   pool: tank
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: scrub stopped after 0h0m with 0 errors on Sat May  7 08:09:35 2011
> config:
>
>         NAME           STATE     READ WRITE CKSUM
>         tank           ONLINE       0     0    36
>           raidz1       ONLINE       0     0   144
>             gpt/disk5  ONLINE       0     0     0
>             gpt/disk6  ONLINE       0     0     0
>             gpt/disk7  ONLINE       0     0     0
>
> errors: Permanent errors have been detected in the following files:
>
>         tank/var:<0x0>

It may be a good idea to test the RAM on your system first. ZFS, with
its data consistency checks, is often the first thing tripped by bad
RAM.

--Artem

>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>