From owner-freebsd-fs@FreeBSD.ORG Sun May 6 02:13:21 2012
From: Michael Richards <hackish@gmail.com>
Date: Sat, 5 May 2012 22:13:19 -0400
To: freebsd-fs@freebsd.org
Subject: ZFS Kernel Panics with 32 and 64 bit versions of 8.3 and 9.0

Originally I had an 8.1 server set up with a 32-bit kernel. The OS is on a
UFS filesystem and (it's a mail server) the business part of the operation
is on ZFS.

One day it crashed with an odd kernel panic. I assumed it was a memory
issue so I had more RAM installed. I tried to get a PAE kernel working to
use this extra RAM, but it was crashing every few hours.

Suspecting a hardware issue, all the hardware was replaced.

I had some difficulty figuring out how to mount my old ZFS partition but
eventually did so. zpool import shows this:

  pool: email
    id: 10433152746165646153
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        email       ONLINE
          ada1s1g   ONLINE

zpool import -f -R /altroot 10433152746165646153 olddata
panics the kernel. A similar panic is seen with all the other kernel
versions.

http://forums.freebsd.org/attachment.php?attachmentid=1545&stc=1&d=1336261809
shows a typical kernel panic.

http://forums.freebsd.org/showthread.php?t=31820
gives a bit more info about things I've tried. Whatever it is seems to
affect a wide variety of kernels.

Any ideas?
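If a full backtrace would be useful, I believe I can capture a proper crash
dump next time it panics, roughly like this (a sketch only; it assumes the
swap partition is large enough to hold the dump):

  # /etc/rc.conf: dump kernel memory to the swap device on panic
  dumpdev="AUTO"

  # after the reboot, savecore(8) puts the dump in /var/crash;
  # the stack can then be pulled out with kgdb:
  kgdb /boot/kernel/kernel /var/crash/vmcore.0
  (kgdb) bt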
From owner-freebsd-fs@FreeBSD.ORG Sun May 6 06:11:03 2012
From: Artem Belevich <artemb@gmail.com>
Date: Sat, 5 May 2012 23:11:01 -0700
To: Michael Richards
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS Kernel Panics with 32 and 64 bit versions of 8.3 and 9.0

I believe I've run into this issue two or three times. In all cases the
culprit was memory corruption. If I were to guess, the corruption damaged
critical data *before* ZFS calculated the checksum and wrote it to disk.
Once that happened, the kernel would panic every time the pool was in use.
Crashes could happen as early as zpool import or as late as after a few
days of uptime or the next scheduled scrub. I even tried importing/scrubbing
the pool on OpenSolaris without much success -- while Solaris didn't crash
outright, it failed to import the pool with an internal assertion.

On Sat, May 5, 2012 at 7:13 PM, Michael Richards wrote:
> Originally I had an 8.1 server set up with a 32-bit kernel. The OS is on
> a UFS filesystem and (it's a mail server) the business part of the
> operation is on ZFS.
>
> One day it crashed with an odd kernel panic. I assumed it was a memory
> issue so I had more RAM installed. I tried to get a PAE kernel working
> to use this extra RAM but it was crashing every few hours.
>
> Suspecting a hardware issue all the hardware was replaced.

Bad memory could indeed do that.

> I had some difficulty trying to figure out how to mount my old ZFS
> partition but eventually did so.
...
> zpool import -f -R /altroot 10433152746165646153 olddata
> panics the kernel. Similar panic as seen in all the other kernel versions.

> Gives a bit more info about things I've tried. Whatever it is seems to
> affect a wide variety of kernels.

Kernel is just a messenger here.
The root cause is that while ZFS does go an extra mile or two to ensure
data consistency, there's only so much it can do if RAM is bad. Once that
kind of problem has happened, it may leave the pool in a state that ZFS
cannot deal with out of the box.

Not everything may be lost, though.

First of all -- make a copy of your pool, if it's feasible. The probability
of screwing it up even more is rather high.

ZFS internally keeps a large number of uberblocks. Each uberblock is sort
of a periodic checkpoint of the pool state, written after ZFS commits each
transaction group (every 10-40 seconds, depending on the
vfs.zfs.txg.timeout sysctl, and more often if there is a lot of ongoing
write activity). Basically, you need to destroy the most recent uberblock
to manually roll back your ZFS pool. Hopefully you'll only need to nuke a
few of the most recent ones to restore the pool to a point before the
corruption ruined it.

Now, ZFS keeps multiple copies of each uberblock. You will need to nuke
*all* instances of the most recent uberblock in order to roll the pool
state backwards.

The Solaris Internals site seems to have a script to do that now (I wish I
had known about it back when I needed it):

http://www.solarisinternals.com/wiki/index.php/ZFS_forensics_scrollback_script
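Before nuking anything, it's worth seeing what is actually on disk.
Something along these lines should show the vdev labels and the active
uberblock from userland (a sketch, not verified against your pool; the
device and pool names are the ones from your zpool import output):

  zdb -l /dev/ada1s1g      # dump the vdev labels on the disk
  zdb -u -e email          # print the active uberblock of the exported pool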
Good luck!

--Artem

From owner-freebsd-fs@FreeBSD.ORG Sun May 6 06:21:50 2012
From: Chris <skvortsov42@gmail.com>
Date: Sat, 5 May 2012 23:21:43 -0700
To: freebsd-fs@freebsd.org
Subject: ZFS 4K drive overhead

Hi all,

I'm planning on making a raidz2 with six 2 TB drives - all 4K sectors, all
reporting as 512 bytes. I've been reading some disturbing things about ZFS
when used on 4K drives. In this discussion
(http://mail.opensolaris.org/pipermail/zfs-discuss/2011-October/049959.html),
Jim Klimov pointed out that when ZFS is used with ashift=12, the metadata
overhead for a filesystem with a lot of small files can reach 100%
(http://mail.opensolaris.org/pipermail/zfs-discuss/2011-October/049960.html)!
That seems pretty bad to me.

My questions are:

Does anyone on this list have experience using ZFS on 4K drives with
ashift=12? Is the overhead per file, such that having a relatively large
average file size, say, 19 MB, would render it insignificant? Or would the
overhead be large regardless?

What is the speed penalty for using ashift=9 on the array? Is the safety
of the data on the array an issue (due to how ZFS can't write to a 512
byte sector but is coded with the assumption that it can, thus making it
no longer strictly copy-on-write)? Does anyone have any experience with
ashift=9 arrays on 4K drives?
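For reference, the usual FreeBSD way to force ashift=12 at creation time
seems to be the gnop trick; roughly (a sketch only, with placeholder adaN
device and pool names for my six drives):

  # create temporary 4K-sector providers so zpool create picks ashift=12
  for d in ada1 ada2 ada3 ada4 ada5 ada6; do gnop create -S 4096 /dev/$d; done
  zpool create tank raidz2 ada1.nop ada2.nop ada3.nop ada4.nop ada5.nop ada6.nop

  # the .nop shims are only needed while the pool is created
  zpool export tank
  for d in ada1 ada2 ada3 ada4 ada5 ada6; do gnop destroy $d.nop; done
  zpool import tank

  # check the result
  zdb -C tank | grep ashift

From what I've read, ashift is fixed per vdev at creation time, so it has
to be right from the start.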
Thanks in advance.

From owner-freebsd-fs@FreeBSD.ORG Sun May 6 08:46:41 2012
From: Miroslav Lachman <000.fbsd@quip.cz>
Date: Sun, 06 May 2012 10:46:32 +0200
To: Chris
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS 4K drive overhead

Chris wrote:
> Does anyone on this list have experience using ZFS on 4K drives with
> ashift=12? Is the overhead per file, such that having a relatively
> large average filesize, say, 19 MB, would render it insignificant? Or
> would the overhead be large regardless?

An average file size of 19MB is much larger than 4k (the metadata block
size), so the overhead will not be as high as with really small files
(files of a few kB).

> What is the speed penalty for using ashift=9 on the array? Is the
> safety of the data on the array an issue (due to how ZFS can't write
> to a 512 byte sector but it's coded with the assumption that it can
> thus making it no longer strictly copy-on-write)? Does anyone have any
> experience with ashift=9 arrays on 4K drives?

Even if the overhead will be larger, the speed penalty is much higher. You
should read about it in some posts on this blog:

http://blog.des.no/search/label/freebsd

There are various articles with benchmarks of 4k-sector drives, and some
of them are almost useless with unaligned writes. So I strongly recommend
using 4k (ashift=12).

Use ashift=9 only if performance doesn't matter and you care only about
available space.

Miroslav Lachman

From owner-freebsd-fs@FreeBSD.ORG Sun May 6 11:59:56 2012
From: Michael Richards <hackish@gmail.com>
Date: Sun, 6 May 2012 07:59:55 -0400
To: Artem Belevich
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS Kernel Panics with 32 and 64 bit versions of 8.3 and 9.0

>> Suspecting a hardware issue all the hardware was replaced.
>
> Bad memory could indeed do that.

Indeed the memory was the bad hardware.

>> I had some difficulty trying to figure out how to mount my old ZFS
>> partition but eventually did so.
> ...
>> zpool import -f -R /altroot 10433152746165646153 olddata
>> panics the kernel. Similar panic as seen in all the other kernel versions.
>
>> Gives a bit more info about things I've tried. Whatever it is seems to
>> affect a wide variety of kernels.
>
> Kernel is just a messenger here. The root cause is that while ZFS does
> go an extra mile or two in order to ensure data consistency, there's
> only so much it can do if RAM is bad. Once that kind of problem
> happened, it may leave the pool in a state that ZFS will not be able
> to deal with out of the box.

I believe that the kernel should be able to handle every type of
non-hardware corruption without a panic.

At present I'm in the process of running
dd if=/dev/ad16s1g > zfsimage.dat
with hopes I can send that file home and play with it locally. I'm not a
kernel hacker but I'll see what I can do. Based on the backtrace I suspect
it is happening during scrub:

...
trap_fatal
trap_pfault
calltrap
zio_vdev_io_start
zio_execute
dsl_scan_scrub_cb

> First of all -- make a copy of your pool, if it's feasible.
> Probability of screwing it up even more is rather high.

I assume running the dd on that slice will do this. I think I read
somewhere that you can specify a file instead of a block device when
loading a ZFS filesystem. I'll see what I can do so the problem itself can
be fixed.
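For the record, my rough plan for poking at that image on another machine
is something like this (untested so far; the image path is wherever the dd
output ends up):

  # attach the raw image as a memory disk so it shows up under /dev
  mdconfig -a -t vnode -f /path/to/zfsimage.dat
  # mdconfig prints the new device name, e.g. md0

  # then see whether the pool's labels are visible on the memory disk
  zpool import -d /dev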
-Michael

From owner-freebsd-fs@FreeBSD.ORG Sun May 6 12:38:26 2012
From: Simon <simon@optinet.com>
Date: Sun, 06 May 2012 08:38:18 -0400
To: Artem Belevich
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS Kernel Panics with 32 and 64 bit versions of 8.3 and 9.0

Are you suggesting that if a disk sector goes bad or memory corrupts a few
blocks of data, the entire zpool is gonna go bust? Can the same occur with
a ZRAID? I thought ZFS was designed to overcome all these issues to begin
with. Is this not the case?

-Simon

On Sat, 5 May 2012 23:11:01 -0700, Artem Belevich wrote:
>I believe I've run into this issue two or three times. In all cases the
>culprit was memory corruption. [...]
>Kernel is just a messenger here. The root cause is that while ZFS does
>go an extra mile or two in order to ensure data consistency, there's
>only so much it can do if RAM is bad. Once that kind of problem
>happened, it may leave the pool in a state that ZFS will not be able
>to deal with out of the box.

From owner-freebsd-fs@FreeBSD.ORG Sun May 6 13:55:32 2012
From: Michael Shuey <shuey@fmepnet.org>
Date: Sun, 6 May 2012 09:49:33 -0400
To: Miroslav Lachman <000.fbsd@quip.cz>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS 4K drive overhead
A couple months back, I finished rebuilding my zpools to use ashift=12,
after trying a 4k drive in a pool with ashift=9. If you try a 4k drive on
an ashift=9 pool, you're going to have a bad time.

Performance for occasional IO (particularly streaming) isn't too bad with
mis-aligned sectors. However, resilvering time is MUCH, MUCH, MUCH higher
- I saw estimates for resilver completion go up by over an order of
magnitude, and pool performance became nearly unusable while a resilver
was in operation.

ZFS will dynamically adjust the block size for a file, between the
smallest block size the media supports and 128k or so (IIRC). That means
that even if you align a partition on your 4k disk, or use the raw disk
itself (so ZFS starts on an aligned sector), after the first small file is
written you'll be doing unaligned IOs.

Resilvering a 1.5 TB drive was estimated at over 230 hours for me; it was
actually less time to abort and rebuild the server from backups.

Given the prevalence of 4k drives on the market now, and the likelihood
that they'll be the only product available in the future, I'd highly
recommend using ashift=12 on any new zpools. It's time to stop using
ashift=9.
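As an aside, it's worth checking what the drives actually report before
deciding; on FreeBSD something like the following prints the logical
sector size and, on drives that advertise it, the 4k stripe size (the
device name is only an example):

  diskinfo -v /dev/ada0 | egrep 'sectorsize|stripesize'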
From owner-freebsd-fs@FreeBSD.ORG Sun May 6 15:14:13 2012
From: Bob Friesenhahn <bfriesen@simple.dallas.tx.us>
Date: Sun, 6 May 2012 09:59:39 -0500 (CDT)
To: Simon
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS Kernel Panics with 32 and 64 bit versions of 8.3 and 9.0

On Sun, 6 May 2012, Simon wrote:
>
> Are you suggesting that if a disk sector goes bad or memory corrupts a
> few blocks of data, the entire zpool is gonna go bust? Can the same
> occur with a ZRAID? I thought ZFS was designed to overcome all these
> issues to begin with. Is this not the case?

ZFS is designed to work with failing disks, but not failing memory. It is
recommended to use only systems with ECC memory.

The OS itself (any OS!) is susceptible to crashes/corruption due to
failing memory, but without ZFS's checksums you might not be aware of such
corruption, or the crash might be more delayed.
Bob

--
Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

From owner-freebsd-fs@FreeBSD.ORG Sun May 6 15:14:43 2012
From: Simon <simon@optinet.com>
Date: Sun, 06 May 2012 11:14:42 -0400
To: Bob Friesenhahn
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS Kernel Panics with 32 and 64 bit versions of 8.3 and 9.0

So if you have a 50TB ZFS filesystem and your memory goes bad, even if it
is ECC, your entire 50TB is gonna go bonkers? Disks fail, but memory
doesn't? CPUs don't fail?

There are many things in a server that can fail and cause corruption, but
that shouldn't take down the entire zpool. I'm okay with a few missing
files ending up in lost+found, but the entire filesystem? That renders the
entire thing useless if you ask me.

-Simon

On Sun, 6 May 2012 09:59:39 -0500 (CDT), Bob Friesenhahn wrote:
>ZFS is designed to work with failing disks, but not failing memory.
>It is recommended to use only systems with ECC memory.
From owner-freebsd-fs@FreeBSD.ORG Sun May 6 15:43:22 2012
From: Bob Friesenhahn <bfriesen@simple.dallas.tx.us>
Date: Sun, 6 May 2012 10:43:21 -0500 (CDT)
To: Simon
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS Kernel Panics with 32 and 64 bit versions of 8.3 and 9.0

On Sun, 6 May 2012, Simon wrote:
>
> So if you have a 50TB ZFS filesystem and your memory goes bad, even if
> it is ECC, your entire 50TB is gonna go bonkers? Disks fail, but memory
> doesn't? CPUs don't fail?
>
> There are many things in a server that can fail and cause corruption,
> but that shouldn't take down the entire zpool. I'm okay with a few
> missing files ending up in lost+found, but the entire filesystem? That
> renders the entire thing useless if you ask me.

By your definition, computers would be useless. :-)

There is no telling what might happen if a program (including kernel code)
were to execute wrong instructions or read wrong data. This is not
specific to ZFS.

ZFS caches large amounts of data in its in-memory ARC cache, which is
susceptible to in-memory corruption. If it tried to detect and prevent
memory corruption, it would be extremely slow and likely would not work at
all if there were actual failures. Part of the metadata structure of the
pool needs to be cached in RAM for performance reasons.

On the zfs-discuss list we sometimes hear of ZFS checksum errors which are
due to memory errors rather than disk errors. ZFS can be used without ECC
memory, but pool reliability will suffer.
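(As a side note, FreeBSD exposes the ARC counters via sysctl, so it's easy
to see how much pool data and metadata is sitting in RAM at any moment;
for example, assuming the usual kstat names:

  sysctl kstat.zfs.misc.arcstats.size
  sysctl vfs.zfs.arc_max

All of that is exactly the memory a bad DIMM gets to corrupt.)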
Bob

--
Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

From owner-freebsd-fs@FreeBSD.ORG Sun May 6 16:10:38 2012
From: Michael Richards <hackish@gmail.com>
Date: Sun, 6 May 2012 12:10:37 -0400
To: Bob Friesenhahn
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS Kernel Panics with 32 and 64 bit versions of 8.3 and 9.0

On Sun, May 6, 2012 at 10:59 AM, Bob Friesenhahn wrote:
> ZFS is designed to work with failing disks, but not failing memory. It
> is recommended to use only systems with ECC memory.
>
> The OS itself (any OS!) is susceptible to crashes/corruption due to
> failing memory, but without ZFS's checksums you might not be aware of
> such corruption, or the crash might be more delayed.

I can accept the fact that some filesystem corruption may have happened
because of the bad RAM. The issue now is recovering it. All the hardware
has been replaced, but I cannot import the ZFS pool without causing a
kernel panic, and that is the problem here. To me it matters not whether
the corruption came from RAM or the hard disk - I don't think it's a good
idea to blindly trust any filesystem data. At minimum, fail to import the
pool, but don't bring the entire system to a halt. This isn't even a
system drive - it's purely data.
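The next things I intend to try, once the dd copy is safely tucked away,
are a read-only import and a dry run of the recovery-mode import (a
sketch; I'm assuming the v28 code in 8.3/9.0 supports both options):

  # import without allowing any writes to the pool
  zpool import -o readonly=on -f -R /altroot 10433152746165646153 olddata

  # ask ZFS whether discarding the last few transactions would make the
  # pool importable, without actually doing it
  zpool import -F -n -f 10433152746165646153 olddata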
From owner-freebsd-fs@FreeBSD.ORG Sun May 6 17:50:08 2012
From: Bob Friesenhahn <bfriesen@simple.dallas.tx.us>
Date: Sun, 6 May 2012 12:50:07 -0500 (CDT)
To: Michael Richards
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS Kernel Panics with 32 and 64 bit versions of 8.3 and 9.0

On Sun, 6 May 2012, Michael Richards wrote:
>
> I can accept the fact that some filesystem corruption may have happened
> because of the bad RAM. The issue now is recovering it. All the hardware
> has been replaced, but I cannot import the ZFS pool without causing a
> kernel panic, and that is the problem here. To me it matters not whether
> the corruption came from RAM or the hard disk - I don't think it's a
> good idea to blindly trust any filesystem data. At minimum, fail to
> import the pool, but don't bring the entire system to a halt. This isn't
> even a system drive - it's purely data.

These are sentiments that I can agree with. If the import can be so
dangerous, it seems that there should be a way to import the pool in user
mode (outside of kernel space) so that issues can be fixed without
panicking the kernel.
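Something close to that already exists in zdb, which runs entirely in
userland and can walk an exported pool without the kernel importing it. A
rough sketch (I have not tried this on a damaged pool; the pool name is
the one from the earlier zpool import output):

  # traverse the pool's block tree and verify metadata checksums from userland
  zdb -e -bcsv email

At worst it dies with an assertion in userland instead of taking the whole
box down.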
Bob

--
Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

From owner-freebsd-fs@FreeBSD.ORG Sun May 6 20:59:43 2012
From: Simon <simon@optinet.com>
Date: Sun, 06 May 2012 16:59:41 -0400
To: Bob Friesenhahn
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS Kernel Panics with 32 and 64 bit versions of 8.3 and 9.0

I fully understand the concept behind ECC memory and why it is important
to use it in a server environment, especially with a filesystem like ZFS.
However, I think my entire point was missed. It appears there are simply
too many documented cases where the entire pool becomes inaccessible due
to limited corruption. Most of the data is still intact, but there is no
way to recover it. And there is no way to fix the limited inconsistencies
which caused the entire pool to become unimportable to begin with.

Again, I'm okay with a few corrupted or missing files. I'm not okay with
the entire pool becoming inaccessible due to limited corruption, whether
from faulty memory or otherwise. There needs to be a way to import a
corrupted zpool and, at the very least, to be able to read the remaining
intact data.

-Simon

On Sun, 6 May 2012 09:59:39 -0500 (CDT), Bob Friesenhahn wrote:
>ZFS is designed to work with failing disks, but not failing memory.
>It is recommended to use only systems with ECC memory.
From owner-freebsd-fs@FreeBSD.ORG Mon May 7 11:07:11 2012
From: FreeBSD bugmaster
Date: Mon, 7 May 2012 11:07:11 GMT
To: freebsd-fs@FreeBSD.org
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker Resp. Description
--------------------------------------------------------------------------------
o kern/167467 fs [zfs][patch] improve zdb(8) manpage and help.
o kern/167447 fs [zfs] [patch] patch to zfs rename -f to perform force
o kern/167370 fs [zfs][patch] Unnecessary break point on zfs_main.c.
o kern/167272 fs [zfs] ZFS Disks reordering causes ZFS to pick the wron
o kern/167266 fs [zfs] [nfs] ZFS + new NFS export (sharenfs) leads to N
o kern/167260 fs [msdosfs] msdosfs disk was mounted the second time whe
o kern/167109 fs [zfs] [panic] zfs diff kernel panic Fatal trap 9: gene
o kern/167105 fs [nfs] mount_nfs can not handle source exports wiht mor
o kern/167067 fs [zfs] [panic] ZFS panics the server
o kern/167066 fs [zfs] ZVOLs not appearing in /dev/zvol
o kern/167065 fs [zfs] boot fails when a spare is the boot disk
o kern/167048 fs [nfs] [patch] RELEASE-9 crash when using ZFS+NULLFS+NF
o kern/166912 fs [ufs] [panic] Panic after converting Softupdates to jo
o kern/166851 fs [zfs] [hang] Copying directory from the mounted UFS di
o kern/166566 fs [zfs] zfs split renders 2 disk (MBR based) mirror unbo
o kern/166477 fs [nfs] NFS data corruption.
o kern/165950 fs [ffs] SU+J and fsck problem
o kern/165923 fs [nfs] Writing to NFS-backed mmapped files fails if flu
o kern/165521 fs [zfs] [hang] livelock on 1 Gig of RAM with zfs when 31
o kern/165392 fs Multiple mkdir/rmdir fails with errno 31
o kern/165087 fs [unionfs] lock violation in unionfs
o kern/164472 fs [ufs] fsck -B panics on particular data inconsistency
o kern/164370 fs [zfs] zfs destroy for snapshot fails on i386 and sparc
o kern/164261 fs [nullfs] [patch] fix panic with NFS served from NULLFS
o kern/164256 fs [zfs] device entry for volume is not created after zfs
o kern/164184 fs [ufs] [panic] Kernel panic with ufs_makeinode
o kern/163801 fs [md] [request] allow mfsBSD legacy installed in 'swap'
o kern/163770 fs [zfs] [hang] LOR between zfs&syncer + vnlru leading to
o kern/163501 fs [nfs] NFS exporting a dir and a subdir in that dir to
o kern/162944 fs [coda] Coda file system module looks broken in 9.0
o kern/162860 fs [zfs] Cannot share ZFS filesystem to hosts with a hyph
o kern/162751 fs [zfs] [panic] kernel panics during file operations
o kern/162591 fs [nullfs] cross-filesystem nullfs does not work as expe
o kern/162519 fs [zfs] "zpool import" relies on buggy realpath() behavi
o kern/162362 fs [snapshots] [panic] ufs with snapshot(s) panics when g
o kern/161968 fs [zfs] [hang] renaming snapshot with -r including a zvo
o kern/161897 fs [zfs] [patch] zfs partition probing causing long delay
o kern/161864 fs [ufs] removing journaling from UFS partition fails on
o bin/161807 fs [patch] add option for explicitly specifying metadata
o kern/161579 fs [smbfs] FreeBSD sometimes panics when an smb share is
o kern/161533 fs [zfs] [panic] zfs receive panic: system ioctl returnin
o kern/161511 fs [unionfs] Filesystem deadlocks when using multiple uni
o kern/161438 fs [zfs] [panic] recursed on non-recursive spa_namespace_
o kern/161424 fs [nullfs] __getcwd() calls fail when used on nullfs mou
o kern/161280 fs [zfs] Stack overflow in gptzfsboot
o kern/161205 fs [nfs] [pfsync] [regression] [build] Bug report freebsd
o kern/161169 fs [zfs] [panic] ZFS causes kernel panic in dbuf_dirty
o kern/161112 fs [ufs] [lor] filesystem LOR in FreeBSD 9.0-BETA3
o kern/160893 fs [zfs] [panic] 9.0-BETA2 kernel panic
o kern/160860 fs [ufs] Random UFS root filesystem corruption with SU+J
o kern/160801 fs [zfs] zfsboot on 8.2-RELEASE fails to boot from root-o
o kern/160790 fs [fusefs] [panic] VPUTX: negative ref count with FUSE
o kern/160777 fs [zfs] [hang] RAID-Z3 causes fatal hang upon scrub/impo
o kern/160706 fs [zfs] zfs bootloader fails when a non-root vdev exists
o kern/160591 fs [zfs] Fail to boot on zfs root with degraded raidz2 [r
o kern/160410 fs [smbfs] [hang] smbfs hangs when transferring large fil
o kern/160283 fs [zfs] [patch] 'zfs list' does abort in make_dataset_ha
o kern/159930 fs [ufs] [panic] kernel core
o kern/159402 fs [zfs][loader] symlinks cause I/O errors
o kern/159357 fs [zfs] ZFS MAXNAMELEN macro has confusing name (off-by-
o kern/159356 fs [zfs] [patch] ZFS NAME_ERR_DISKLIKE check is Solaris-s
o kern/159351 fs [nfs] [patch] - divide by zero in mountnfs()
o kern/159251 fs [zfs] [request]: add FLETCHER4 as DEDUP hash option
o kern/159077 fs [zfs] Can't cd .. with latest zfs version
o kern/159048 fs [smbfs] smb mount corrupts large files
o kern/159045 fs [zfs] [hang] ZFS scrub freezes system
o kern/158839 fs [zfs] ZFS Bootloader Fails if there is a Dead Disk
o kern/158802 fs amd(8) ICMP storm and unkillable process.
o kern/158231 fs [nullfs] panic on unmounting nullfs mounted over ufs o
f kern/157929 fs [nfs] NFS slow read
o kern/157399 fs [zfs] trouble with: mdconfig force delete && zfs strip
o kern/157179 fs [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov
o kern/156797 fs [zfs] [panic] Double panic with FreeBSD 9-CURRENT and
o kern/156781 fs [zfs] zfs is losing the snapshot directory,
p kern/156545 fs [ufs] mv could break UFS on SMP systems
o kern/156193 fs [ufs] [hang] UFS snapshot hangs && deadlocks processes
o kern/156039 fs [nullfs] [unionfs] nullfs + unionfs do not compose, re
o kern/155615 fs [zfs] zfs v28 broken on sparc64 -current
o kern/155587 fs [zfs] [panic] kernel panic with zfs
p kern/155411 fs [regression] [8.2-release] [tmpfs]: mount: tmpfs : No
o kern/155199 fs [ext2fs] ext3fs mounted as ext2fs gives I/O errors
o bin/155104 fs [zfs][patch] use /dev prefix by default when importing
o kern/154930 fs [zfs] cannot delete/unlink file from full volume -> EN
o kern/154828 fs [msdosfs] Unable to create directories on external USB
o kern/154491 fs [smbfs] smb_co_lock: recursive lock for object 1
p kern/154228 fs [md] md getting stuck in wdrain state
o kern/153996 fs [zfs] zfs root mount error while kernel is not located
o kern/153753 fs [zfs] ZFS v15 - grammatical error when attempting to u
o kern/153716 fs [zfs] zpool scrub time remaining is incorrect
o kern/153695 fs [patch] [zfs] Booting from zpool created on 4k-sector
o kern/153680 fs [xfs] 8.1 failing to mount XFS partitions
o kern/153520 fs [zfs] Boot from GPT ZFS root on HP BL460c G1 unstable
o kern/153418 fs [zfs] [panic] Kernel Panic occurred writing to zfs vol
o kern/153351 fs [zfs] locking directories/files in ZFS
o bin/153258 fs [patch][zfs] creating ZVOLs requires `refreservation'
s kern/153173 fs [zfs] booting from a gzip-compressed dataset doesn't w
o kern/153126 fs [zfs] vdev failure, zpool=peegel type=vdev.too_small
o kern/152022 fs [nfs] nfs service hangs with linux client [regression]
o kern/151942 fs [zfs] panic during ls(1) zfs snapshot directory
o kern/151905 fs [zfs] page fault under load in /sbin/zfs
o bin/151713 fs [patch] Bug in growfs(8) with respect to 32-bit overfl
o kern/151648 fs [zfs] disk wait bug
o kern/151629 fs [fs] [patch] Skip empty directory entries during name
o kern/151330 fs [zfs] will unshare all zfs filesystem after execute a
o kern/151326 fs [nfs] nfs exports fail if netgroups contain duplicate
o kern/151251 fs [ufs] Can not create files on filesystem with heavy us
o kern/151226 fs [zfs] can't delete zfs snapshot
o kern/151111 fs [zfs] vnodes leakage during zfs unmount
o kern/150503 fs [zfs] ZFS disks are UNAVAIL and corrupted after reboot
o kern/150501 fs [zfs] ZFS vdev failure vdev.bad_label on amd64
o kern/150390 fs [zfs] zfs deadlock when arcmsr reports drive faulted
o kern/150336 fs [nfs] mountd/nfsd became confused; refused to reload n
o kern/149208 fs mksnap_ffs(8) hang/deadlock
o kern/149173 fs [patch] [zfs] make OpenSolaris installa
o kern/149015 fs [zfs] [patch] misc fixes for ZFS code to build on Glib
o kern/149014 fs [zfs] [patch] declarations in ZFS libraries/utilities
o kern/149013 fs [zfs] [patch] make ZFS makefiles use the libraries fro
o kern/148504 fs [zfs] ZFS' zpool does not allow replacing drives to be
o kern/148490 fs [zfs]: zpool attach - resilver bidirectionally, and re
o kern/148368 fs [zfs] ZFS hanging forever on 8.1-PRERELEASE
o kern/148138 fs [zfs] zfs raidz pool commands freeze
o kern/147903 fs [zfs] [panic] Kernel panics on faulty zfs device
o kern/147881 fs [zfs] [patch] ZFS "sharenfs" doesn't allow different "
o kern/147560 fs [zfs] [boot] Booting 8.1-PRERELEASE raidz system take
o kern/147420 fs [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt
o kern/146941 fs [zfs] [panic] Kernel Double Fault - Happens constantly
o kern/146786 fs [zfs] zpool import hangs with checksum errors
o kern/146708 fs [ufs] [panic] Kernel panic in softdep_disk_write_compl
o kern/146528 fs [zfs] Severe memory leak in ZFS on i386
o kern/146502 fs [nfs] FreeBSD 8 NFS Client Connection to Server
s kern/145712 fs [zfs] cannot offline two drives in a raidz2 configurat
o kern/145411 fs [xfs] [panic] Kernel panics shortly after mounting an
f bin/145309 fs bsdlabel: Editing disk label invalidates the whole dev
o kern/145272 fs [zfs] [panic] Panic during boot when accessing zfs on
o kern/145246 fs [ufs] dirhash in 7.3 gratuitously frees hashes when it
o kern/145238 fs [zfs] [panic] kernel panic on zpool clear tank
o kern/145229 fs [zfs] Vast differences in ZFS ARC behavior between 8.0
o kern/145189 fs [nfs] nfsd performs abysmally under load
o kern/144929 fs [ufs] [lor] vfs_bio.c + ufs_dirhash.c
p kern/144447 fs [zfs] sharenfs fsunshare() & fsshare_main() non functi
o kern/144416 fs [panic] Kernel panic on online filesystem optimization
s kern/144415 fs [zfs] [panic] kernel panics on boot after zfs crash
o kern/144234 fs [zfs] Cannot boot machine with recent gptzfsboot code
o kern/143825 fs [nfs] [panic] Kernel panic on NFS client
o bin/143572 fs [zfs] zpool(1): [patch] The verbose output from iostat
o kern/143212 fs [nfs] NFSv4 client strange work ...
o kern/143184 fs [zfs] [lor] zfs/bufwait LOR
o kern/142878 fs [zfs] [vfs] lock order reversal
o kern/142597 fs [ext2fs] ext2fs does not work on filesystems with real
o kern/142489 fs [zfs] [lor] allproc/zfs LOR
o kern/142466 fs Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re
o kern/142306 fs [zfs] [panic] ZFS drive (from OSX Leopard) causes two
o kern/142068 fs [ufs] BSD labels are got deleted spontaneously
o kern/141897 fs [msdosfs] [panic] Kernel panic. msdofs: file name leng
o kern/141463 fs [nfs] [panic] Frequent kernel panics after upgrade fro
o kern/141305 fs [zfs] FreeBSD ZFS+sendfile severe performance issues (
o kern/141091 fs [patch] [nullfs] fix panics with DIAGNOSTIC enabled
o kern/141086 fs [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS
o kern/141010 fs [zfs] "zfs scrub" fails when backed by files in UFS2
o kern/140888 fs [zfs] boot fail from zfs root while the pool resilveri
o kern/140661 fs [zfs] [patch] /boot/loader fails to work on a GPT/ZFS-
o kern/140640 fs [zfs] snapshot crash
o kern/140068 fs [smbfs] [patch] smbfs does not allow semicolon in file
o kern/139725 fs [zfs] zdb(1) dumps core on i386 when examining zpool c
o kern/139715 fs [zfs] vfs.numvnodes leak on busy zfs
p bin/139651 fs [nfs] mount(8): read-only remount of NFS volume does n
o kern/139597 fs [patch] [tmpfs] tmpfs initializes va_gen but doesn't u
o kern/139564 fs [zfs] [panic] 8.0-RC1 - Fatal trap 12 at end of shutdo
o kern/139407 fs [smbfs] [panic] smb mount causes system crash if remot
o kern/138662 fs [panic] ffs_blkfree: freeing free block
o kern/138421 fs [ufs] [patch] remove UFS label limitations
o kern/138202 fs mount_msdosfs(1) see only 2Gb
o kern/136968 fs [ufs] [lor] ufs/bufwait/ufs (open)
o kern/136945 fs [ufs] [lor] filedesc structure/ufs (poll)
o kern/136944 fs [ffs] [lor] bufwait/snaplk (fsync)
o kern/136873 fs [ntfs] Missing directories/files on NTFS volume
o kern/136865 fs [nfs] [patch] NFS exports atomic and on-the-fly atomic
p kern/136470 fs [nfs] Cannot mount / in read-only, over NFS
o kern/135546 fs [zfs] zfs.ko module doesn't ignore zpool.cache filenam
o kern/135469 fs [ufs] [panic] kernel crash on md operation in ufs_dirb
o kern/135050 fs [zfs] ZFS clears/hides disk errors on reboot
o kern/134491 fs [zfs] Hot spares are rather cold...
o kern/133676 fs [smbfs] [panic] umount -f'ing a vnode-based memory dis o kern/132960 fs [ufs] [panic] panic:ffs_blkfree: freeing free frag o kern/132397 fs reboot causes filesystem corruption (failure to sync b o kern/132331 fs [ufs] [lor] LOR ufs and syncer o kern/132237 fs [msdosfs] msdosfs has problems to read MSDOS Floppy o kern/132145 fs [panic] File System Hard Crashes o kern/131441 fs [unionfs] [nullfs] unionfs and/or nullfs not combineab o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail o bin/131341 fs makefs: error "Bad file descriptor" on the mount poin o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file o kern/130210 fs [nullfs] Error by check nullfs o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l o kern/129488 fs [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8) o kern/127787 fs [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs o bin/127270 fs fsck_msdosfs(8) may crash if BytesPerSec is zero o kern/127029 fs [panic] mount(8): trying to mount a write protected zi o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file o kern/125895 fs [ffs] [panic] kernel: panic: ffs_blkfree: freeing free s kern/125738 fs [zfs] [request] SHA256 acceleration in ZFS o kern/123939 fs [msdosfs] corrupts new files o kern/122380 fs [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o bin/121898 fs [nullfs] pwd(1)/getcwd(2) fails with Permission denied o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha o kern/120483 fs [ntfs] [patch] NTFS filesystem locking changes o kern/120482 fs [ntfs] [patch] Sync style changes between NetBSD and F o kern/118912 fs [2tb] disk sizing/geometry problem with large array o kern/118713 fs [minidump] [patch] Display media size required for a k o kern/118318 fs [nfs] NFS server hangs under special circumstances o bin/118249 fs [ufs] mv(1): moving a directory changes its mtime o kern/118126 fs [nfs] [patch] Poor NFS server write performance o kern/118107 fs [ntfs] [panic] Kernel panic when accessing a file at N o kern/117954 fs [ufs] dirhash on very large directories blocks the mac o bin/117315 fs [smbfs] mount_smbfs(8) and related options can't mount o kern/117158 fs [zfs] zpool scrub causes panic if geli vdevs detach on o bin/116980 fs [msdosfs] [patch] mount_msdosfs(8) resets some flags f o conf/116931 fs lack of fsck_cd9660 prevents mounting iso images with o kern/116583 fs [ffs] [hang] System freezes for short time when using o bin/115361 fs [zfs] mount(8) gets into a state where it won't set/un o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o kern/113852 fs [smbfs] smbfs does not properly implement DFS referral o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/111843 fs [msdosfs] Long Names of files are incorrectly created o kern/111782 fs [ufs] dump(8) fails horribly for 
large filesystems s bin/111146 fs [2tb] fsck(8) fails on 6T filesystem o bin/107829 fs [2TB] fdisk(8): invalid boundary checking in fdisk / w o kern/106107 fs [ufs] left-over fsck_snapshot after unfinished backgro o kern/104406 fs [ufs] Processes get stuck in "ufs" state under persist o kern/104133 fs [ext2fs] EXT2FS module corrupts EXT2/3 filesystems o kern/103035 fs [ntfs] Directories in NTFS mounted disc images appear o kern/101324 fs [smbfs] smbfs sometimes not case sensitive when it's s o kern/99290 fs [ntfs] mount_ntfs ignorant of cluster sizes s bin/97498 fs [request] newfs(8) has no option to clear the first 12 o kern/97377 fs [ntfs] [patch] syntax cleanup for ntfs_ihash.c o kern/95222 fs [cd9660] File sections on ISO9660 level 3 CDs ignored o kern/94849 fs [ufs] rename on UFS filesystem is not atomic o bin/94810 fs fsck(8) incorrectly reports 'file system marked clean' o kern/94769 fs [ufs] Multiple file deletions on multi-snapshotted fil o kern/94733 fs [smbfs] smbfs may cause double unlock o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/92272 fs [ffs] [hang] Filling a filesystem while creating a sna o kern/91134 fs [smbfs] [patch] Preserve access and modification time a kern/90815 fs [smbfs] [patch] SMBFS with character conversions somet o kern/88657 fs [smbfs] windows client hang when browsing a samba shar o kern/88555 fs [panic] ffs_blkfree: freeing free frag on AMD 64 o kern/88266 fs [smbfs] smbfs does not implement UIO_NOCOPY and sendfi o bin/87966 fs [patch] newfs(8): introduce -A flag for newfs to enabl o kern/87859 fs [smbfs] System reboot while umount smbfs. o kern/86587 fs [msdosfs] rm -r /PATH fails with lots of small files o bin/85494 fs fsck_ffs: unchecked use of cg_inosused macro etc. o kern/80088 fs [smbfs] Incorrect file time setting on NTFS mounted vi o bin/74779 fs Background-fsck checks one filesystem twice and omits o kern/73484 fs [ntfs] Kernel panic when doing `ls` from the client si o bin/73019 fs [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino o kern/71774 fs [ntfs] NTFS cannot "see" files on a WinXP filesystem o bin/70600 fs fsck(8) throws files away when it can't grow lost+foun o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po o kern/65920 fs [nwfs] Mounted Netware filesystem behaves strange o kern/65901 fs [smbfs] [patch] smbfs fails fsx write/truncate-down/tr o kern/61503 fs [smbfs] mount_smbfs does not work as non-root o kern/55617 fs [smbfs] Accessing an nsmb-mounted drive via a smb expo o kern/51685 fs [hang] Unbounded inode allocation causes kernel to loc o kern/36566 fs [smbfs] System reboot with dead smb mount and umount o bin/27687 fs fsck(8) wrapper is not properly passing options to fsc o kern/18874 fs [2TB] 32bit NFS servers export wrong negative values t 275 problems total. From owner-freebsd-fs@FreeBSD.ORG Mon May 7 11:48:01 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 59741106564A; Mon, 7 May 2012 11:48:01 +0000 (UTC) (envelope-from vermaden@interia.pl) Received: from smtpo.poczta.interia.pl (smtpo.poczta.interia.pl [217.74.65.208]) by mx1.freebsd.org (Postfix) with ESMTP id 0C32F8FC08; Mon, 7 May 2012 11:48:01 +0000 (UTC) Date: Mon, 07 May 2012 13:47:53 +0200 From: vermaden To: "Randal L. 
Schwartz" X-Mailer: interia.pl/pf09 In-Reply-To: <86d36iycca.fsf@red.stonehenge.com> References: <86ipgbg2p6.fsf@red.stonehenge.com> <86d36jzk16.fsf@red.stonehenge.com> <867gwrzjwc.fsf@red.stonehenge.com> <86397fzjgi.fsf@red.stonehenge.com> <86y5p7y478.fsf@red.stonehenge.com> <86d36iycca.fsf@red.stonehenge.com> Message-Id: MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=interia.pl; s=biztos; t=1336391273; bh=K70/k8WpehJJZ6DmAZRTkkaPG8cEIqYRKCxb7LyWhwQ=; h=Date:From:Subject:To:Cc:X-Mailer:In-Reply-To:References: Message-Id:MIME-Version:Content-Type:Content-Transfer-Encoding; b=HCLjCGEH4jseiaquJHWquV9J9vJYvDm52agLgY9OrfUlR0iF0FdZHzuFXWL++QS0B sC3Cp8WKSm+ZUVTrVrYFP4qBJictjns5e/GxP2jd4zSDOOjxIqaHXaaR4lc4YAKW9X N0jRdwwGTryUhkkw8XkHmtO29QZKBxPv0+XV4t9U= Cc: freebsd-fs@FreeBSD.org, freebsd-questions@freebsd.org Subject: Re: HOWTO: FreeBSD ZFS Madness (Boot Environments) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 May 2012 11:48:01 -0000 > Good to see you've finally been burned. > You'll never make that mistake again. :) I liked that syntax: ASD && { asd } || { bsd } mostly because of syntax highlighting, to be precise highlighting of the second bracket of a pair at editors, nor VIM neither GEANY highlight if/then/elif/else/fi unfortunately, seems that I will have to live with that ;p > OK, I'll give that a try. Thanks for being persistent with me. Did it worked? Regards, vermaden -- ... From owner-freebsd-fs@FreeBSD.ORG Mon May 7 13:03:29 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 142F11065672; Mon, 7 May 2012 13:03:29 +0000 (UTC) (envelope-from merlyn@stonehenge.com) Received: from gw15.lax01.mailroute.net (lax-gw15.mailroute.net [199.89.0.115]) by mx1.freebsd.org (Postfix) with ESMTP id E1DA68FC0A; Mon, 7 May 2012 13:03:28 +0000 (UTC) Received: from localhost (localhost.localdomain [127.0.0.1]) by gw15.lax01.mailroute.net (Postfix) with ESMTP id D5A41E36368; Mon, 7 May 2012 13:03:22 +0000 (GMT) X-Virus-Scanned: by MailRoute Received: from gw15.lax01.mailroute.net ([199.89.0.115]) by localhost (gw15.lax01.mailroute.net.mailroute.net [127.0.0.1]) (mroute_mailscanner, port 10026) with LMTP id TQEIdq212u-e; Mon, 7 May 2012 13:03:17 +0000 (GMT) Received: from red.stonehenge.com (red.stonehenge.com [208.79.95.2]) by gw15.lax01.mailroute.net (Postfix) with ESMTP id CB2ACE363B4; Mon, 7 May 2012 13:03:17 +0000 (GMT) Received: by red.stonehenge.com (Postfix, from userid 1001) id C39831803; Mon, 7 May 2012 06:03:17 -0700 (PDT) From: merlyn@stonehenge.com (Randal L. 
Schwartz) To: vermaden References: <86ipgbg2p6.fsf@red.stonehenge.com> <86d36jzk16.fsf@red.stonehenge.com> <867gwrzjwc.fsf@red.stonehenge.com> <86397fzjgi.fsf@red.stonehenge.com> <86y5p7y478.fsf@red.stonehenge.com> <86d36iycca.fsf@red.stonehenge.com> x-mayan-date: Long count = 12.19.19.6.12; tzolkin = 10 Eb; haab = 15 Uo Date: Mon, 07 May 2012 06:03:17 -0700 In-Reply-To: (vermaden@interia.pl's message of "Mon, 07 May 2012 13:47:53 +0200") Message-ID: <86ehqwb0tm.fsf@red.stonehenge.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.4 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: freebsd-fs@FreeBSD.org, freebsd-questions@freebsd.org Subject: Re: HOWTO: FreeBSD ZFS Madness (Boot Environments) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 May 2012 13:03:29 -0000 >>>>> "vermaden" == vermaden writes: >> Good to see you've finally been burned. >> You'll never make that mistake again. :) vermaden> I liked that syntax: vermaden> ASD && { vermaden> asd vermaden> } || { vermaden> bsd vermaden> } vermaden> mostly because of syntax highlighting, to be precise highlighting vermaden> of the second bracket of a pair at editors, nor VIM neither GEANY vermaden> highlight if/then/elif/else/fi unfortunately, seems that I will have vermaden> to live with that ;p Emacs indents it nicely, and colorizes the keywords so that it stands out. -- Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095 Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc. See http://methodsandmessages.posterous.com/ for Smalltalk discussion From owner-freebsd-fs@FreeBSD.ORG Mon May 7 14:05:30 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 84E46106564A; Mon, 7 May 2012 14:05:30 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id 589578FC0C; Mon, 7 May 2012 14:05:30 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id C1E01B95B; Mon, 7 May 2012 10:05:29 -0400 (EDT) From: John Baldwin To: freebsd-fs@freebsd.org Date: Mon, 7 May 2012 09:53:03 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p13; KDE/4.5.5; amd64; ; ) References: <4F8999D2.1080902@FreeBSD.org> <4FA4F36A.6030903@FreeBSD.org> <4FA4F883.2060008@FreeBSD.org> In-Reply-To: <4FA4F883.2060008@FreeBSD.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201205070953.04032.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Mon, 07 May 2012 10:05:29 -0400 (EDT) Cc: freebsd-hackers@freebsd.org, Andriy Gapon Subject: Re: [review request] zfsboot/zfsloader: support accessing filesystems within a pool X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 May 2012 14:05:30 -0000 On Saturday, May 05, 2012 5:53:07 am Andriy Gapon wrote: > on 05/05/2012 12:31 Andriy Gapon said the following: > > on 04/05/2012 18:25 John Baldwin said the following: > >> On Thursday, May 
03, 2012 11:23:51 am Andriy Gapon wrote: > >>> on 03/05/2012 18:02 Andriy Gapon said the following: > >>>> > >>>> Here's the latest version of the patches: > >>>> http://people.freebsd.org/~avg/zfsboot.patches.4.diff > >>> > >>> I've found a couple of problems in the previous version, so here's another one: > >>> http://people.freebsd.org/~avg/zfsboot.patches.5.diff > >>> The important change is in the first patch (__exec args). > >> > >> A few comments/suggestions on the args bits: > > > > John, > > > > these are excellent suggestions! Thank you! > > The new patchset: http://people.freebsd.org/~avg/zfsboot.patches.7.diff Looks great, thanks! A few replies below: > >> - Add a CTASSERT() in loader/main.c that BI_SIZE == sizeof(struct bootinfo) > > > > I have added a definition of CTASSERT to boostrap.h as it was not available for > > sys/boot and there were two local definitions of the macro in individual files. > > > > However the assertion would fail right now. > > The backward-compatible value of BI_SIZE (72 == 0x48) covers only part of the > > fields in struct bootinfo, those up to the following comment: > > /* Items below only from advanced bootloader */ > > > > I am a little bit hesitant: should I increase BI_SIZE to cover the whole struct > > bootinfo or should I compare BI_SIZE to offsetof bi_kernend? Actually, we should probably be reading the 'bi_size' field and not using a BI_SIZE constant at all? Looks like only the non-functional EFI boot loader doesn't set bi_size (and it should just be fixed to do so since it needs to pass new fields in anyway). > > I've decided to define ARGADJ in the new common header, then I've had to rename > > btxcsu.s to .S, so that the preprocessing is executed for it. Ok. Maybe add one comment to the bootargs.h head to explain that the 'bootargs' struct starts at ARGOFF and can grow up, while struct bootinfo is copied such that it's end is at the top of the argument area and grows down. Also, at some point we could use a genassym.c file ala the kernel builds to generate some of the constants in bootargs.h instead (e.g. the offsets of fields within structures, and BA_SIZE, though we probably want to ensure that BA_SIZE never changes). 
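For reference, the CTASSERT() macro under discussion is the classic negative-array-size idiom; the lines below are a sketch of that pattern, not the actual bootstrap.h change:

    #include <stdint.h>

    #define CTASSERT(x)             _CTASSERT(x, __LINE__)
    #define _CTASSERT(x, y)         __CTASSERT(x, y)
    #define __CTASSERT(x, y)        typedef char __assert ## y[(x) ? 1 : -1]

    /* A false condition yields a negative array size and the compile fails: */
    CTASSERT(sizeof(uint32_t) == 4);

    /*
     * The use suggested above would then be:
     *      CTASSERT(BI_SIZE == sizeof(struct bootinfo));
     */

The check costs nothing at run time; it only turns a layout mismatch into a build error.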
-- John Baldwin From owner-freebsd-fs@FreeBSD.ORG Mon May 7 14:35:56 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 667BF106564A; Mon, 7 May 2012 14:35:56 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 0C2B08FC0C; Mon, 7 May 2012 14:35:54 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id RAA06392; Mon, 07 May 2012 17:34:37 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4FA7DD7C.4070703@FreeBSD.org> Date: Mon, 07 May 2012 17:34:36 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:12.0) Gecko/20120503 Thunderbird/12.0.1 MIME-Version: 1.0 To: Bruce Evans References: <4F8999D2.1080902@FreeBSD.org> <4FA29E1B.7040005@FreeBSD.org> <4FA2A307.2090108@FreeBSD.org> <201205041125.15155.jhb@freebsd.org> <4FA4F36A.6030903@FreeBSD.org> <20120505194459.D1295@besplex.bde.org> In-Reply-To: <20120505194459.D1295@besplex.bde.org> X-Enigmail-Version: 1.5pre Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org, freebsd-hackers@FreeBSD.org, John Baldwin Subject: Re: [review request] zfsboot/zfsloader: support accessing filesystems within a pool X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 May 2012 14:35:56 -0000 on 05/05/2012 13:49 Bruce Evans said the following: > On Sat, 5 May 2012, Andriy Gapon wrote: > >> on 04/05/2012 18:25 John Baldwin said the following: >>> On Thursday, May 03, 2012 11:23:51 am Andriy Gapon wrote: >>>> on 03/05/2012 18:02 Andriy Gapon said the following: >>>>> >>>>> Here's the latest version of the patches: >>>>> http://people.freebsd.org/~avg/zfsboot.patches.4.diff >>>> >>>> I've found a couple of problems in the previous version, so here's another one: >>>> http://people.freebsd.org/~avg/zfsboot.patches.5.diff >>>> The important change is in the first patch (__exec args). >>> >>> A few comments/suggestions on the args bits: >> >> John, >> >> these are excellent suggestions! Thank you! >> Some comments: >>> - Add #ifndef LOCORE guards to the new header around the structure so >>> it can be used in assembly as well as C. >> >> Done. I have had to go into a few btx makefiles and add a necessary include >> path and -DLOCORE to make the header usable from asm. Bruce, first a note that the change that we discussed affects (should affect) only BTX code and as such only boot1/2 -> loader interface. > Ugh, why not use genassym, as is done for all old uses of this header in > locore.s, at least on i386 (5% of the i386 genassym.c is for this). Can not parse 'this header' in this context. We were talking about a new header file, so there could not be any old uses of it :-) Probably you meant sys/i386/include/bootinfo.h ? But, as you say later, it's probably not easy to use genassym with sys/boot code. Not sure if it would be worth while going this path given the possible alternatives. >>> - Move BI_SIZE and ARGOFF into the header as constants. >> >> Done. >> >>> - Add a CTASSERT() in loader/main.c that BI_SIZE == sizeof(struct bootinfo) > > Ugh, BI_SIZE was already used in locore.s. OK, but this is "the other" BI_SIZE. 
Maybe the name clash is not nice indeed, though. > It wasn't the size of the struct, > but was the offset of the field that gives the size. No CTASSERT() was > needed -- the size is whatever it is, as given by sizeof() on the struct > at the time of compilation of the utility that initializes the struct. > It was a feature that sizeof() and offsetof() can't be used in asm so they > must be translated in genassym and no macros are needed in the header (the > size was fully dynamic, so the asm code only needs the offsetof() values). > Of course, you could use CTASSERT()s to check that the struct layout didn't > get broken. The old code just assumes that the struct is packed by the > programmer and that the arch's struct packing conventions don't change, > so that for example BI_SIZE = offsetof(struct bootinfo, bi_size) never > changes. It seems that boot1/2 -> kernel interface and boo1/2 -> {btxldr, btx} -> loader interfaces are quite independent and a bit different. > genassym is hard to use in boot programs, but the old design was that > boot programs shouldn't use bootinfo in asm and should just use the > target bootinfo.h at compile time (whatever time the target is compiled). I am not sure if it is worthwhile adapting genassym to sys/boot... BTX code needs to know only "some size" of bootinfo. Although it doesn't look like boot1/2 passes anything really useful to loader via bootinfo except for bi_bios_dev. For that matter it looks like maybe only two fields from the whole (x86) bootinfo are useful to (x86) kernel either... > Anyway, LOCORE means "for use in locore.[sS]", so other uses of it, e.g. > in boot programs, are bogus. That's a good point. Maybe we should use some more generic name. Maybe there is even some macro that is always set for .S files that we can check. Oh, thank google, is __ASSEMBLER__ it? It seems like couple of non-x86 headers already use this macro. >> I have added a definition of CTASSERT to boostrap.h as it was not available for >> sys/boot and there were two local definitions of the macro in individual files. >> >> However the assertion would fail right now. >> The backward-compatible value of BI_SIZE (72 == 0x48) covers only part of the > > This isn't backwards compatible. BI_SIZE was decimal 48 (covers everything > up to the bi_size field). I meant backward compatible with the BTX code that I was changing, of course. >> fields in struct bootinfo, those up to the following comment: >> /* Items below only from advanced bootloader */ >> I am a little bit hesitant: should I increase BI_SIZE to cover the whole struct >> bootinfo or should I compare BI_SIZE to offsetof bi_kernend? > > Neither. BI_SIZE shouldn't exist. It defeats the bi_size field. Using the bi_size field may be the proper solution indeed. Even if no data beyond certain offset is ever used by loader. The planned changes to BTX code should make using bi_size easier. >>> Maybe >>> create a 'struct kargs_ext' that looks like this: >>> >>> struct kargs_ext { >>> uint32_t size; >>> char data[0]; >>> }; >> >> I've decided to skip on this. > > Use KNF indentation and KNF field prefixes (ka_) if you add it :-). Generic > field names like `size' and `data' need prefixes more than mos. > > The old struct was: > > % #define N_BIOS_GEOM 8 > % ... > % /* > % * A zero bootinfo field often means that there is no info available. > % * Flags are used to indicate the validity of fields where zero is a > % * normal value. 
> % */ > % struct bootinfo { > % u_int32_t bi_version; > % u_int32_t bi_kernelname; /* represents a char * */ > % u_int32_t bi_nfs_diskless; /* struct nfs_diskless * */ > % /* End of fields that are always present. */ > > The original size was apparently 12. > > % #define bi_endcommon bi_n_bios_used > > Another style difference. The magic 12 is essentially given by this macro. > This macro is a pseudo-field, like the ones for the copyable and zeroable > regions in struct proc. Its name is in lower case. > > % u_int32_t bi_n_bios_used; > % u_int32_t bi_bios_geom[N_BIOS_GEOM]; > > The struct was broken in 1994 by adding the above 2 fields without providing > any way to distinguish it from the old struct. > > % u_int32_t bi_size; > % u_int8_t bi_memsizes_valid; > % u_int8_t bi_bios_dev; /* bootdev BIOS unit number */ > % u_int8_t bi_pad[2]; > % u_int32_t bi_basemem; > % u_int32_t bi_extmem; > % u_int32_t bi_symtab; /* struct symtab * */ > % u_int32_t bi_esymtab; /* struct symtab * */ > > The above 8 fields were added in 1995 (together with fixing style bugs > like no prefixes for the old field names). Now the struct is determined > by its size according to the bi_size field, and the bi_version field is > not really used (it's much easier to add stuff to the end than to support > multiple versions). This gives a range of old sizes/versions: > > 12: ~1993 (FreeBSD-1) > 48: ~1994 (FreeBSD-1 and/or 2) > 0x48: FreeBSD-2 post-1995 > > But these old sizes are uninteresting since only boot loaders from before > 1993-1995 support only the above fields, and these loaders can't boot current > kernels. > > % /* Items below only from advanced bootloader */ > % u_int32_t bi_kernend; /* end of kernel space */ > % u_int32_t bi_envp; /* environment */ > % u_int32_t bi_modulep; /* preloaded modules */ > > Added in 1998. Still uninteresting, since boot loaders newer than that > are needed to boot current kernels (mainly for elf). > > % uint64_t bi_hcdp; /* DIG64 HCDP table */ > % uint64_t bi_fpswa; /* FPSWA interface */ > % uint64_t bi_systab; /* pa of EFI system table */ > % uint64_t bi_memmap; /* pa of EFI memory map */ > % uint64_t bi_memmap_size; /* size of EFI memory map */ > % uint64_t bi_memdesc_size; /* sizeof EFI memory desc */ > % uint32_t bi_memdesc_version; /* EFI memory desc version */ > > Added in 2010. Are all of these uint64_t types correct? The padding seems > to be broken, so that these fields would not work for amd64: we're at offset > 0x48 for bi_kernend. The 3 uint32_t's added in 1998 reach 0x54. Then all > the uint64_t fields are misaligned on i386, and on amd64 there is unnamed > padding before the first of them to align them. But and64 doesn't use > bootinfo.h in the kernel, so I think only the i386 version is used on amd64 > (in the boot loader), so the misaligned case isn't used. Interesting observations. It looks like these newest fields were ported from IA64 for EFI support, but it doesn't look like that support is actually in x86 yet. > The struct declaration is also broken at the end. The last field is 32 bits, > so there is unnamed padding after it on amd64 only. This padding should be > explicit, like the padding before the uint64_t fields, or just put the 32-bit > field before the 64-bit fields. > > % }; > > So apart you could hard-code the size to the 1998 value of 0x54 without > losing anything except the buggy 2010 fields. But it shouldn't be > hard-coded. I am inclined to agree. Thank you again. P.S. 
Actually I feel like arguing if the genassym approach is totally correct/safe for BI_SIZE. One could easily insert a field before bi_size and thus change BI_SIZE and thus break compatibility with binaries compiled before the change. And all that without getting any hint during compilation. OTOH, if BI_SIZE is explicitly defined to constant and there is a CTASSERT to assert that BI_SIZE == offsetof(..., bi_size), then the chances of unwittingly breaking things are smaller. Of course, something like this would never happen in reality. -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Mon May 7 14:47:09 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D8898106567A; Mon, 7 May 2012 14:47:09 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id C12CE8FC14; Mon, 7 May 2012 14:47:08 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id RAA06503; Mon, 07 May 2012 17:47:07 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4FA7E069.8020208@FreeBSD.org> Date: Mon, 07 May 2012 17:47:05 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:12.0) Gecko/20120503 Thunderbird/12.0.1 MIME-Version: 1.0 To: John Baldwin References: <4F8999D2.1080902@FreeBSD.org> <4FA4F36A.6030903@FreeBSD.org> <4FA4F883.2060008@FreeBSD.org> <201205070953.04032.jhb@freebsd.org> In-Reply-To: <201205070953.04032.jhb@freebsd.org> X-Enigmail-Version: 1.5pre Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org, freebsd-hackers@FreeBSD.org Subject: Re: [review request] zfsboot/zfsloader: support accessing filesystems within a pool X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 May 2012 14:47:10 -0000 on 07/05/2012 16:53 John Baldwin said the following: > On Saturday, May 05, 2012 5:53:07 am Andriy Gapon wrote: [snip] >> The new patchset: http://people.freebsd.org/~avg/zfsboot.patches.7.diff > > Looks great, thanks! A few replies below: Here's a followup patch for the suggestions: http://people.freebsd.org/~avg/bootargs.followup.diff I will merge it into the main patch. What do you think about the -LOCORE- change that Bruce inspired? >>>> - Add a CTASSERT() in loader/main.c that BI_SIZE == sizeof(struct bootinfo) >>> >>> I have added a definition of CTASSERT to boostrap.h as it was not available for >>> sys/boot and there were two local definitions of the macro in individual files. >>> >>> However the assertion would fail right now. >>> The backward-compatible value of BI_SIZE (72 == 0x48) covers only part of the >>> fields in struct bootinfo, those up to the following comment: >>> /* Items below only from advanced bootloader */ >>> >>> I am a little bit hesitant: should I increase BI_SIZE to cover the whole struct >>> bootinfo or should I compare BI_SIZE to offsetof bi_kernend? > > Actually, we should probably be reading the 'bi_size' field and not using a BI_SIZE > constant at all? Done in the above patch. > Looks like only the non-functional EFI boot loader doesn't set bi_size (and it should > just be fixed to do so since it needs to pass new fields in anyway). 
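To illustrate the bi_size-driven approach (the consumer trusts the size the producer wrote rather than a compile-time constant), here is a minimal sketch; the trimmed structure and the copy helper are hypothetical, not the actual patch:

    #include <stdint.h>
    #include <string.h>

    struct bootinfo_view {              /* trimmed stand-in for struct bootinfo */
            uint32_t bi_version;
            /* ... */
            uint32_t bi_size;           /* bytes the producer actually filled in */
            uint8_t  bi_bios_dev;       /* bootdev BIOS unit number */
            /* ... later fields ... */
    };

    static void
    bootinfo_copyin(struct bootinfo_view *dst, const void *src, uint32_t src_size)
    {
            uint32_t n = src_size;      /* the producer's bi_size */

            if (n > sizeof(*dst))
                    n = sizeof(*dst);   /* never read past what we understand */
            memset(dst, 0, sizeof(*dst)); /* fields the producer lacked read as zero */
            memcpy(dst, src, n);
    }

With that kind of copy, a loader built against a newer bootinfo still accepts the shorter structure written by older boot blocks, and extra trailing fields from a newer producer are simply ignored.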
> >>> I've decided to define ARGADJ in the new common header, then I've had to rename >>> btxcsu.s to .S, so that the preprocessing is executed for it. > > Ok. Maybe add one comment to the bootargs.h head to explain that the 'bootargs' > struct starts at ARGOFF and can grow up, while struct bootinfo is copied such that > it's end is at the top of the argument area and grows down. Will do. > Also, at some point we could use a genassym.c file ala the kernel builds to generate > some of the constants in bootargs.h instead (e.g. the offsets of fields within > structures, and BA_SIZE, though we probably want to ensure that BA_SIZE never > changes). The genassym approach sounds good, but, indeed - later :) Thank you. -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Mon May 7 15:15:57 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id ABA39106566C; Mon, 7 May 2012 15:15:57 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 92CE08FC16; Mon, 7 May 2012 15:15:56 +0000 (UTC) Received: from odyssey.starpoint.kiev.ua (alpha-e.starpoint.kiev.ua [212.40.38.101]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id SAA06720; Mon, 07 May 2012 18:15:54 +0300 (EEST) (envelope-from avg@FreeBSD.org) Message-ID: <4FA7E729.4000308@FreeBSD.org> Date: Mon, 07 May 2012 18:15:53 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:12.0) Gecko/20120503 Thunderbird/12.0.1 MIME-Version: 1.0 To: John Baldwin References: <4F8999D2.1080902@FreeBSD.org> <4FA4F36A.6030903@FreeBSD.org> <4FA4F883.2060008@FreeBSD.org> <201205070953.04032.jhb@freebsd.org> <4FA7E069.8020208@FreeBSD.org> In-Reply-To: <4FA7E069.8020208@FreeBSD.org> X-Enigmail-Version: 1.5pre Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org, freebsd-hackers@FreeBSD.org Subject: Re: [review request] zfsboot/zfsloader: support accessing filesystems within a pool X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 May 2012 15:15:57 -0000 on 07/05/2012 17:47 Andriy Gapon said the following: > on 07/05/2012 16:53 John Baldwin said the following: >> Ok. Maybe add one comment to the bootargs.h head to explain that the 'bootargs' >> struct starts at ARGOFF and can grow up, while struct bootinfo is copied such that >> it's end is at the top of the argument area and grows down. > > Will do. Could you please check the wording and correct it or suggest alternatives? Thank you. diff --git a/sys/boot/i386/common/bootargs.h b/sys/boot/i386/common/bootargs.h index 510efdd..8bc1b32 100644 --- a/sys/boot/i386/common/bootargs.h +++ b/sys/boot/i386/common/bootargs.h @@ -29,6 +29,15 @@ #define BF_OFF 8 /* offsetof(struct bootargs, bootflags) */ #define BI_OFF 20 /* offsetof(struct bootargs, bootinfo) */ +/* + * We reserve some space above BTX allocated stack for the arguments + * and certain data that could hang off them. Currently only struct bootinfo + * is supported in that category. The bootinfo is placed at the top + * of the arguments area and the actual arguments are placed at ARGOFF offset + * from the top and grow towards the top. Hopefully we have enough space + * for bootinfo and the arguments to not run into each other. 
+ * Arguments area below ARGOFF is reserved for future use. + */ #define ARGSPACE 0x1000 /* total size of the BTX args area */ #define ARGOFF 0x800 /* actual args offset within the args area */ #define ARGADJ (ARGSPACE - ARGOFF) -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Mon May 7 17:46:12 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2A83D1065670; Mon, 7 May 2012 17:46:12 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id F217C8FC14; Mon, 7 May 2012 17:46:11 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 6F03EB95D; Mon, 7 May 2012 13:46:11 -0400 (EDT) From: John Baldwin To: Andriy Gapon Date: Mon, 7 May 2012 13:38:12 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p13; KDE/4.5.5; amd64; ; ) References: <4F8999D2.1080902@FreeBSD.org> <4FA7E069.8020208@FreeBSD.org> <4FA7E729.4000308@FreeBSD.org> In-Reply-To: <4FA7E729.4000308@FreeBSD.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201205071338.12807.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Mon, 07 May 2012 13:46:11 -0400 (EDT) Cc: freebsd-fs@freebsd.org, freebsd-hackers@freebsd.org Subject: Re: [review request] zfsboot/zfsloader: support accessing filesystems within a pool X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 May 2012 17:46:12 -0000 On Monday, May 07, 2012 11:15:53 am Andriy Gapon wrote: > on 07/05/2012 17:47 Andriy Gapon said the following: > > on 07/05/2012 16:53 John Baldwin said the following: > >> Ok. Maybe add one comment to the bootargs.h head to explain that the 'bootargs' > >> struct starts at ARGOFF and can grow up, while struct bootinfo is copied such that > >> it's end is at the top of the argument area and grows down. > > > > Will do. > > Could you please check the wording and correct it or suggest alternatives? > Thank you. > > diff --git a/sys/boot/i386/common/bootargs.h b/sys/boot/i386/common/bootargs.h > index 510efdd..8bc1b32 100644 > --- a/sys/boot/i386/common/bootargs.h > +++ b/sys/boot/i386/common/bootargs.h > @@ -29,6 +29,15 @@ > #define BF_OFF 8 /* offsetof(struct bootargs, bootflags) */ > #define BI_OFF 20 /* offsetof(struct bootargs, bootinfo) */ > > +/* > + * We reserve some space above BTX allocated stack for the arguments > + * and certain data that could hang off them. Currently only struct bootinfo > + * is supported in that category. The bootinfo is placed at the top > + * of the arguments area and the actual arguments are placed at ARGOFF offset > + * from the top and grow towards the top. Hopefully we have enough space > + * for bootinfo and the arguments to not run into each other. > + * Arguments area below ARGOFF is reserved for future use. > + */ > #define ARGSPACE 0x1000 /* total size of the BTX args area */ > #define ARGOFF 0x800 /* actual args offset within the args area */ > #define ARGADJ (ARGSPACE - ARGOFF) I think this is good, thanks! 
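The layout the comment describes amounts to a little pointer arithmetic; a sketch, assuming args_base is a hypothetical pointer to the bottom of the BTX argument area:

    #include <stdint.h>

    #define ARGSPACE        0x1000  /* total size of the BTX args area */
    #define ARGOFF          0x800   /* actual args offset within the args area */
    #define ARGADJ          (ARGSPACE - ARGOFF)

    static inline uint8_t *
    bootargs_start(uint8_t *args_base)
    {
            /* struct bootargs starts at ARGOFF and grows toward the top. */
            return (args_base + ARGOFF);
    }

    static inline uint8_t *
    bootinfo_start(uint8_t *args_base, uint32_t bi_size)
    {
            /* struct bootinfo ends flush with the top of the area and grows down. */
            return (args_base + ARGSPACE - bi_size);
    }

With ARGOFF at 0x800 the two regions approach each other from opposite ends of the 4 KB area, which is what the "hopefully we have enough space" caveat refers to.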
-- John Baldwin From owner-freebsd-fs@FreeBSD.ORG Mon May 7 17:46:12 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id ED4A5106566B; Mon, 7 May 2012 17:46:12 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id C18788FC15; Mon, 7 May 2012 17:46:12 +0000 (UTC) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 27BB1B978; Mon, 7 May 2012 13:46:12 -0400 (EDT) From: John Baldwin To: Andriy Gapon Date: Mon, 7 May 2012 13:43:45 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p13; KDE/4.5.5; amd64; ; ) References: <4F8999D2.1080902@FreeBSD.org> <201205070953.04032.jhb@freebsd.org> <4FA7E069.8020208@FreeBSD.org> In-Reply-To: <4FA7E069.8020208@FreeBSD.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201205071343.45955.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Mon, 07 May 2012 13:46:12 -0400 (EDT) Cc: freebsd-fs@freebsd.org, freebsd-hackers@freebsd.org Subject: Re: [review request] zfsboot/zfsloader: support accessing filesystems within a pool X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 May 2012 17:46:13 -0000 On Monday, May 07, 2012 10:47:05 am Andriy Gapon wrote: > on 07/05/2012 16:53 John Baldwin said the following: > > On Saturday, May 05, 2012 5:53:07 am Andriy Gapon wrote: > [snip] > >> The new patchset: http://people.freebsd.org/~avg/zfsboot.patches.7.diff > > > > Looks great, thanks! A few replies below: > > Here's a followup patch for the suggestions: > http://people.freebsd.org/~avg/bootargs.followup.diff > I will merge it into the main patch. > > What do you think about the -LOCORE- change that Bruce inspired? In general I think this looks good. I have only one suggestion. In other code (e.g. the genassym constants in the kernel) where we define constants for field offsets, we make the constant be the uppercase name of the field itself (e.g. TD_PCB for offsetof(struct thread, td_pcb)). I would rather do that here as well. In this case the field names do not have a prefix, but let's just use a BA_ prefix for members of 'bootargs'. BI_SIZE is already correct, but this would mean renaming HT_OFF to BA_HOWTO, BF_OFF to BA_BOOTFLAGS, and BI_OFF to BA_BOOTINFO. I think you can probably leave BA_SIZE as-is. > > Also, at some point we could use a genassym.c file ala the kernel builds to generate > > some of the constants in bootargs.h instead (e.g. the offsets of fields within > > structures, and BA_SIZE, though we probably want to ensure that BA_SIZE never > > changes). > > The genassym approach sounds good, but, indeed - later :) Yes, that can wait. I think it would not be very hard to do however. All you really need is access to sys/kern/genassym.sh and nm. 
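A genassym.c for the boot code might look roughly like the sketch below; the struct bootargs field names (howto, bootflags, bootinfo) are inferred from the BA_* constants discussed above and are assumptions, as is the idea that sys/assym.h can be pulled in unchanged:

    #include <sys/types.h>
    #include <sys/assym.h>
    #include <stddef.h>

    #include "bootargs.h"       /* hypothetical: declares struct bootargs */

    ASSYM(BA_HOWTO, offsetof(struct bootargs, howto));
    ASSYM(BA_BOOTFLAGS, offsetof(struct bootargs, bootflags));
    ASSYM(BA_BOOTINFO, offsetof(struct bootargs, bootinfo));
    ASSYM(BOOTARGS_SIZE, sizeof(struct bootargs));

The compiled object is then run through sys/kern/genassym.sh, which uses nm to extract the values and emit assembler-consumable definitions, so the constants can never drift out of sync with the C structure.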
-- John Baldwin From owner-freebsd-fs@FreeBSD.ORG Mon May 7 17:48:22 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3A138106566C for ; Mon, 7 May 2012 17:48:22 +0000 (UTC) (envelope-from simon@comsys.ntu-kpi.kiev.ua) Received: from comsys.kpi.ua (comsys.kpi.ua [77.47.192.42]) by mx1.freebsd.org (Postfix) with ESMTP id A640D8FC1A for ; Mon, 7 May 2012 17:48:21 +0000 (UTC) Received: from pm513-1.comsys.kpi.ua ([10.18.52.101] helo=pm513-1.comsys.ntu-kpi.kiev.ua) by comsys.kpi.ua with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.63) (envelope-from ) id 1SRS2f-0006I1-Qx; Mon, 07 May 2012 20:48:13 +0300 Received: by pm513-1.comsys.ntu-kpi.kiev.ua (Postfix, from userid 1001) id B0C111CC31; Mon, 7 May 2012 20:48:13 +0300 (EEST) Date: Mon, 7 May 2012 20:48:13 +0300 From: Andrey Simonenko To: Rick Macklem Message-ID: <20120507174813.GA5927@pm513-1.comsys.ntu-kpi.kiev.ua> References: <1494135294.103829.1335731763653.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1494135294.103829.1335731763653.JavaMail.root@erie.cs.uoguelph.ca> User-Agent: Mutt/1.5.21 (2010-09-15) X-Authenticated-User: simon@comsys.ntu-kpi.kiev.ua X-Authenticator: plain X-Sender-Verify: SUCCEEDED (sender exists & accepts mail) X-Exim-Version: 4.63 (build at 28-Apr-2011 07:11:12) X-Date: 2012-05-07 20:48:13 X-Connected-IP: 10.18.52.101:29819 X-Message-Linecount: 62 X-Body-Linecount: 46 X-Message-Size: 2847 X-Body-Size: 2070 Cc: freebsd-fs@freebsd.org Subject: Re: NFSv4 Questions X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 May 2012 17:48:22 -0000 On Sun, Apr 29, 2012 at 04:36:03PM -0400, Rick Macklem wrote: > > Also, be sure to check "man nfsv4" and maybe reference it (it is currently > in the See Also list, but that might not be strong enough). There is another question not explained in documentation (I could not find the answer at least). Currently NFSv3 client uses reserved port for NFS mounts and uses non reserved port if "noresvport" is specified. NFSv4 client always uses non reserved port, ignoring the "resvport" option in the mount_nfs command. Such behaviour of NFS client was introduced in 1.18 version of fs/nfsclient/nfs_clvfsops.c [1], where the "resvport" flag is cleared for NFSv4 mounts. Why does "reserved port logic" differ in NFSv3 and NFSv4 clients? > > If I may, perhaps a switch in /etc/rc.conf: > > nfsv4_only="YES" > > > I might call it nfsv4_server_only, but sounds like a good suggestion. > > > This would set the -nfsv4-only switch you mention for mountd, and it > > would set vfs.nfsd.server_min_nfsvers=4 > > > It could also be used by /etc/rc.d/mountd to indicate "don't force rpcbind". > I'm sure that you know all these, but let me add some comments. 1. Using server_min_nfsvers and server_max_nfsvers are global settings and do not allow to make one file system NFSv2/3/4 exported and another one NFSv4 exported only for example. 2. MOUNT protocol is not used only for MNT/UMNT/UMNTALL requests from NFSv2/3 clients. As I know some automounters use MOUNT EXPORT requests to get information about exported file systems. So, MOUNT protocol can be usefull for somebody who uses NFSv4 only. Both items have something common. 
There should be options that enable/ disables NFSv2, NFSv3 and/or NFSv4 per address specification and/or per file system. And there should be the option that allows to disable the MOUNT protocol entirely, some of its versions or some of its netconfigs (some of visible netconfigs that can be used by the MOUNT protocol). [1] http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/fs/nfsclient/nfs_clvfsops.c.diff?r1=1.17;r2=1.18 From owner-freebsd-fs@FreeBSD.ORG Mon May 7 18:49:21 2012 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 0233B106566B; Mon, 7 May 2012 18:49:21 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id CAB5F8FC0C; Mon, 7 May 2012 18:49:20 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q47InKlr008282; Mon, 7 May 2012 18:49:20 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q47InKbk008278; Mon, 7 May 2012 18:49:20 GMT (envelope-from linimon) Date: Mon, 7 May 2012 18:49:20 GMT Message-Id: <201205071849.q47InKbk008278@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/167685: [zfs] ZFS on USB drive prevents shutdown / reboot X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 May 2012 18:49:21 -0000 Old Synopsis: ZFS on USB drive prevents shutdown / reboot New Synopsis: [zfs] ZFS on USB drive prevents shutdown / reboot Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Mon May 7 18:49:08 UTC 2012 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=167685 From owner-freebsd-fs@FreeBSD.ORG Mon May 7 18:49:45 2012 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E9B48106566C; Mon, 7 May 2012 18:49:45 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id BE1648FC08; Mon, 7 May 2012 18:49:45 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q47InjUH008368; Mon, 7 May 2012 18:49:45 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q47InjHP008364; Mon, 7 May 2012 18:49:45 GMT (envelope-from linimon) Date: Mon, 7 May 2012 18:49:45 GMT Message-Id: <201205071849.q47InjHP008364@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/167688: [fusefs] Incorrect signal handling with direct_io X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 May 2012 18:49:46 -0000 Old Synopsis: fusefs. 
Incorrect signal handling with direct_io New Synopsis: [fusefs] Incorrect signal handling with direct_io Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Mon May 7 18:49:28 UTC 2012 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=167688 From owner-freebsd-fs@FreeBSD.ORG Mon May 7 18:55:20 2012 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 999381065679; Mon, 7 May 2012 18:55:20 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 6C8418FC17; Mon, 7 May 2012 18:55:20 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q47ItFft017488; Mon, 7 May 2012 18:55:15 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q47ItFfJ017484; Mon, 7 May 2012 18:55:15 GMT (envelope-from linimon) Date: Mon, 7 May 2012 18:55:15 GMT Message-Id: <201205071855.q47ItFfJ017484@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/167612: [portalfs] The portal file system gets stuck inside portal_open(). ("1 extra fds") X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 May 2012 18:55:20 -0000 Old Synopsis: The portal file system gets stuck inside portal_open(). ("1 extra fds") New Synopsis: [portalfs] The portal file system gets stuck inside portal_open(). ("1 extra fds") Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Mon May 7 18:55:01 UTC 2012 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=167612 From owner-freebsd-fs@FreeBSD.ORG Mon May 7 21:57:53 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id EC1C91065670; Mon, 7 May 2012 21:57:53 +0000 (UTC) (envelope-from vermaden@interia.pl) Received: from smtpo.poczta.interia.pl (smtpo.poczta.interia.pl [217.74.65.208]) by mx1.freebsd.org (Postfix) with ESMTP id 9FAA28FC08; Mon, 7 May 2012 21:57:53 +0000 (UTC) Date: Mon, 07 May 2012 23:57:52 +0200 From: vermaden To: "Randal L. 
Schwartz" X-Mailer: interia.pl/pf09 In-Reply-To: <86ehqwb0tm.fsf@red.stonehenge.com> References: <86ipgbg2p6.fsf@red.stonehenge.com> <86d36jzk16.fsf@red.stonehenge.com> <867gwrzjwc.fsf@red.stonehenge.com> <86397fzjgi.fsf@red.stonehenge.com> <86y5p7y478.fsf@red.stonehenge.com> <86d36iycca.fsf@red.stonehenge.com> <86ehqwb0tm.fsf@red.stonehenge.com> Message-Id: MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=interia.pl; s=biztos; t=1336427872; bh=bJUG9nepGkMwbHtCHmdnoMdYGI8hjURrYTV5Pto3tD0=; h=Date:From:Subject:To:Cc:X-Mailer:In-Reply-To:References: Message-Id:MIME-Version:Content-Type:Content-Transfer-Encoding; b=tU5rAowv7pbgQf51z/ON58Xzqva0F3iw9yOBWd4RbhicAqfdy3hWFvD5K/C3P6Sux q4WykTAgpAVqveFLHyRaYbMNHWHWFI8FH3fSN3ifsouK8jg4HvU/5Ou0xVS+hgxFmt g29XNxnAAV884hCNzchqnYI4nmBygG/U9Pl8qtbc= Cc: freebsd-fs@FreeBSD.org, freebsd-questions@freebsd.org Subject: Re: HOWTO: FreeBSD ZFS Madness (Boot Environments) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 May 2012 21:57:54 -0000 > Emacs indents it nicely, and colorizes the > keywords so that it stands out. Indentification is not a problem, it work both in geany and vim. Probably I haven't made clear what I meant ;) Take a look at this picture: http://ompldr.org/vZG50bQ The brackets in that specific section (asd) are highlighted, other are not, its not possible with if/then/fi, only the keywords are highlighted, but they are highlighted for the whole script so ... ;) With { } I can also (un)fold the section/function, its not possible with if/then/fi. Regards, vermaden --=20 ... 
From owner-freebsd-fs@FreeBSD.ORG Mon May 7 23:40:25 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id A0ADE106564A for ; Mon, 7 May 2012 23:40:25 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 5A73B8FC0A for ; Mon, 7 May 2012 23:40:25 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap4EAEdcqE+DaFvO/2dsb2JhbABEhXKuPYIMAQEEASNWGw4KAgINGQJZBhyIAAULqA6Se4EviVCEcYEYBJV+kEKDBQ X-IronPort-AV: E=Sophos;i="4.75,546,1330923600"; d="scan'208";a="168310392" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 07 May 2012 19:40:18 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 215D1B4031; Mon, 7 May 2012 19:40:18 -0400 (EDT) Date: Mon, 7 May 2012 19:40:18 -0400 (EDT) From: Rick Macklem To: Andrey Simonenko Message-ID: <1357768784.50127.1336434018113.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20120507174813.GA5927@pm513-1.comsys.ntu-kpi.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org Subject: Re: NFSv4 Questions X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 May 2012 23:40:25 -0000 Andrey Simonenko wrote: > On Sun, Apr 29, 2012 at 04:36:03PM -0400, Rick Macklem wrote: > > > > Also, be sure to check "man nfsv4" and maybe reference it (it is > > currently > > in the See Also list, but that might not be strong enough). > > There is another question not explained in documentation (I could not > find the answer at least). Currently NFSv3 client uses reserved port > for NFS mounts and uses non reserved port if "noresvport" is > specified. > NFSv4 client always uses non reserved port, ignoring the "resvport" > option in the mount_nfs command. > > Such behaviour of NFS client was introduced in 1.18 version of > fs/nfsclient/nfs_clvfsops.c [1], where the "resvport" flag is cleared > for NFSv4 mounts. > > Why does "reserved port logic" differ in NFSv3 and NFSv4 clients? > It is my understanding that NFSv4 servers are not supposed to require a "reserved" port#. However, at a quick glance, I can't find that stated in RFC 3530. (It may be implied by the fact that NFSv4 uses a "user" based security model and not a "host" based one.) As such, the client should never need to "waste" a reserved port# on a NFSv4 connection. rick > > > If I may, perhaps a switch in /etc/rc.conf: > > > nfsv4_only="YES" > > > > > I might call it nfsv4_server_only, but sounds like a good > > suggestion. > > > > > This would set the -nfsv4-only switch you mention for mountd, and > > > it > > > would set vfs.nfsd.server_min_nfsvers=4 > > > > > It could also be used by /etc/rc.d/mountd to indicate "don't force > > rpcbind". > > > > I'm sure that you know all these, but let me add some comments. > > 1. 
Using server_min_nfsvers and server_max_nfsvers are global settings > and do not allow to make one file system NFSv2/3/4 exported and > another > one NFSv4 exported only for example. > > 2. MOUNT protocol is not used only for MNT/UMNT/UMNTALL requests from > NFSv2/3 clients. As I know some automounters use MOUNT EXPORT requests > to get information about exported file systems. So, MOUNT protocol > can be usefull for somebody who uses NFSv4 only. > > Both items have something common. There should be options that enable/ > disables NFSv2, NFSv3 and/or NFSv4 per address specification and/or > per > file system. And there should be the option that allows to disable the > MOUNT protocol entirely, some of its versions or some of its > netconfigs > (some of visible netconfigs that can be used by the MOUNT protocol). > > [1] > http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/fs/nfsclient/nfs_clvfsops.c.diff?r1=1.17;r2=1.18 From owner-freebsd-fs@FreeBSD.ORG Tue May 8 01:35:34 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1B9A81065670 for ; Tue, 8 May 2012 01:35:33 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id 6E49E8FC0C for ; Tue, 8 May 2012 01:35:33 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id q481ZPbK023829; Mon, 7 May 2012 20:35:26 -0500 (CDT) Date: Mon, 7 May 2012 20:35:25 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Rick Macklem In-Reply-To: <1357768784.50127.1336434018113.JavaMail.root@erie.cs.uoguelph.ca> Message-ID: References: <1357768784.50127.1336434018113.JavaMail.root@erie.cs.uoguelph.ca> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Mon, 07 May 2012 20:35:26 -0500 (CDT) Cc: freebsd-fs@freebsd.org Subject: Re: NFSv4 Questions X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 May 2012 01:35:34 -0000 On Mon, 7 May 2012, Rick Macklem wrote: >> > It is my understanding that NFSv4 servers are not supposed to require > a "reserved" port#. However, at a quick glance, I can't find that stated > in RFC 3530. (It may be implied by the fact that NFSv4 uses a "user" based > security model and not a "host" based one.) > > As such, the client should never need to "waste" a reserved port# on a NFSv4 > connection. Firewalls might use the reserved port as part of a filtering algorithm. 
Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Tue May 8 07:14:48 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 92C36106564A; Tue, 8 May 2012 07:14:48 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 756D58FC0C; Tue, 8 May 2012 07:14:47 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id KAA12978; Tue, 08 May 2012 10:14:45 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1SRedA-000K6X-JS; Tue, 08 May 2012 10:14:44 +0300 Message-ID: <4FA8C7E3.8070006@FreeBSD.org> Date: Tue, 08 May 2012 10:14:43 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:12.0) Gecko/20120503 Thunderbird/12.0.1 MIME-Version: 1.0 To: John Baldwin References: <4F8999D2.1080902@FreeBSD.org> <201205070953.04032.jhb@freebsd.org> <4FA7E069.8020208@FreeBSD.org> <201205071343.45955.jhb@freebsd.org> In-Reply-To: <201205071343.45955.jhb@freebsd.org> X-Enigmail-Version: 1.5pre Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org, freebsd-hackers@FreeBSD.org Subject: Re: [review request] zfsboot/zfsloader: support accessing filesystems within a pool X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 May 2012 07:14:48 -0000 on 07/05/2012 20:43 John Baldwin said the following: > On Monday, May 07, 2012 10:47:05 am Andriy Gapon wrote: >> on 07/05/2012 16:53 John Baldwin said the following: [snip] >> What do you think about the -LOCORE- change that Bruce inspired? > > In general I think this looks good. I have only one suggestion. In other > code (e.g. the genassym constants in the kernel) where we define constants > for field offsets, we make the constant be the uppercase name of the field > itself (e.g. TD_PCB for offsetof(struct thread, td_pcb)). I would rather > do that here as well. In this case the field names do not have a prefix, > but let's just use a BA_ prefix for members of 'bootargs'. BI_SIZE is > already correct, but this would mean renaming HT_OFF to BA_HOWTO, BF_OFF to > BA_BOOTFLAGS, and BI_OFF to BA_BOOTINFO. OK, doing this. > I think you can probably leave BA_SIZE as-is. I see that i386 genassym has a few different styles for sizeof constants: ABBRSIZE FULL_NAME_SIZE ABBR_SIZEOF FULL_NAME_SIZE looked the most appealing to me (and seems to be the most common), so I decided to change BA_SIZE to BOOTARGS_SIZE. I hope that this makes sense and I am not starting a bikeshed :-) >>> Also, at some point we could use a genassym.c file ala the kernel builds to generate >>> some of the constants in bootargs.h instead (e.g. the offsets of fields within >>> structures, and BA_SIZE, though we probably want to ensure that BA_SIZE never >>> changes). >> >> The genassym approach sounds good, but, indeed - later :) > > Yes, that can wait. I think it would not be very hard to do however. 
All > you really need is access to sys/kern/genassym.sh and nm. > -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Tue May 8 11:13:48 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7455F106566B for ; Tue, 8 May 2012 11:13:48 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 2E51E8FC16 for ; Tue, 8 May 2012 11:13:48 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap4EAKP+qE+DaFvO/2dsb2JhbABEhXKuOoIMAQEEASNWBRYOCgICDRkCWQYTCYgABQundJMggS+JVIRxgRgElX6QQoMF X-IronPort-AV: E=Sophos;i="4.75,550,1330923600"; d="scan'208";a="171082730" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 08 May 2012 07:13:47 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 12CCAB3EFE; Tue, 8 May 2012 07:13:47 -0400 (EDT) Date: Tue, 8 May 2012 07:13:47 -0400 (EDT) From: Rick Macklem To: Bob Friesenhahn Message-ID: <1387389132.59565.1336475627040.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org Subject: Re: NFSv4 Questions X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 May 2012 11:13:48 -0000 Bob Friesenhahn wrote: > On Mon, 7 May 2012, Rick Macklem wrote: > >> > > It is my understanding that NFSv4 servers are not supposed to > > require > > a "reserved" port#. However, at a quick glance, I can't find that > > stated > > in RFC 3530. (It may be implied by the fact that NFSv4 uses a "user" > > based > > security model and not a "host" based one.) > > > > As such, the client should never need to "waste" a reserved port# on > > a NFSv4 > > connection. > > Firewalls might use the reserved port as part of a filtering > algorithm. > Hmm, since the IETF working group was determined to "get rid of this bunk w.r.t. reserved port #s being used to enhance security", I might argue that said firewalls were misconfigured/broken. However, I can see an argument that, instead of silently ignoring the option, it should be obeyed, but with a note in the man page that it shouldn't be used for NFSv4. 
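For anyone who wants to observe this behaviour directly, a quick sketch using standard mount_nfs and sockstat invocations (the server and export names are placeholders):

  # NFSv3 mount: binds a reserved source port unless "noresvport" is given
  mount -t nfs -o nfsv3 server:/export /mnt
  # NFSv4 mount: the client clears "resvport", so a non-reserved port is used
  mount -t nfs -o nfsv4 server:/export /mnt
  # inspect the TCP connections to port 2049; source ports below 1024 are "reserved"
  sockstat -4 -c -P tcp | grep 2049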
rick > Bob > -- > Bob Friesenhahn > bfriesen@simple.dallas.tx.us, > http://www.simplesystems.org/users/bfriesen/ > GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Tue May 8 14:15:38 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 17371106566B; Tue, 8 May 2012 14:15:38 +0000 (UTC) (envelope-from jhb@FreeBSD.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id DD9FE8FC08; Tue, 8 May 2012 14:15:37 +0000 (UTC) Received: from John-Baldwins-MacBook-Air.local (c-68-39-198-164.hsd1.de.comcast.net [68.39.198.164]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 3BE00B93B; Tue, 8 May 2012 10:15:37 -0400 (EDT) Message-ID: <4FA92A88.2030000@FreeBSD.org> Date: Tue, 08 May 2012 10:15:36 -0400 From: John Baldwin User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 MIME-Version: 1.0 To: Andriy Gapon References: <4F8999D2.1080902@FreeBSD.org> <201205070953.04032.jhb@freebsd.org> <4FA7E069.8020208@FreeBSD.org> <201205071343.45955.jhb@freebsd.org> <4FA8C7E3.8070006@FreeBSD.org> In-Reply-To: <4FA8C7E3.8070006@FreeBSD.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Tue, 08 May 2012 10:15:37 -0400 (EDT) Cc: freebsd-fs@FreeBSD.org, freebsd-hackers@FreeBSD.org Subject: Re: [review request] zfsboot/zfsloader: support accessing filesystems within a pool X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 May 2012 14:15:38 -0000 On 5/8/12 3:14 AM, Andriy Gapon wrote: > on 07/05/2012 20:43 John Baldwin said the following: >> On Monday, May 07, 2012 10:47:05 am Andriy Gapon wrote: >>> on 07/05/2012 16:53 John Baldwin said the following: > [snip] >>> What do you think about the -LOCORE- change that Bruce inspired? >> >> In general I think this looks good. I have only one suggestion. In other >> code (e.g. the genassym constants in the kernel) where we define constants >> for field offsets, we make the constant be the uppercase name of the field >> itself (e.g. TD_PCB for offsetof(struct thread, td_pcb)). I would rather >> do that here as well. In this case the field names do not have a prefix, >> but let's just use a BA_ prefix for members of 'bootargs'. BI_SIZE is >> already correct, but this would mean renaming HT_OFF to BA_HOWTO, BF_OFF to >> BA_BOOTFLAGS, and BI_OFF to BA_BOOTINFO. > > OK, doing this. > >> I think you can probably leave BA_SIZE as-is. > > I see that i386 genassym has a few different styles for sizeof constants: > ABBRSIZE > FULL_NAME_SIZE > ABBR_SIZEOF > > FULL_NAME_SIZE looked the most appealing to me (and seems to be the most > common), so I decided to change BA_SIZE to BOOTARGS_SIZE. > I hope that this makes sense and I am not starting a bikeshed :-) Yeah, given the inconsistency in sizeof() constants in genassym.c, just about anything is fine, which is why I hesitated to suggest any change. BOOTARGS_SIZE is fine. I probably slightly prefer that because it is less ambiguous (in case the structure has a foo_size member such as bi_size in bootinfo). 
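To make the naming concrete, a tiny illustration of the pattern being settled on (the struct layout and numbers here are invented for illustration only, not the actual sys/boot definitions):

  /* The assembler cannot evaluate offsetof(), so the offsets live as plain
   * numeric constants named after the fields they refer to; a genassym-style
   * script would generate these from the C struct automatically. */
  struct bootargs {
          unsigned int    howto;          /* offset 0 */
          unsigned int    bootflags;      /* offset 4 */
          unsigned int    bootinfo;       /* offset 8 */
  };

  #define BA_HOWTO        0
  #define BA_BOOTFLAGS    4
  #define BA_BOOTINFO     8
  #define BOOTARGS_SIZE   12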
Bruce might even suggest adding a ba_ prefix to all the members of struct bootargs btw. I would not be opposed, but you've already done a fair bit of work on this patch. -- John Baldwin From owner-freebsd-fs@FreeBSD.ORG Tue May 8 14:40:00 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B3564106566B for ; Tue, 8 May 2012 14:40:00 +0000 (UTC) (envelope-from freebsd@grem.de) Received: from mail.grem.de (outcast.grem.de [213.239.217.27]) by mx1.freebsd.org (Postfix) with SMTP id D767C8FC15 for ; Tue, 8 May 2012 14:39:59 +0000 (UTC) Received: (qmail 92372 invoked by uid 89); 8 May 2012 14:33:16 -0000 Received: from unknown (HELO ?172.20.10.3?) (mg@grem.de@109.43.0.73) by mail.grem.de with ESMTPA; 8 May 2012 14:33:16 -0000 From: Michael Gmelin Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Date: Tue, 8 May 2012 16:33:14 +0200 Message-Id: <73F8D020-04F3-44B2-97D4-F08E3B253C32@grem.de> To: freebsd-fs@freebsd.org Mime-Version: 1.0 (Apple Message framework v1084) X-Mailer: Apple Mail (2.1084) Subject: ZFS resilvering strangles IO X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 May 2012 14:40:00 -0000 Hello, I know I'm not the first one to ask this, but I couldn't find a = definitive answers in previous threads. I'm running a FreeBSD 9.0 RELEASE-p1 amd64 system, 8 x 1TB SATA2 drives = (not SAS) and an LSI SAS 9211 controller in IT mode (HBAs, da0-da7). = Zpool version 28, raidz2 container. Machine has 4GB of RAM, therefore = ZFS prefetch is disabled. No manual tuning of ZFS options. Pool contains = about 1TB of data right now (so about 25% full). In normal operations = the pool shows excellent performance. Yesterday I had to replace a = drive, so resilvering started. The resilver process took about 15 hours = - which seems a little bit slow to me, but whatever - what really struck = me was that during resilvering the pool performance got really bad. Read = performance was acceptable, but write performance got down to 500kb/s = (for almost all of the 15 hours). After resilvering finished, system = performance returned to normal. Fortunately this is a backup server and no full backups were scheduled, = so no drama, but I really don't want to have to replace a drive in a = database (or other high IO) server this way (I would have been forced to = offline the drive somehow and migrate data to another server). So the question is, is there anything I can do to improve the situation? = Is this because of memory constraints? Are there any other knobs to = adjust? As far as I know zfs_resilver_delay can't be changed in FreeBSD = yet. I have more drives around, so I could replace another one in the server, = just to replicate the exact situation. Cheers, Michael Disk layout: daXp1128 boot daXp2 16G frebsd-swap daXp3 915G freebsd-zfs Zpool status during resilvering: [root@backup /tmp]# zpool status -v pool: tank state: DEGRADED status: One or more devices is currently being resilvered. The pool = will continue to function, possibly in a degraded state. action: Wait for the resilver to complete. 
scan: resilver in progress since Mon May 7 20:18:34 2012 249G scanned out of 908G at 18.2M/s, 10h17m to go 31.2G resilvered, 27.46% done config: NAME STATE READ WRITE CKSUM tank DEGRADED 0 0 0 raidz2-0 DEGRADED 0 0 0 replacing-0 REMOVED 0 0 0 15364271088212071398 REMOVED 0 0 0 was /dev/da0p3/old da0p3 ONLINE 0 0 0 (resilvering) da1p3 ONLINE 0 0 0 da2p3 ONLINE 0 0 0 da3p3 ONLINE 0 0 0 da4p3 ONLINE 0 0 0 da5p3 ONLINE 0 0 0 da6p3 ONLINE 0 0 0 da7p3 ONLINE 0 0 0 errors: No known data errors Zpool status later in the process: root@backup /tmp]# zpool status pool: tank state: DEGRADED status: One or more devices is currently being resilvered. The pool = will continue to function, possibly in a degraded state. action: Wait for the resilver to complete. scan: resilver in progress since Mon May 7 20:18:34 2012 833G scanned out of 908G at 19.1M/s, 1h7m to go 104G resilvered, 91.70% done config: NAME STATE READ WRITE CKSUM tank DEGRADED 0 0 0 raidz2-0 DEGRADED 0 0 0 replacing-0 REMOVED 0 0 0 15364271088212071398 REMOVED 0 0 0 was /dev/da0p3/old da0p3 ONLINE 0 0 0 (resilvering) da1p3 ONLINE 0 0 0 da2p3 ONLINE 0 0 0 da3p3 ONLINE 0 0 0 da4p3 ONLINE 0 0 0 da5p3 ONLINE 0 0 0 da6p3 ONLINE 0 0 0 da7p3 ONLINE 0 0 0 errors: No known data errors Zpool status after resilvering finished: root@backup /]# zpool status pool: tank state: ONLINE scan: resilvered 113G in 14h54m with 0 errors on Tue May 8 11:13:31 = 2012 config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 raidz2-0 ONLINE 0 0 0 da0p3 ONLINE 0 0 0 da1p3 ONLINE 0 0 0 da2p3 ONLINE 0 0 0 da3p3 ONLINE 0 0 0 da4p3 ONLINE 0 0 0 da5p3 ONLINE 0 0 0 da6p3 ONLINE 0 0 0 da7p3 ONLINE 0 0 0 errors: No known data errors From owner-freebsd-fs@FreeBSD.ORG Tue May 8 14:58:34 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E25B1106564A for ; Tue, 8 May 2012 14:58:34 +0000 (UTC) (envelope-from tevans.uk@googlemail.com) Received: from mail-qa0-f47.google.com (mail-qa0-f47.google.com [209.85.216.47]) by mx1.freebsd.org (Postfix) with ESMTP id 9A7098FC16 for ; Tue, 8 May 2012 14:58:34 +0000 (UTC) Received: by qabg1 with SMTP id g1so629457qab.13 for ; Tue, 08 May 2012 07:58:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=M/KJKOHzKPTwW9o7tCJov0qBFeKC8BbkbOk8z7sGi7Y=; b=tIVF1vpw84Ehh72mcbO7kjPx5psF5Xf0Aen7A09jzVUyPwtYtckW79T9qCKsod7WTn 6Pq5gYfwtwYR8PsKetnWo9mvKToRU8cqK316+CLHzFWWezIJcRZESWdjeeHsYaypC2ek jUjysewnz/fQYhydbta8EoHmL4+CXdSOUEY6aO/Ya7F0ua+Es1EVPgUrf3yuyJamcW1O eXT8J54EUDubJxytYCgpkxy9n2mo1qwCssfV5yxUX8iaHcWewi/4OEldSlVd0kJe/FbA WJmlHFLDcE+N7vx+ndcAqLZrO/ElhJLyO8w0+vgtN9SrJQ/T4NNyh/DrCbHlmIi4NV8Z XsGQ== MIME-Version: 1.0 Received: by 10.220.150.12 with SMTP id w12mr7919764vcv.39.1336489113763; Tue, 08 May 2012 07:58:33 -0700 (PDT) Received: by 10.52.28.240 with HTTP; Tue, 8 May 2012 07:58:33 -0700 (PDT) In-Reply-To: <73F8D020-04F3-44B2-97D4-F08E3B253C32@grem.de> References: <73F8D020-04F3-44B2-97D4-F08E3B253C32@grem.de> Date: Tue, 8 May 2012 15:58:33 +0100 Message-ID: From: Tom Evans To: Michael Gmelin Content-Type: text/plain; charset=UTF-8 Cc: freebsd-fs@freebsd.org Subject: Re: ZFS resilvering strangles IO X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 May 
2012 14:58:35 -0000 On Tue, May 8, 2012 at 3:33 PM, Michael Gmelin wrote: > So the question is, is there anything I can do to improve the situation? > Is this because of memory constraints? Are there any other knobs to > adjust? As far as I know zfs_resilver_delay can't be changed in FreeBSD yet. > > I have more drives around, so I could replace another one in the server, > just to replicate the exact situation. > In general, raidz is pretty fast, but when it's resilvering it is just too busy. The first thing I would do to speed up writes is to add a log device, preferably a SSD. Having a log device will allow the pool to buffer writes to the pool much more effectively than normally during a resilver. Having lots of small writes will kill read speed during the resilver, which is the critical thing. If your workload would benefit, you could split the SSD down the middle, use half for a log device, and half for a cache device to accelerate reads. I've never tried using a regular disk as a log device, I wonder if that would speed up resilvering? Cheers Tom From owner-freebsd-fs@FreeBSD.ORG Tue May 8 17:23:57 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 966B1106564A for ; Tue, 8 May 2012 17:23:57 +0000 (UTC) (envelope-from fjwcash@gmail.com) Received: from mail-qa0-f49.google.com (mail-qa0-f49.google.com [209.85.216.49]) by mx1.freebsd.org (Postfix) with ESMTP id 550B08FC17 for ; Tue, 8 May 2012 17:23:57 +0000 (UTC) Received: by qabj40 with SMTP id j40so927946qab.15 for ; Tue, 08 May 2012 10:23:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=Izgdx5+DGlDSot6hV9UlqxX4rvEXJABTL61JDjXdtq0=; b=CUwtGxtva+g1VBaKZ0vWJusapIcwBeKXi/q7lKgeThB+EM+lm6gy147jZUIjSbDlR4 RXNv6GBvEQZKDV61Q5k0yc/D2f+PRzGa2i/1bJIRLoARFzoOln1beaMtR8Iqrf1nNQpy qXNucSfhEpLDc8QV+9tBydXppOs4/21Te5bXPcYki0Mbh27zrtWmy9CGHfb/7+fMAfGF zVK8wvo/XKcUDjCkDXW+bjzpi0KatsbtWc0OhvIVqxrlJYRsKZLlAZ+1HAB/oCLOaKnh Uyv7kE0HBG8zj9yIAkQnXVCg//gyxOIm7gUwcM577Iw9Se5sDrryOv2DJMhwcr0Rlrs+ qtVA== MIME-Version: 1.0 Received: by 10.224.109.65 with SMTP id i1mr26705140qap.39.1336497836185; Tue, 08 May 2012 10:23:56 -0700 (PDT) Received: by 10.229.224.147 with HTTP; Tue, 8 May 2012 10:23:56 -0700 (PDT) Date: Tue, 8 May 2012 10:23:56 -0700 Message-ID: From: Freddie Cash To: FreeBSD Filesystems Content-Type: text/plain; charset=UTF-8 Subject: Broken ZFS filesystem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 May 2012 17:23:57 -0000 I have an interesting issue with one single ZFS filesystem in a pool. All the other filesystems are fine, and can be mounted, snapshoted, destroyed, etc. But this one filesystem, if I try to do any operation on it (zfs mount, zfs snapshot, zfs destroy, zfs set ), it spins the system until all RAM is used up (wired), and then hangs the box. The zfs process sits in tx -> tx_sync_done_cv state until the box locks up. CTRL+T of the process only ever shows this: load: 0.46 cmd: zfs 3115 [tx->tx_sync_done_cv)] 36.63r 0.00u 0.00s 0% 2440k Anyone come across anything similar? And found a way to fix it, or to destroy the filesystem? Any suggestions on how to go about debugging this? Any magical zdb commands to use? 
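A couple of read-only zdb invocations that may help with this kind of investigation (a sketch; flags and output vary between pool versions, and zdb run against an imported pool is advisory only):

  # dump the dataset's object metadata without mounting it (more d's = more verbose)
  zdb -dddd storage/logs/rsync
  # dedup is enabled pool-wide, so check how large the dedup table has grown
  zdb -DD storage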
The filesystem only has 5 MB of data in it (log files), compressed via LZJB for a compressratio of ~6x. There are no snapshots for this filesystem. Dedupe is enabled on the pool and all filesystems. System is running 64-bit 9-RELEASE: FreeBSD alphadrive.sd73.bc.ca 9.0-RELEASE FreeBSD 9.0-RELEASE #0 r229803: Sun Jan 8 00:43:00 PST 2012 root@alphadrive.sd73.bc.ca:/usr/obj/usr/src/sys/ZFSHOST90 amd64 Hardware is fairly generic: - SuperMicro H8DGi-F motherboard - AMD Opteron 6128 CPU (8 cores) - 24 GB of DDR3 RAM - 3x SuperMicro AOC-USAS-L8i SATA controllers - 24x harddrives ranging from 500 GB to 2.0 TB (6 of each kind in raidz2 vdevs) - 64 GB SSD partitioned for OS, swap, with 32 GB for L2ARC Filesystem properties: # zfs get all storage/logs/rsync NAME PROPERTY VALUE SOURCE storage/logs/rsync type filesystem - storage/logs/rsync creation Tue May 10 9:55 2011 - storage/logs/rsync used 5.48M - storage/logs/rsync available 4.61T - storage/logs/rsync referenced 5.48M - storage/logs/rsync compressratio 5.93x - storage/logs/rsync mounted no - storage/logs/rsync quota none default storage/logs/rsync reservation none default storage/logs/rsync recordsize 128K default storage/logs/rsync mountpoint /var/log/rsync local storage/logs/rsync sharenfs off default storage/logs/rsync checksum sha256 inherited from storage storage/logs/rsync compression lzjb inherited from storage storage/logs/rsync atime off inherited from storage storage/logs/rsync devices on default storage/logs/rsync exec on default storage/logs/rsync setuid on default storage/logs/rsync readonly off default storage/logs/rsync jailed off default storage/logs/rsync snapdir visible inherited from storage storage/logs/rsync aclmode discard default storage/logs/rsync aclinherit restricted default storage/logs/rsync canmount on default storage/logs/rsync xattr on default storage/logs/rsync copies 1 default storage/logs/rsync version 5 - storage/logs/rsync utf8only off - storage/logs/rsync normalization none - storage/logs/rsync casesensitivity sensitive - storage/logs/rsync vscan off default storage/logs/rsync nbmand off default storage/logs/rsync sharesmb off default storage/logs/rsync refquota none default storage/logs/rsync refreservation none default storage/logs/rsync primarycache all inherited from storage storage/logs/rsync secondarycache metadata inherited from storage storage/logs/rsync usedbysnapshots 0 - storage/logs/rsync usedbydataset 5.48M - storage/logs/rsync usedbychildren 0 - storage/logs/rsync usedbyrefreservation 0 - storage/logs/rsync logbias latency default storage/logs/rsync dedup sha256 inherited from storage storage/logs/rsync mlslabel - storage/logs/rsync sync standard default storage/logs/rsync refcompressratio 5.93x -- Freddie Cash fjwcash@gmail.com From owner-freebsd-fs@FreeBSD.ORG Tue May 8 18:34:54 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 18DAD106566B; Tue, 8 May 2012 18:34:54 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 01BA08FC14; Tue, 8 May 2012 18:34:52 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id VAA17715; Tue, 08 May 2012 21:34:50 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 
(FreeBSD)) id 1SRpFJ-000KeQ-MO; Tue, 08 May 2012 21:34:49 +0300 Message-ID: <4FA96747.3060106@FreeBSD.org> Date: Tue, 08 May 2012 21:34:47 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:12.0) Gecko/20120503 Thunderbird/12.0.1 MIME-Version: 1.0 To: John Baldwin References: <4F8999D2.1080902@FreeBSD.org> <201205070953.04032.jhb@freebsd.org> <4FA7E069.8020208@FreeBSD.org> <201205071343.45955.jhb@freebsd.org> <4FA8C7E3.8070006@FreeBSD.org> <4FA92A88.2030000@FreeBSD.org> In-Reply-To: <4FA92A88.2030000@FreeBSD.org> X-Enigmail-Version: 1.5pre Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org, freebsd-hackers@FreeBSD.org Subject: Re: [review request] zfsboot/zfsloader: support accessing filesystems within a pool X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 May 2012 18:34:54 -0000 on 08/05/2012 17:15 John Baldwin said the following: > Bruce might even suggest adding a ba_ prefix to all the members of > struct bootargs btw. I would not be opposed, but you've already done > a fair bit of work on this patch. Thank you for sparing me :-) So I hope to get busy committing this stuff soon. -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Tue May 8 19:48:51 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 266B51065672 for ; Tue, 8 May 2012 19:48:51 +0000 (UTC) (envelope-from fjwcash@gmail.com) Received: from mail-qa0-f47.google.com (mail-qa0-f47.google.com [209.85.216.47]) by mx1.freebsd.org (Postfix) with ESMTP id CFE7B8FC1C for ; Tue, 8 May 2012 19:48:50 +0000 (UTC) Received: by qabg1 with SMTP id g1so1039569qab.13 for ; Tue, 08 May 2012 12:48:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type:content-transfer-encoding; bh=KUrRoDLZSrl13CIwO7rk7VD7Z6w0lDhdLW1hcxxlLk8=; b=FjxacbXySHdP+5tvSEk0m5EdZj2JW7HXwsrP7MLyQlwCZxJTjVSW8Xmg29wvFEQdHz AK68g54ecPunp7FKALlhASNXHFGdADzFjifoXoU+65YuLIG+zR7jO1nNh5gczMeHqMiM QPSCyxvLf/Sqm3EsnlFjASdA5xBdoIAuuV6m4eMaUz+T1GbTktMNSx88H31VadjH8JAa 8HCd8WwH7ruBYT1Pm4C/s+JVwBT4FF5YGzils6PZ4P9B6doBBd8bei/rN3FSIKivOZZi jbuKzGihV/S52wWJXtEP07FGi6Hr2E/Q7/U6MfZLXjyLIoVvAChw/Ay0UiikyRYkCQzo IsYw== MIME-Version: 1.0 Received: by 10.224.73.1 with SMTP id o1mr619844qaj.43.1336506529316; Tue, 08 May 2012 12:48:49 -0700 (PDT) Received: by 10.229.224.147 with HTTP; Tue, 8 May 2012 12:48:49 -0700 (PDT) In-Reply-To: References: Date: Tue, 8 May 2012 12:48:49 -0700 Message-ID: From: Freddie Cash To: FreeBSD Filesystems Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Subject: Re: Broken ZFS filesystem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 May 2012 19:48:51 -0000 On Tue, May 8, 2012 at 10:23 AM, Freddie Cash wrote: > I have an interesting issue with one single ZFS filesystem in a pool. > All the other filesystems are fine, and can be mounted, snapshoted, > destroyed, etc. 
=C2=A0But this one filesystem, if I try to do any operati= on > on it (zfs mount, zfs snapshot, zfs destroy, zfs set ), it > spins the system until all RAM is used up (wired), and then hangs the > box. =C2=A0The zfs process sits in tx -> tx_sync_done_cv state until the > box locks up. =C2=A0CTRL+T of the process only ever shows this: > =C2=A0 =C2=A0load: 0.46 =C2=A0cmd: zfs 3115 [tx->tx_sync_done_cv)] 36.63r= 0.00u 0.00s 0% 2440k > > Anyone come across anything similar? =C2=A0And found a way to fix it, or = to > destroy the filesystem? =C2=A0Any suggestions on how to go about debuggin= g > this? =C2=A0Any magical zdb commands to use? > > The filesystem only has 5 MB of data in it (log files), compressed via > LZJB for a compressratio of ~6x. =C2=A0There are no snapshots for this > filesystem. > > Dedupe is enabled on the pool and all filesystems. After more fiddling, testing, and experimenting, it all came down to not enough RAM in the box to mount the 5 MB filesystem. After installing an extra 8 GB of RAM (32 GB total), everything mounted correctly. Took 27 GB of wired kernel memory (guessing ARC space) to do it. Unmount, mount, export, import, change properties all completed successfully. And the box is running correctly with 24 GB of RAM again. We'll be ordering more RAM for our ZFS boxes, now. :) --=20 Freddie Cash fjwcash@gmail.com From owner-freebsd-fs@FreeBSD.ORG Tue May 8 20:02:32 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DE7931065673 for ; Tue, 8 May 2012 20:02:32 +0000 (UTC) (envelope-from freebsd@grem.de) Received: from mail.grem.de (outcast.grem.de [213.239.217.27]) by mx1.freebsd.org (Postfix) with SMTP id 423FC8FC14 for ; Tue, 8 May 2012 20:02:31 +0000 (UTC) Received: (qmail 96300 invoked by uid 89); 8 May 2012 20:02:29 -0000 Received: from unknown (HELO ?192.168.250.164?) (mg@grem.de@80.137.83.22) by mail.grem.de with ESMTPA; 8 May 2012 20:02:29 -0000 Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: Michael Gmelin In-Reply-To: Date: Tue, 8 May 2012 22:02:29 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: <180B72CE-B285-4702-B16D-0714AA07022C@grem.de> References: <73F8D020-04F3-44B2-97D4-F08E3B253C32@grem.de> To: freebsd-fs@freebsd.org X-Mailer: Apple Mail (2.1084) Cc: Tom Evans Subject: Re: ZFS resilvering strangles IO X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 May 2012 20:02:32 -0000 On May 8, 2012, at 16:58, Tom Evans wrote: > On Tue, May 8, 2012 at 3:33 PM, Michael Gmelin = wrote: >> So the question is, is there anything I can do to improve the = situation? >> Is this because of memory constraints? Are there any other knobs to >> adjust? As far as I know zfs_resilver_delay can't be changed in = FreeBSD yet. >>=20 >> I have more drives around, so I could replace another one in the = server, >> just to replicate the exact situation. >>=20 >=20 > In general, raidz is pretty fast, but when it's resilvering it is just > too busy. The first thing I would do to speed up writes is to add a > log device, preferably a SSD. Having a log device will allow the pool > to buffer writes to the pool much more effectively than normally > during a resilver. 
> Having lots of small writes will kill read speed during the resilver,
> which is the critical thing.
>
> If your workload would benefit, you could split the SSD down the
> middle, use half for a log device, and half for a cache device to
> accelerate reads.
>
> I've never tried using a regular disk as a log device, I wonder if
> that would speed up resilvering?
>
> Cheers
>
> Tom

Thanks for your constructive feedback. It would be interesting to see if adding an SSD could actually help in this case (it would definitely also benefit the machine during normal operation). Unfortunately it's not an option: the server is maxed out, there is simply no room to add a log device at the moment.

The general question remains: is there a way to make ZFS perform better during resilvering? Does anybody have experience tuning zfs_resilver_delay on Solaris, and does it make a difference (the variable is in the FreeBSD source code, but I couldn't find a way to change it without touching the source)? Or is there something I missed that's specific to my setup? Especially in configurations using raidz2 and raidz3, which can withstand the loss of 2 or even 3 drives, a longer resilver period shouldn't be an issue, as long as system performance is not degraded - or is only degraded to a certain degree (up to 50% would be more or less tolerable; in my case read performance was OK-ish, but write performance was reduced by more than 90%, so the machine was almost unusable).

Do you think it would make sense to try to play with zfs_resilver_delay directly in the ZFS kernel module?

(We have about 20 servers that could run ZFS around here, which currently run various combinations of UFS2+SU (no SUJ, since snapshots are currently broken), either on hardware RAID1 or some gmirror setup. I would like to standardize these setups on ZFS, but I can't add log devices to all of them, for obvious reasons.)
I somehow feel that simulating this in a virtual machine is probably = pointless :) Cheers, Michael From owner-freebsd-fs@FreeBSD.ORG Tue May 8 21:31:30 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9EAD21065670 for ; Tue, 8 May 2012 21:31:30 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id 558D28FC16 for ; Tue, 8 May 2012 21:31:30 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id q48LVS8W028606; Tue, 8 May 2012 16:31:29 -0500 (CDT) Date: Tue, 8 May 2012 16:31:28 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Michael Gmelin In-Reply-To: <180B72CE-B285-4702-B16D-0714AA07022C@grem.de> Message-ID: References: <73F8D020-04F3-44B2-97D4-F08E3B253C32@grem.de> <180B72CE-B285-4702-B16D-0714AA07022C@grem.de> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Tue, 08 May 2012 16:31:29 -0500 (CDT) Cc: freebsd-fs@freebsd.org Subject: Re: ZFS resilvering strangles IO X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 May 2012 21:31:30 -0000 On Tue, 8 May 2012, Michael Gmelin wrote: > > Do you think it would make sense to try to play with zfs_resilver_delay directly in the ZFS kernel module? This may be the wrong approach if the issue is really that there are too many I/Os queued for the device. Finding a tunable which reduces the maximum number of I/Os queued for a disk device may help reduce write latencies by limiting the backlog. On my Solaris 10 system, I accomplished this via a tunable in /etc/system: set zfs:zfs_vdev_max_pending = 5 What is the equivalent for FreeBSD? 
Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Tue May 8 21:33:24 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id F1684106564A for ; Tue, 8 May 2012 21:33:24 +0000 (UTC) (envelope-from fjwcash@gmail.com) Received: from mail-qa0-f47.google.com (mail-qa0-f47.google.com [209.85.216.47]) by mx1.freebsd.org (Postfix) with ESMTP id A8D5C8FC08 for ; Tue, 8 May 2012 21:33:24 +0000 (UTC) Received: by qabg1 with SMTP id g1so1160854qab.13 for ; Tue, 08 May 2012 14:33:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=5k1em8eZqrWYkKiV39ucmQEHlrRO1LxzE/7T00pGiBI=; b=GWJEM3ebZfcZOqdP0xJnRmOVMp//i82WkU7ETSgvh6dybgM4uhMwaz9q7mveybRsOG X/4t7AX0pYDdIefM6h8e+uwIj36Io+CwSAab5Rnqtiskk8QmWN1fxet3oSupXXj2+pur zi/kxxwDoemBYKa2AYMBJwFoX2Rh5f5F/4jgqi3Vhh1ZXSgftxzrTybY0aNikCD1s2Kt c2RIRlasu6m6KS93vCmIkQ7S39Vqhk+RuamhUmKPdR8Dyb97ZR38HUHKvANioTjB0cuU rkBc7ZpZPBOnhzKwI/tfHn9fEUpWC+xTrA43FZ4if/5wEHppp2DMh8ROq44XopDjHL/F nK5g== MIME-Version: 1.0 Received: by 10.224.178.9 with SMTP id bk9mr932141qab.98.1336512804006; Tue, 08 May 2012 14:33:24 -0700 (PDT) Received: by 10.229.224.147 with HTTP; Tue, 8 May 2012 14:33:23 -0700 (PDT) In-Reply-To: References: <73F8D020-04F3-44B2-97D4-F08E3B253C32@grem.de> <180B72CE-B285-4702-B16D-0714AA07022C@grem.de> Date: Tue, 8 May 2012 14:33:23 -0700 Message-ID: From: Freddie Cash To: Bob Friesenhahn Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org, Michael Gmelin Subject: Re: ZFS resilvering strangles IO X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 May 2012 21:33:25 -0000 On Tue, May 8, 2012 at 2:31 PM, Bob Friesenhahn wrote: > On Tue, 8 May 2012, Michael Gmelin wrote: >> >> Do you think it would make sense to try to play with zfs_resilver_delay >> directly in the ZFS kernel module? > > This may be the wrong approach if the issue is really that there are too > many I/Os queued for the device. =C2=A0Finding a tunable which reduces th= e > maximum number of I/Os queued for a disk device may help reduce write > latencies by limiting the backlog. > > On my Solaris 10 system, I accomplished this via a tunable in /etc/system= : > set zfs:zfs_vdev_max_pending =3D 5 > > What is the equivalent for FreeBSD? Setting vfs.zfs.vdev_max_pending=3D"4" in /boot/loader.conf (or whatever value you want). The default is 10. 
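Spelled out as it would appear in the loader configuration (a sketch using the name and values from this thread):

  # /boot/loader.conf
  vfs.zfs.vdev_max_pending="4"   # per-vdev I/O queue depth; default cited above is 10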
> Bob > -- > Bob Friesenhahn > bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen= / > GraphicsMagick Maintainer, =C2=A0 =C2=A0http://www.GraphicsMagick.org/ > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" --=20 Freddie Cash fjwcash@gmail.com From owner-freebsd-fs@FreeBSD.ORG Tue May 8 22:06:27 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id EC82D106566B for ; Tue, 8 May 2012 22:06:27 +0000 (UTC) (envelope-from artemb@gmail.com) Received: from mail-lpp01m010-f54.google.com (mail-lpp01m010-f54.google.com [209.85.215.54]) by mx1.freebsd.org (Postfix) with ESMTP id 643228FC0C for ; Tue, 8 May 2012 22:06:27 +0000 (UTC) Received: by lagv3 with SMTP id v3so6321915lag.13 for ; Tue, 08 May 2012 15:06:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=w1F8BTxAYJxw69oxrPuziXBpWnSPa3wiBEW78Qs3Bso=; b=Y45u1wrfOPcqT7rIXL4OkPDldQBVzAHLSc+SMULWEx7akWdLk5+EznUsYoF3BVV36l Jyr/D2LFCNk+OpuWdjTNFY+N9LMa/CxHKUo68BzwUHR0E8yYpLyXjImxuTsZ+MK/RvVB 3RgW/sPKs4odTAeCv34xLYda5xHwulcqvoIaHnUQYxYvq/J3Dze8dA2eaWWWMqpYlEs0 RLKiVERTlgGWheyKj9GcWYDu7ty9j9vah9WBiZNkq8P/CGpa6fZMgyNlkZ3KoOo/SfPs 3W5hrtFNUsSYww98/zR8L4B2eAK91UGSDlKz6a3m5psHaBZYR6XbmmEiMm5vDM51j+UF C8Vw== MIME-Version: 1.0 Received: by 10.112.44.129 with SMTP id e1mr2406510lbm.44.1336514786032; Tue, 08 May 2012 15:06:26 -0700 (PDT) Sender: artemb@gmail.com Received: by 10.112.2.5 with HTTP; Tue, 8 May 2012 15:06:25 -0700 (PDT) In-Reply-To: References: <73F8D020-04F3-44B2-97D4-F08E3B253C32@grem.de> <180B72CE-B285-4702-B16D-0714AA07022C@grem.de> Date: Tue, 8 May 2012 15:06:25 -0700 X-Google-Sender-Auth: MVSkhVa7_2YNJMurZZ8hTH_-SsQ Message-ID: From: Artem Belevich To: Freddie Cash Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org, Michael Gmelin Subject: Re: ZFS resilvering strangles IO X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 May 2012 22:06:28 -0000 On Tue, May 8, 2012 at 2:33 PM, Freddie Cash wrote: > On Tue, May 8, 2012 at 2:31 PM, Bob Friesenhahn > wrote: >> On Tue, 8 May 2012, Michael Gmelin wrote: >>> >>> Do you think it would make sense to try to play with zfs_resilver_delay >>> directly in the ZFS kernel module? >> >> This may be the wrong approach if the issue is really that there are too >> many I/Os queued for the device. =A0Finding a tunable which reduces the >> maximum number of I/Os queued for a disk device may help reduce write >> latencies by limiting the backlog. >> >> On my Solaris 10 system, I accomplished this via a tunable in /etc/syste= m: >> set zfs:zfs_vdev_max_pending =3D 5 >> >> What is the equivalent for FreeBSD? > > Setting vfs.zfs.vdev_max_pending=3D"4" in /boot/loader.conf (or whatever > value you want). =A0The default is 10. You may also want to look at vfs.zfs.scrub_limit sysctl. According to description it's "Maximum scrub/resilver I/O queue" which sounds like something that may help in this case. 
--Artem From owner-freebsd-fs@FreeBSD.ORG Tue May 8 22:15:36 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7E2911065732 for ; Tue, 8 May 2012 22:15:36 +0000 (UTC) (envelope-from freebsd@grem.de) Received: from mail.grem.de (outcast.grem.de [213.239.217.27]) by mx1.freebsd.org (Postfix) with SMTP id D14F28FC21 for ; Tue, 8 May 2012 22:15:35 +0000 (UTC) Received: (qmail 97884 invoked by uid 89); 8 May 2012 22:15:34 -0000 Received: from unknown (HELO ?192.168.250.164?) (mg@grem.de@80.137.83.22) by mail.grem.de with ESMTPA; 8 May 2012 22:15:34 -0000 Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: Michael Gmelin In-Reply-To: Date: Wed, 9 May 2012 00:15:32 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: <44759017-6FAC-4982-B382-CE17DED83262@grem.de> References: <73F8D020-04F3-44B2-97D4-F08E3B253C32@grem.de> <180B72CE-B285-4702-B16D-0714AA07022C@grem.de> To: freebsd-fs@freebsd.org X-Mailer: Apple Mail (2.1084) Cc: Subject: Re: ZFS resilvering strangles IO X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 May 2012 22:15:36 -0000 On May 9, 2012, at 00:06, Artem Belevich wrote: > On Tue, May 8, 2012 at 2:33 PM, Freddie Cash = wrote: >> On Tue, May 8, 2012 at 2:31 PM, Bob Friesenhahn >> wrote: >>> On Tue, 8 May 2012, Michael Gmelin wrote: >>>>=20 >>>> Do you think it would make sense to try to play with = zfs_resilver_delay >>>> directly in the ZFS kernel module? >>>=20 >>> This may be the wrong approach if the issue is really that there are = too >>> many I/Os queued for the device. Finding a tunable which reduces = the >>> maximum number of I/Os queued for a disk device may help reduce = write >>> latencies by limiting the backlog. >>>=20 >>> On my Solaris 10 system, I accomplished this via a tunable in = /etc/system: >>> set zfs:zfs_vdev_max_pending =3D 5 >>>=20 >>> What is the equivalent for FreeBSD? >>=20 >> Setting vfs.zfs.vdev_max_pending=3D"4" in /boot/loader.conf (or = whatever >> value you want). The default is 10. >=20 Do you think this will actually make a difference. As far as I understand my primary problem is not latency but throughput. Simple example is dd if=3D/dev/zero of=3Dfilename bs=3D1m, which gave me = 500kb/s. Latency might be an additional problem (or am I mislead and a shorter queue would raise the processes chance to get data through?). > You may also want to look at vfs.zfs.scrub_limit sysctl. According to > description it's "Maximum scrub/resilver I/O queue" which sounds like > something that may help in this case. >=20 > --Artem Very good point, thank you. I also found this entry in the FreeBSD forums indicating that this might ease the pain (even though he's also talking about scrub, not resilver, hopefully the tunable does both as indicated in the comments): http://forums.freebsd.org/showthread.php?t=3D31628 /* maximum scrub/resilver I/O queue per leaf vdev */ int zfs_scrub_limit =3D 10; TUNABLE_INT("vfs.zfs.scrub_limit", &zfs_scrub_limit); SYSCTL_INT(_vfs_zfs, OID_AUTO, scrub_limit, CTLFLAG_RDTUN, &zfs_scrub_limit, 0, "Maximum scrub/resilver I/O queue"); =20 I will try lowering the value zfs_scrub_limit to 6 in loader.conf and replace the drive once more later this month. 
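For reference, the corresponding loader.conf entry plus a quick check after the reboot (the CTLFLAG_RDTUN above means the value is a boot-time tunable, read-only at runtime):

  # /boot/loader.conf
  vfs.zfs.scrub_limit="6"        # per-leaf-vdev scrub/resilver I/O queue (default 10)

  # after reboot, confirm the values in effect:
  sysctl vfs.zfs.scrub_limit vfs.zfs.vdev_max_pending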
--=20 Michael From owner-freebsd-fs@FreeBSD.ORG Tue May 8 22:42:15 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 81B72106566C for ; Tue, 8 May 2012 22:42:15 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id 326548FC0C for ; Tue, 8 May 2012 22:42:15 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id q48MgDL5028777; Tue, 8 May 2012 17:42:14 -0500 (CDT) Date: Tue, 8 May 2012 17:42:13 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Michael Gmelin In-Reply-To: <44759017-6FAC-4982-B382-CE17DED83262@grem.de> Message-ID: References: <73F8D020-04F3-44B2-97D4-F08E3B253C32@grem.de> <180B72CE-B285-4702-B16D-0714AA07022C@grem.de> <44759017-6FAC-4982-B382-CE17DED83262@grem.de> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Tue, 08 May 2012 17:42:14 -0500 (CDT) Cc: freebsd-fs@freebsd.org Subject: Re: ZFS resilvering strangles IO X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 May 2012 22:42:15 -0000 On Wed, 9 May 2012, Michael Gmelin wrote: >>> >>> Setting vfs.zfs.vdev_max_pending="4" in /boot/loader.conf (or whatever >>> value you want). The default is 10. > > Do you think this will actually make a difference. As far as I > understand my primary problem is not latency but throughput. Simple > example is dd if=/dev/zero of=filename bs=1m, which gave me 500kb/s. > Latency might be an additional problem (or am I mislead and a shorter > queue would raise the processes chance to get data through?). The effect may be observed in real-time on a running system. Latency and throughput go hand in hand. The 'dd' command is not threaded and is sequential. It waits for the current I/O to return before it starts the next one. If the wait is shorter (fewer pending requests in line), then throughput does increase. System total throughput (which includes the resilver operations) may not increase but the throughput observed by an individual waiter may increase. The default for vdev_max_pending on Solaris was/is 32. If FreeBSD uses a default of 10 then reducing from the default may be less dramatic. Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Tue May 8 22:48:25 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B5E031065680 for ; Tue, 8 May 2012 22:48:25 +0000 (UTC) (envelope-from freebsd@grem.de) Received: from mail.grem.de (outcast.grem.de [213.239.217.27]) by mx1.freebsd.org (Postfix) with SMTP id 1486F8FC16 for ; Tue, 8 May 2012 22:48:24 +0000 (UTC) Received: (qmail 98385 invoked by uid 89); 8 May 2012 22:48:23 -0000 Received: from unknown (HELO ?192.168.250.164?) 
(mg@grem.de@80.137.83.22) by mail.grem.de with ESMTPA; 8 May 2012 22:48:23 -0000 Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: Michael Gmelin In-Reply-To: Date: Wed, 9 May 2012 00:48:22 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: <1CEFF50A-4CD5-4947-8A38-2EEAE3311E67@grem.de> References: <73F8D020-04F3-44B2-97D4-F08E3B253C32@grem.de> <180B72CE-B285-4702-B16D-0714AA07022C@grem.de> <44759017-6FAC-4982-B382-CE17DED83262@grem.de> To: freebsd-fs@freebsd.org X-Mailer: Apple Mail (2.1084) Cc: Subject: Re: ZFS resilvering strangles IO X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 May 2012 22:48:25 -0000 On May 9, 2012, at 00:42, Bob Friesenhahn wrote: > On Wed, 9 May 2012, Michael Gmelin wrote: >>>>=20 >>>> Setting vfs.zfs.vdev_max_pending=3D"4" in /boot/loader.conf (or = whatever >>>> value you want). The default is 10. >>=20 >> Do you think this will actually make a difference. As far as I >> understand my primary problem is not latency but throughput. Simple >> example is dd if=3D/dev/zero of=3Dfilename bs=3D1m, which gave me = 500kb/s. >> Latency might be an additional problem (or am I mislead and a shorter >> queue would raise the processes chance to get data through?). >=20 > The effect may be observed in real-time on a running system. Latency = and throughput go hand in hand. The 'dd' command is not threaded and is = sequential. It waits for the current I/O to return before it starts the = next one. If the wait is shorter (fewer pending requests in line), then = throughput does increase. System total throughput (which includes the = resilver operations) may not increase but the throughput observed by an = individual waiter may increase. >=20 > The default for vdev_max_pending on Solaris was/is 32. If FreeBSD = uses a default of 10 then reducing from the default may be less = dramatic. >=20 That makes sense. I will run more sophisticated I/O tests next time to get a more complete picture. 
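For those tests, per-disk queue depth and service times can be watched live with the stock tools while a resilver runs (a sketch; the filter matches the daXp3 partitions from this setup):

  # refresh once per second, show only the ZFS partitions
  gstat -I 1s -f 'da[0-7]p3'
  # or extended per-device statistics every second
  iostat -x -w 1 da0 da1 da2 da3 da4 da5 da6 da7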
--=20 Michael > Bob > --=20 > Bob Friesenhahn > bfriesen@simple.dallas.tx.us, = http://www.simplesystems.org/users/bfriesen/ > GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Wed May 9 06:55:27 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id D70B8106566C for ; Wed, 9 May 2012 06:55:27 +0000 (UTC) (envelope-from peter.maloney@brockmann-consult.de) Received: from moutng.kundenserver.de (moutng.kundenserver.de [212.227.126.187]) by mx1.freebsd.org (Postfix) with ESMTP id 809228FC0A for ; Wed, 9 May 2012 06:55:27 +0000 (UTC) Received: from [10.3.0.26] ([141.4.215.32]) by mrelayeu.kundenserver.de (node=mreu3) with ESMTP (Nemesis) id 0M5mBh-1SD5NZ2JyL-00xmMP; Wed, 09 May 2012 08:55:20 +0200 Message-ID: <4FAA14D6.8060302@brockmann-consult.de> Date: Wed, 09 May 2012 08:55:18 +0200 From: Peter Maloney User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20120312 Thunderbird/11.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <73F8D020-04F3-44B2-97D4-F08E3B253C32@grem.de> In-Reply-To: <73F8D020-04F3-44B2-97D4-F08E3B253C32@grem.de> X-Enigmail-Version: 1.4.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Provags-ID: V02:K0:QNorriCnYF1BJDt2p63oj2R2nLGavWxLJayIVvg37lG eGGpCELqNz8+vNe2YPkmYYri8iAA/HyCDlURhkWA5JFSMSg4QK +8mpxnpJ/qynYYo4e+RfkuqXtsdYJ8RlslT+GzOG5B7umFH/pF 7x0+ck5l+zBF9+dj4cRMK9POoKfKQj+GOgVpY08wEhqQdgVUQ3 Jp3JDMBDSUtwBlXUrh/wawGXjDL4fMk83v6r5Q8E9LhPQfJ9wm oatc8CrsSsFou7heYAh4yUY8vdkOtwTgKu9EbUTyR5Dkt0cmMZ tAms1K8xu6AQofehyjw0Lg5/fsoOLu6FuupzygZqGeWrgWuU/U Wv6Hy6unCRDv4cYHiJF6YCZWIfTs2GKJaH6gPq60r Subject: Re: ZFS resilvering strangles IO X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 09 May 2012 06:55:27 -0000 About the slow performance during resilver, Are they consumer disks? If so, one guess is you have a bad disk. Check by looking at load and ms per x on disks. If one is high and others are low, then it's probably bad. If a single 'good' disk is bad, the whole thing will run very slow. Bad consumer disks run very slow trying over and over to read the not-yet-bad sectors where enterprise disks would throw errors and fail. My other guess is that this is because FreeBSD, unlike Linux and Solaris, lacks IO scheduling. So there is no way for the zfs code to truly put the resilver on lower priority than the regular production applications. I've read that IO scheduling was developed for 8.2, but never officially adopted. I would love to see it in FreeBSD... I use "ionice" on Linux all the time (for copying, backups, zipping, installing a a huge batch of packages [noticeable >300 MB], etc. while I work on other things), so I miss it. IO scheduling on Solaris also helps with dedup performance. Does anyone know if there is a movement to add the IO scheduling code into the base system? On 05/08/2012 04:33 PM, Michael Gmelin wrote: > Hello, > > I know I'm not the first one to ask this, but I couldn't find a definitive answers in previous threads. > > I'm running a FreeBSD 9.0 RELEASE-p1 amd64 system, 8 x 1TB SATA2 drives (not SAS) and an LSI SAS 9211 controller in IT mode (HBAs, da0-da7). Zpool version 28, raidz2 container. Machine has 4GB of RAM, therefore ZFS prefetch is disabled. No manual tuning of ZFS options. 
Pool contains about 1TB of data right now (so about 25% full). In normal operations the pool shows excellent performance. Yesterday I had to replace a drive, so resilvering started. The resilver process took about 15 hours - which seems a little bit slow to me, but whatever - what really struck me was that during resilvering the pool performance got really bad. Read performance was acceptable, but write performance got down to 500kb/s (for almost all of the 15 hours). After resilvering finished, system performance returned to normal. > > Fortunately this is a backup server and no full backups were scheduled, so no drama, but I really don't want to have to replace a drive in a database (or other high IO) server this way (I would have been forced to offline the drive somehow and migrate data to another server). > > So the question is, is there anything I can do to improve the situation? Is this because of memory constraints? Are there any other knobs to adjust? As far as I know zfs_resilver_delay can't be changed in FreeBSD yet. > > I have more drives around, so I could replace another one in the server, just to replicate the exact situation. > > Cheers, > Michael > > Disk layout: > > daXp1128 boot > daXp2 16G frebsd-swap > daXp3 915G freebsd-zfs > > > Zpool status during resilvering: > > [root@backup /tmp]# zpool status -v > pool: tank > state: DEGRADED > status: One or more devices is currently being resilvered. The pool will > continue to function, possibly in a degraded state. > action: Wait for the resilver to complete. > scan: resilver in progress since Mon May 7 20:18:34 2012 > 249G scanned out of 908G at 18.2M/s, 10h17m to go > 31.2G resilvered, 27.46% done > config: > > NAME STATE READ WRITE CKSUM > tank DEGRADED 0 0 0 > raidz2-0 DEGRADED 0 0 0 > replacing-0 REMOVED 0 0 0 > 15364271088212071398 REMOVED 0 0 0 was > /dev/da0p3/old > da0p3 ONLINE 0 0 0 > (resilvering) > da1p3 ONLINE 0 0 0 > da2p3 ONLINE 0 0 0 > da3p3 ONLINE 0 0 0 > da4p3 ONLINE 0 0 0 > da5p3 ONLINE 0 0 0 > da6p3 ONLINE 0 0 0 > da7p3 ONLINE 0 0 0 > > errors: No known data errors > > Zpool status later in the process: > root@backup /tmp]# zpool status > pool: tank > state: DEGRADED > status: One or more devices is currently being resilvered. The pool will > continue to function, possibly in a degraded state. > action: Wait for the resilver to complete. 
> scan: resilver in progress since Mon May 7 20:18:34 2012 > 833G scanned out of 908G at 19.1M/s, 1h7m to go > 104G resilvered, 91.70% done > config: > > NAME STATE READ WRITE CKSUM > tank DEGRADED 0 0 0 > raidz2-0 DEGRADED 0 0 0 > replacing-0 REMOVED 0 0 0 > 15364271088212071398 REMOVED 0 0 0 was > /dev/da0p3/old > da0p3 ONLINE 0 0 0 > (resilvering) > da1p3 ONLINE 0 0 0 > da2p3 ONLINE 0 0 0 > da3p3 ONLINE 0 0 0 > da4p3 ONLINE 0 0 0 > da5p3 ONLINE 0 0 0 > da6p3 ONLINE 0 0 0 > da7p3 ONLINE 0 0 0 > > errors: No known data errors > > > Zpool status after resilvering finished: > root@backup /]# zpool status > pool: tank > state: ONLINE > scan: resilvered 113G in 14h54m with 0 errors on Tue May 8 11:13:31 2012 > config: > > NAME STATE READ WRITE CKSUM > tank ONLINE 0 0 0 > raidz2-0 ONLINE 0 0 0 > da0p3 ONLINE 0 0 0 > da1p3 ONLINE 0 0 0 > da2p3 ONLINE 0 0 0 > da3p3 ONLINE 0 0 0 > da4p3 ONLINE 0 0 0 > da5p3 ONLINE 0 0 0 > da6p3 ONLINE 0 0 0 > da7p3 ONLINE 0 0 0 > > errors: No known data errors > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" -- -------------------------------------------- Peter Maloney Brockmann Consult Max-Planck-Str. 2 21502 Geesthacht Germany Tel: +49 4152 889 300 Fax: +49 4152 889 333 E-mail: peter.maloney@brockmann-consult.de Internet: http://www.brockmann-consult.de -------------------------------------------- From owner-freebsd-fs@FreeBSD.ORG Wed May 9 14:05:09 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id ED825106564A for ; Wed, 9 May 2012 14:05:09 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from plane.gmane.org (plane.gmane.org [80.91.229.3]) by mx1.freebsd.org (Postfix) with ESMTP id 93BC28FC15 for ; Wed, 9 May 2012 14:05:09 +0000 (UTC) Received: from list by plane.gmane.org with local (Exim 4.69) (envelope-from ) id 1SS7Vm-00049f-Tf for freebsd-fs@freebsd.org; Wed, 09 May 2012 16:05:02 +0200 Received: from dyn1219-111.wlan.ic.ac.uk ([129.31.219.111]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Wed, 09 May 2012 16:05:02 +0200 Received: from johannes by dyn1219-111.wlan.ic.ac.uk with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Wed, 09 May 2012 16:05:02 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-fs@freebsd.org From: Johannes Totz Date: Wed, 09 May 2012 15:04:52 +0100 Lines: 141 Message-ID: References: <73F8D020-04F3-44B2-97D4-F08E3B253C32@grem.de> <4FAA14D6.8060302@brockmann-consult.de> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@dough.gmane.org X-Gmane-NNTP-Posting-Host: dyn1219-111.wlan.ic.ac.uk User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 In-Reply-To: <4FAA14D6.8060302@brockmann-consult.de> Subject: Re: ZFS resilvering strangles IO X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 09 May 2012 14:05:10 -0000 On 09/05/2012 07:55, Peter Maloney wrote: > About the slow performance during resilver, > > Are they consumer disks? If so, one guess is you have a bad disk. Check > by looking at load and ms per x on disks. 
If one is high and others are > low, then it's probably bad. If a single 'good' disk is bad, the whole > thing will run very slow. Bad consumer disks run very slow trying over > and over to read the not-yet-bad sectors where enterprise disks would > throw errors and fail. > > My other guess is that this is because FreeBSD, unlike Linux and > Solaris, lacks IO scheduling. So there is no way for the zfs code to > truly put the resilver on lower priority than the regular production > applications. I've read that IO scheduling was developed for 8.2, but > never officially adopted. I would love to see it in FreeBSD... I use > "ionice" on Linux all the time (for copying, backups, zipping, > installing a a huge batch of packages [noticeable >300 MB], etc. while I > work on other things), so I miss it. IO scheduling on Solaris also helps > with dedup performance. > > Does anyone know if there is a movement to add the IO scheduling code > into the base system? There was a geom module for io scheduling: gsched(8) But I've never used it and don't know what the state of it is... > On 05/08/2012 04:33 PM, Michael Gmelin wrote: >> Hello, >> >> I know I'm not the first one to ask this, but I couldn't find a definitive answers in previous threads. >> >> I'm running a FreeBSD 9.0 RELEASE-p1 amd64 system, 8 x 1TB SATA2 drives (not SAS) and an LSI SAS 9211 controller in IT mode (HBAs, da0-da7). Zpool version 28, raidz2 container. Machine has 4GB of RAM, therefore ZFS prefetch is disabled. No manual tuning of ZFS options. Pool contains about 1TB of data right now (so about 25% full). In normal operations the pool shows excellent performance. Yesterday I had to replace a drive, so resilvering started. The resilver process took about 15 hours - which seems a little bit slow to me, but whatever - what really struck me was that during resilvering the pool performance got really bad. Read performance was acceptable, but write performance got down to 500kb/s (for almost all of the 15 hours). After resilvering finished, system performance returned > to normal. >> >> Fortunately this is a backup server and no full backups were scheduled, so no drama, but I really don't want to have to replace a drive in a database (or other high IO) server this way (I would have been forced to offline the drive somehow and migrate data to another server). >> >> So the question is, is there anything I can do to improve the situation? Is this because of memory constraints? Are there any other knobs to adjust? As far as I know zfs_resilver_delay can't be changed in FreeBSD yet. >> >> I have more drives around, so I could replace another one in the server, just to replicate the exact situation. >> >> Cheers, >> Michael >> >> Disk layout: >> >> daXp1128 boot >> daXp2 16G frebsd-swap >> daXp3 915G freebsd-zfs >> >> >> Zpool status during resilvering: >> >> [root@backup /tmp]# zpool status -v >> pool: tank >> state: DEGRADED >> status: One or more devices is currently being resilvered. The pool will >> continue to function, possibly in a degraded state. >> action: Wait for the resilver to complete. 
>> scan: resilver in progress since Mon May 7 20:18:34 2012 >> 249G scanned out of 908G at 18.2M/s, 10h17m to go >> 31.2G resilvered, 27.46% done >> config: >> >> NAME STATE READ WRITE CKSUM >> tank DEGRADED 0 0 0 >> raidz2-0 DEGRADED 0 0 0 >> replacing-0 REMOVED 0 0 0 >> 15364271088212071398 REMOVED 0 0 0 was >> /dev/da0p3/old >> da0p3 ONLINE 0 0 0 >> (resilvering) >> da1p3 ONLINE 0 0 0 >> da2p3 ONLINE 0 0 0 >> da3p3 ONLINE 0 0 0 >> da4p3 ONLINE 0 0 0 >> da5p3 ONLINE 0 0 0 >> da6p3 ONLINE 0 0 0 >> da7p3 ONLINE 0 0 0 >> >> errors: No known data errors >> >> Zpool status later in the process: >> root@backup /tmp]# zpool status >> pool: tank >> state: DEGRADED >> status: One or more devices is currently being resilvered. The pool will >> continue to function, possibly in a degraded state. >> action: Wait for the resilver to complete. >> scan: resilver in progress since Mon May 7 20:18:34 2012 >> 833G scanned out of 908G at 19.1M/s, 1h7m to go >> 104G resilvered, 91.70% done >> config: >> >> NAME STATE READ WRITE CKSUM >> tank DEGRADED 0 0 0 >> raidz2-0 DEGRADED 0 0 0 >> replacing-0 REMOVED 0 0 0 >> 15364271088212071398 REMOVED 0 0 0 was >> /dev/da0p3/old >> da0p3 ONLINE 0 0 0 >> (resilvering) >> da1p3 ONLINE 0 0 0 >> da2p3 ONLINE 0 0 0 >> da3p3 ONLINE 0 0 0 >> da4p3 ONLINE 0 0 0 >> da5p3 ONLINE 0 0 0 >> da6p3 ONLINE 0 0 0 >> da7p3 ONLINE 0 0 0 >> >> errors: No known data errors >> >> >> Zpool status after resilvering finished: >> root@backup /]# zpool status >> pool: tank >> state: ONLINE >> scan: resilvered 113G in 14h54m with 0 errors on Tue May 8 11:13:31 2012 >> config: >> >> NAME STATE READ WRITE CKSUM >> tank ONLINE 0 0 0 >> raidz2-0 ONLINE 0 0 0 >> da0p3 ONLINE 0 0 0 >> da1p3 ONLINE 0 0 0 >> da2p3 ONLINE 0 0 0 >> da3p3 ONLINE 0 0 0 >> da4p3 ONLINE 0 0 0 >> da5p3 ONLINE 0 0 0 >> da6p3 ONLINE 0 0 0 >> da7p3 ONLINE 0 0 0 >> >> errors: No known data errors >> >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > > From owner-freebsd-fs@FreeBSD.ORG Wed May 9 22:04:14 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 34EF7106567A for ; Wed, 9 May 2012 22:04:14 +0000 (UTC) (envelope-from lists@hurricane-ridge.com) Received: from mail-vb0-f54.google.com (mail-vb0-f54.google.com [209.85.212.54]) by mx1.freebsd.org (Postfix) with ESMTP id E43EA8FC18 for ; Wed, 9 May 2012 22:04:13 +0000 (UTC) Received: by vbmv11 with SMTP id v11so1136797vbm.13 for ; Wed, 09 May 2012 15:04:13 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:x-originating-ip:date:message-id:subject:from:to :content-type:x-gm-message-state; bh=UtyDRBuCnoOfD09bl97pkolHm7gwOd8JQ+XFWrXUEn0=; b=MbLiJGnJf6WC3LqeZCqoQ51GnBFV8fleeJLA6ghDl7CVrIZ7ENQnkifkQ54lQo6MCg QSwl8aorgC6Ny55d0HQhHxgl8qMgVHvQUMkHN7q4gsWf9CmSKmFdGBrbnW6wkjp7Fpjb YNatKNNzvU8ZMASEBLaT+rq3lY3thsd42Omv9KR+R1eFGWPwkcZ0YAk6pyuNF/bOIsf8 aFVXt6KqCsZp65Z3TpiAoCjoFLB3Fiqqfcg2S/saAzKRW7DPcm8rJ88iUchBLyqfJCwz +pyJR3rvCudGYuRZjNmfaHa8Cn5v+XM6GgZXRPKjlwoKskdbDjiO48K6uqUJpFmMb+OK eZxA== MIME-Version: 1.0 Received: by 10.52.100.67 with SMTP id ew3mr874556vdb.36.1336601053287; Wed, 09 May 2012 15:04:13 -0700 (PDT) Received: by 10.220.22.199 with HTTP; Wed, 9 May 2012 15:04:13 -0700 (PDT) X-Originating-IP: [98.247.224.125] Date: Wed, 9 May 2012 
15:04:13 -0700 Message-ID: From: Andrew Leonard To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Gm-Message-State: ALoCoQm2Oa11BZev8vdWQGPxusZARaciS3SEWCTiU0MysX7eFrG+nOhFdyizd2gHjKN0OR+jSr5m Subject: Unable to set ACLs on ZFS file system over NFSv4? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 09 May 2012 22:04:14 -0000 I have a ZFS file system on which I can successfully manipulate ACLs locally, but am unable to do so when it is mounted remotely using NFSv4 on both FreeBSD and Linux (CentOS 5) clients. The system in question is running 8-STABLE: FreeBSD zfs07.example.com 8.2-STABLE FreeBSD 8.2-STABLE #0: Thu Nov 17 17:46:00 PST 2011 root@zfs07.example.com:/usr/obj/usr/src/sys/GENERIC amd64 ACLs can be successfully manipulated locally; e.g. the following returns no error and works as expected: > setfacl -m g:group2:rwxpDaRWcs:fd:allow /tank01/ngs/test.dir The file system is exported as follows in /etc/exports: /tank01/ngs -sec=sys V4: /tank01 -sec=sys On the FreeBSD client, it is mounted using NFSv4, and behaves as follows under the same user (sanitized to "user1", who is in "group1"): > whoami user1 > groups group1 [...] > mount | grep /mnt zfs07b:/ngs on /mnt (newnfs, nfsv4acls) > getfacl /mnt/test2.dir # file: /mnt/test2.dir # owner: user1 # group: group1 group:group1:rwxpDdaARWcCo-:fd----:allow owner@:rwxp--aARWcCo-:------:allow group@:r-x---a-R-c---:------:allow everyone@:r-x---a-R-c---:------:allow > setfacl -m g:group2:rwxpDaRWcs:fd:allow /mnt/test2.dir setfacl: /mnt/test2.dir: acl_set_file() failed: Input/output error In all other respects, ACLs appear to be honored over NFSv4 - the user can access, create, modify and delete files as expected, and ACLs are appropriately inherited - the ACLs just cannot be manipulated. Linux client behavior is functionally identical: > mount | grep /mnt zfs07b:/ngs on /mnt type nfs4 (rw,addr=192.168.x.y) > nfs4_setfacl -a A:gfd:group2:rwxaDdtnNcy test2.dir Failed setxattr operation: Input/output error Is this a misconfiguration on my part, a known limitation, or a bug? 
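One test that may help narrow this down (a sketch; it assumes setfacl(1)'s -M option accepts ACL entries on standard input, with "-" meaning stdin):

> getfacl /mnt/test2.dir | setfacl -M - /mnt/test2.dir

i.e. read the ACL back and write the identical ACL to the same file. If even this no-op set fails with the same Input/output error over NFSv4, the server is rejecting the ACL Setattr as such, not the particular entries being added.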
More details: > zfs get version tank01/ngs NAME PROPERTY VALUE SOURCE tank01/ngs version 5 - > zpool get version tank01 NAME PROPERTY VALUE SOURCE tank01 version 28 default > zfs get all tank01/ngs NAME PROPERTY VALUE SOURCE tank01/ngs type filesystem - tank01/ngs creation Tue May 1 16:15 2012 - tank01/ngs used 61.6G - tank01/ngs available 4.47T - tank01/ngs referenced 33.8G - tank01/ngs compressratio 4.23x - tank01/ngs mounted yes - tank01/ngs quota none default tank01/ngs reservation none default tank01/ngs recordsize 128K default tank01/ngs mountpoint /tank01/ngs default tank01/ngs sharenfs off default tank01/ngs checksum on default tank01/ngs compression gzip local tank01/ngs atime on default tank01/ngs devices on default tank01/ngs exec on default tank01/ngs setuid off inherited from tank01 tank01/ngs readonly off default tank01/ngs jailed off default tank01/ngs snapdir hidden default tank01/ngs aclmode passthrough local tank01/ngs aclinherit passthrough-x local tank01/ngs canmount on default tank01/ngs xattr off temporary tank01/ngs copies 1 default tank01/ngs version 5 - tank01/ngs utf8only off - tank01/ngs normalization none - tank01/ngs casesensitivity sensitive - tank01/ngs vscan off default tank01/ngs nbmand off default tank01/ngs sharesmb off default tank01/ngs refquota none default tank01/ngs refreservation none default tank01/ngs primarycache all default tank01/ngs secondarycache all default tank01/ngs usedbysnapshots 27.8G - tank01/ngs usedbydataset 33.8G - tank01/ngs usedbychildren 0 - tank01/ngs usedbyrefreservation 0 - tank01/ngs logbias latency default tank01/ngs dedup off default tank01/ngs mlslabel - tank01/ngs sync standard default tank01/ngs refcompressratio 4.14x - > egrep 'nfs|zfs' /etc/rc.conf.local nfscbd_enable="YES" nfs_client_enable="YES" nfsuserd_enable="YES" nfsv4_server_enable="YES" nfs_server_enable="YES" zfs_enable="YES" Thanks, Andy From owner-freebsd-fs@FreeBSD.ORG Thu May 10 04:13:17 2012 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 068F71065670; Thu, 10 May 2012 04:13:17 +0000 (UTC) (envelope-from mm@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id CF5908FC0A; Thu, 10 May 2012 04:13:16 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q4A4DGnW018090; Thu, 10 May 2012 04:13:16 GMT (envelope-from mm@freefall.freebsd.org) Received: (from mm@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q4A4DGwa018086; Thu, 10 May 2012 04:13:16 GMT (envelope-from mm) Date: Thu, 10 May 2012 04:13:16 GMT Message-Id: <201205100413.q4A4DGwa018086@freefall.freebsd.org> To: mm@FreeBSD.org, freebsd-fs@FreeBSD.org, mm@FreeBSD.org From: mm@FreeBSD.org Cc: Subject: Re: kern/167467: [zfs][patch] improve zdb(8) manpage and help. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 May 2012 04:13:17 -0000 Synopsis: [zfs][patch] improve zdb(8) manpage and help. Responsible-Changed-From-To: freebsd-fs->mm Responsible-Changed-By: mm Responsible-Changed-When: Thu May 10 04:13:16 UTC 2012 Responsible-Changed-Why: I'll take it. 
http://www.freebsd.org/cgi/query-pr.cgi?pr=167467 From owner-freebsd-fs@FreeBSD.ORG Thu May 10 04:13:36 2012 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B704A106566B; Thu, 10 May 2012 04:13:36 +0000 (UTC) (envelope-from mm@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 8A6658FC15; Thu, 10 May 2012 04:13:36 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q4A4DahJ018300; Thu, 10 May 2012 04:13:36 GMT (envelope-from mm@freefall.freebsd.org) Received: (from mm@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q4A4DaeO018295; Thu, 10 May 2012 04:13:36 GMT (envelope-from mm) Date: Thu, 10 May 2012 04:13:36 GMT Message-Id: <201205100413.q4A4DaeO018295@freefall.freebsd.org> To: mm@FreeBSD.org, freebsd-fs@FreeBSD.org, mm@FreeBSD.org From: mm@FreeBSD.org Cc: Subject: Re: kern/167370: [zfs][patch] Unnecessary break point on zfs_main.c. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 May 2012 04:13:36 -0000 Synopsis: [zfs][patch] Unnecessary break point on zfs_main.c. Responsible-Changed-From-To: freebsd-fs->mm Responsible-Changed-By: mm Responsible-Changed-When: Thu May 10 04:13:36 UTC 2012 Responsible-Changed-Why: I'll take it. http://www.freebsd.org/cgi/query-pr.cgi?pr=167370 From owner-freebsd-fs@FreeBSD.ORG Thu May 10 04:13:45 2012 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id C64CC1065672; Thu, 10 May 2012 04:13:45 +0000 (UTC) (envelope-from mm@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 9A7D28FC0C; Thu, 10 May 2012 04:13:45 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q4A4Djkj018574; Thu, 10 May 2012 04:13:45 GMT (envelope-from mm@freefall.freebsd.org) Received: (from mm@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q4A4Djll018570; Thu, 10 May 2012 04:13:45 GMT (envelope-from mm) Date: Thu, 10 May 2012 04:13:45 GMT Message-Id: <201205100413.q4A4Djll018570@freefall.freebsd.org> To: mm@FreeBSD.org, freebsd-fs@FreeBSD.org, mm@FreeBSD.org From: mm@FreeBSD.org Cc: Subject: Re: kern/167447: [zfs] [patch] patch to zfs rename -f to perform force unmount. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 May 2012 04:13:45 -0000 Synopsis: [zfs] [patch] patch to zfs rename -f to perform force unmount. Responsible-Changed-From-To: freebsd-fs->mm Responsible-Changed-By: mm Responsible-Changed-When: Thu May 10 04:13:45 UTC 2012 Responsible-Changed-Why: I'll take it. 
http://www.freebsd.org/cgi/query-pr.cgi?pr=167447 From owner-freebsd-fs@FreeBSD.ORG Thu May 10 13:07:34 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 40A67106564A for ; Thu, 10 May 2012 13:07:34 +0000 (UTC) (envelope-from simon@comsys.ntu-kpi.kiev.ua) Received: from comsys.kpi.ua (comsys.kpi.ua [77.47.192.42]) by mx1.freebsd.org (Postfix) with ESMTP id AC16D8FC15 for ; Thu, 10 May 2012 13:07:33 +0000 (UTC) Received: from pm513-1.comsys.kpi.ua ([10.18.52.101] helo=pm513-1.comsys.ntu-kpi.kiev.ua) by comsys.kpi.ua with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.63) (envelope-from ) id 1SST5f-0002Va-6j; Thu, 10 May 2012 16:07:31 +0300 Received: by pm513-1.comsys.ntu-kpi.kiev.ua (Postfix, from userid 1001) id C01741CC21; Thu, 10 May 2012 16:07:31 +0300 (EEST) Date: Thu, 10 May 2012 16:07:31 +0300 From: Andrey Simonenko To: Rick Macklem Message-ID: <20120510130731.GA72837@pm513-1.comsys.ntu-kpi.kiev.ua> References: <20120507174813.GA5927@pm513-1.comsys.ntu-kpi.kiev.ua> <1357768784.50127.1336434018113.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1357768784.50127.1336434018113.JavaMail.root@erie.cs.uoguelph.ca> User-Agent: Mutt/1.5.21 (2010-09-15) X-Authenticated-User: simon@comsys.ntu-kpi.kiev.ua X-Authenticator: plain X-Sender-Verify: SUCCEEDED (sender exists & accepts mail) X-Exim-Version: 4.63 (build at 28-Apr-2011 07:11:12) X-Date: 2012-05-10 16:07:31 X-Connected-IP: 10.18.52.101:44001 X-Message-Linecount: 55 X-Body-Linecount: 39 X-Message-Size: 2410 X-Body-Size: 1685 Cc: freebsd-fs@freebsd.org Subject: Re: NFSv4 Questions X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 May 2012 13:07:34 -0000 On Mon, May 07, 2012 at 07:40:18PM -0400, Rick Macklem wrote: > Andrey Simonenko wrote: > > On Sun, Apr 29, 2012 at 04:36:03PM -0400, Rick Macklem wrote: > > > > > > Also, be sure to check "man nfsv4" and maybe reference it (it is > > > currently > > > in the See Also list, but that might not be strong enough). > > > > There is another question not explained in documentation (I could not > > find the answer at least). Currently NFSv3 client uses reserved port > > for NFS mounts and uses non reserved port if "noresvport" is > > specified. > > NFSv4 client always uses non reserved port, ignoring the "resvport" > > option in the mount_nfs command. > > > > Such behaviour of NFS client was introduced in 1.18 version of > > fs/nfsclient/nfs_clvfsops.c [1], where the "resvport" flag is cleared > > for NFSv4 mounts. > > > > Why does "reserved port logic" differ in NFSv3 and NFSv4 clients? > > > It is my understanding that NFSv4 servers are not supposed to require > a "reserved" port#. However, at a quick glance, I can't find that stated > in RFC 3530. (It may be implied by the fact that NFSv4 uses a "user" based > security model and not a "host" based one.) > > As such, the client should never need to "waste" a reserved port# on a NFSv4 > connection. Since AUTH_SYS can be used in NFSv4 as well and according to RFC 3530 AUTH_SYS in NFSv4 has the same logic as in NFSv2/3, then 1. Does "user" based security model mean RPCSEC_GSS? 2. Does "host" based security model mean AUTH_SYS? 
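For concreteness, this is roughly how the two flavors show up in configuration (a sketch only, assuming the stock exports(5) and mount_nfs(8) options; the paths, network and host name are made up):

# server /etc/exports: the Kerberos (RPCSEC_GSS) flavors authenticate users,
# while "sys" (AUTH_SYS) trusts whatever credentials the client host sends
V4: /export -sec=krb5:krb5i:krb5p:sys
/export/home -sec=krb5 -network 192.168.1.0 -mask 255.255.255.0

# client
mount -t nfs -o nfsv4,sec=krb5 server:/home /mnt

With sec=sys the only things the server can check about the sender are the client's address and, possibly, its source port, which is what prompts the questions above.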
I did not find any mention about port numbers in RFC 1813 and 3530, looks like that ports numbers range used by NFS clients and checked by NFS server is the implementation decision. From owner-freebsd-fs@FreeBSD.ORG Thu May 10 15:40:13 2012 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id AD4ED106564A for ; Thu, 10 May 2012 15:40:13 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 7EB1A8FC08 for ; Thu, 10 May 2012 15:40:13 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q4AFeD0L067835 for ; Thu, 10 May 2012 15:40:13 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q4AFeDdX067834; Thu, 10 May 2012 15:40:13 GMT (envelope-from gnats) Date: Thu, 10 May 2012 15:40:13 GMT Message-Id: <201205101540.q4AFeDdX067834@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: "Jukka A. Ukkonen" Cc: Subject: kern/167612: [portalfs] The portal file system gets stuck inside portal_open(). ("1 extra fds") X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: "Jukka A. Ukkonen" List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 May 2012 15:40:13 -0000 The following reply was made to PR kern/167612; it has been noted by GNATS. From: "Jukka A. Ukkonen" To: bug-followup@FreeBSD.org, jau@iki.fi Cc: Subject: kern/167612: [portalfs] The portal file system gets stuck inside portal_open(). ("1 extra fds") Date: Thu, 10 May 2012 18:33:49 +0300 This is a multi-part message in MIME format. --------------060204070501010607040700 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit This really was an alignment issue. The old code was not in sync with the alignment done in the CMSG_* macros. Find a patch attached. --jau --------------060204070501010607040700 Content-Type: text/plain; charset=UTF-8; name="portal_vnops.c.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="portal_vnops.c.diff" --- portal_vnops.c.orig 2012-05-08 18:43:17.000000000 +0300 +++ portal_vnops.c 2012-05-10 17:07:55.000000000 +0300 @@ -397,19 +397,47 @@ * than a single mbuf in it. What to do? */ cmsg = mtod(cm, struct cmsghdr *); - newfds = (cmsg->cmsg_len - sizeof(*cmsg)) / sizeof (int); + + /* + * Just in case the sender no longer does what we expect + * and sends something else before or in the worst case + * instead of the file descriptor we expect... + */ + + if ((cmsg->cmsg_level != SOL_SOCKET) + || (cmsg->cmsg_type != SCM_RIGHTS)) { + error = ECONNREFUSED; + goto bad; + } + + /* + * Use the flippin' CMSG_DATA() macro to make sure we use + * the same alignment as the sender. + * Otherwise things go pear shape very easily. + * The bad news is that even faulty code may work on some + * CPU architectures. + */ + + ip = (int *) CMSG_DATA (cmsg); + + newfds = (cmsg->cmsg_len - + ((unsigned char *) ip - + (unsigned char *) cmsg)) / sizeof (int); + if (newfds == 0) { error = ECONNREFUSED; goto bad; } + /* * At this point the rights message consists of a control message * header, followed by a data region containing a vector of * integer file descriptors. 
The fds were allocated by the action * of receiving the control message. */ - ip = (int *) (cmsg + 1); + fd = *ip++; + if (newfds > 1) { /* * Close extra fds. --------------060204070501010607040700-- From owner-freebsd-fs@FreeBSD.ORG Thu May 10 20:34:17 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6F0AC106566C for ; Thu, 10 May 2012 20:34:17 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 28E108FC0C for ; Thu, 10 May 2012 20:34:17 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap4EAGMlrE+DaFvO/2dsb2JhbABEhXavMIIVAQEEASNWBRYOCgICDRkCWQYTiAkFqFiTAYEviWOFBYEYBJV9kECDBQ X-IronPort-AV: E=Sophos;i="4.75,566,1330923600"; d="scan'208";a="168766046" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 10 May 2012 16:34:16 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id DE70F7941E; Thu, 10 May 2012 16:34:15 -0400 (EDT) Date: Thu, 10 May 2012 16:34:15 -0400 (EDT) From: Rick Macklem To: Andrey Simonenko Message-ID: <901330725.234130.1336682055896.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20120510130731.GA72837@pm513-1.comsys.ntu-kpi.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org Subject: Re: NFSv4 Questions X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 May 2012 20:34:17 -0000 Andrey Simonenko wrote: > On Mon, May 07, 2012 at 07:40:18PM -0400, Rick Macklem wrote: > > Andrey Simonenko wrote: > > > On Sun, Apr 29, 2012 at 04:36:03PM -0400, Rick Macklem wrote: > > > > > > > > Also, be sure to check "man nfsv4" and maybe reference it (it is > > > > currently > > > > in the See Also list, but that might not be strong enough). > > > > > > There is another question not explained in documentation (I could > > > not > > > find the answer at least). Currently NFSv3 client uses reserved > > > port > > > for NFS mounts and uses non reserved port if "noresvport" is > > > specified. > > > NFSv4 client always uses non reserved port, ignoring the > > > "resvport" > > > option in the mount_nfs command. > > > > > > Such behaviour of NFS client was introduced in 1.18 version of > > > fs/nfsclient/nfs_clvfsops.c [1], where the "resvport" flag is > > > cleared > > > for NFSv4 mounts. > > > > > > Why does "reserved port logic" differ in NFSv3 and NFSv4 clients? > > > > > It is my understanding that NFSv4 servers are not supposed to > > require > > a "reserved" port#. However, at a quick glance, I can't find that > > stated > > in RFC 3530. (It may be implied by the fact that NFSv4 uses a "user" > > based > > security model and not a "host" based one.) > > > > As such, the client should never need to "waste" a reserved port# on > > a NFSv4 > > connection. > > Since AUTH_SYS can be used in NFSv4 as well and according to RFC 3530 > AUTH_SYS in NFSv4 has the same logic as in NFSv2/3, then > > 1. Does "user" based security model mean RPCSEC_GSS? 
> > 2. Does "host" based security model mean AUTH_SYS? > My guess is that AUTH_SYS is not considered a security model at all, but the "authenticators" refer to users. I believe the "host" based security model referred to in the RFCs refers to the restrictions implemented by /etc/exports, based on client host IP addresses. I do remember that the IETF working group discussed "reserved port #s" and agreed that requiring one did not enhance security and that NFSv4 servers should not require that a client's port# be within a certain range. (If you were to search the archive for nfsv4@ietf.org, it should be somewhere in there.) However, I agree that this does not seem to be stated in the RFCs, because I couldn't find it when the question came up. (It may be that IETF does not have a definition of a "reserved port#".) Personally, I agree with the working group and have always thought requiring a client to use a "reserved port#" was meaningless. However, I already noted that I don't mind enabling it, with a comment that it should not be required for NFSv4. > I did not find any mention about port numbers in RFC 1813 and 3530, > looks like that ports numbers range used by NFS clients and checked by > NFS server is the implementation decision. During interoperability testing (I'll be at another NFSv4 Bakeathon in June) I have never had a server that would not allow a connection to happen from a non-reserved port# for NFSv4, so I believe that the implementation practice is to not require it for NFSv4. (Consistent with the discussion on nfsv4@ietf.org.) rick From owner-freebsd-fs@FreeBSD.ORG Thu May 10 21:13:40 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 09A7F1065670; Thu, 10 May 2012 21:13:40 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 8C9998FC16; Thu, 10 May 2012 21:13:39 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap8EANAurE+DaFvO/2dsb2JhbABEhXavMIIVAQEBAwEBAQEgKyALBRYOCgICDRkCKQEJJgYIBwQBHASHaAULqEWSfoEviWMZBIRogRgEk0+CLoERjy+DBYE6AQgR X-IronPort-AV: E=Sophos;i="4.75,567,1330923600"; d="scan'208";a="168771777" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 10 May 2012 17:13:38 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 95B95B3F89; Thu, 10 May 2012 17:13:38 -0400 (EDT) Date: Thu, 10 May 2012 17:13:38 -0400 (EDT) From: Rick Macklem To: Andrew Leonard Message-ID: <1446179418.236280.1336684418582.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org Subject: Re: Unable to set ACLs on ZFS file system over NFSv4? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 May 2012 21:13:40 -0000 Andrew Leonard wrote: > I have a ZFS file system on which I can successfully manipulate ACLs > locally, but am unable to do so when it is mounted remotely using > NFSv4 on both FreeBSD and Linux (CentOS 5) clients. > > The system in question is running 8-STABLE: > > FreeBSD zfs07.example.com 8.2-STABLE FreeBSD 8.2-STABLE #0: Thu Nov 17 > 17:46:00 PST 2011 > root@zfs07.example.com:/usr/obj/usr/src/sys/GENERIC amd64 > > ACLs can be successfully manipulated locally; e.g. the following > returns no error and works as expected: > > > setfacl -m g:group2:rwxpDaRWcs:fd:allow /tank01/ngs/test.dir > > The file system is exported as follows in /etc/exports: > > /tank01/ngs -sec=sys > V4: /tank01 -sec=sys > > On the FreeBSD client, it is mounted using NFSv4, and behaves as > follows under the same user (sanitized to "user1", who is in > "group1"): > > > whoami > user1 > > groups > group1 [...] > > mount | grep /mnt > zfs07b:/ngs on /mnt (newnfs, nfsv4acls) > > getfacl /mnt/test2.dir > # file: /mnt/test2.dir > # owner: user1 > # group: group1 > group:group1:rwxpDdaARWcCo-:fd----:allow > owner@:rwxp--aARWcCo-:------:allow > group@:r-x---a-R-c---:------:allow > everyone@:r-x---a-R-c---:------:allow > > setfacl -m g:group2:rwxpDaRWcs:fd:allow /mnt/test2.dir > setfacl: /mnt/test2.dir: acl_set_file() failed: Input/output error > > In all other respects, ACLs appear to be honored over NFSv4 - the user > can access, create, modify and delete files as expected, and ACLs are > appropriately inherited - the ACLs just cannot be manipulated. > > Linux client behavior is functionally identical: > > > mount | grep /mnt > zfs07b:/ngs on /mnt type nfs4 (rw,addr=192.168.x.y) > > nfs4_setfacl -a A:gfd:group2:rwxaDdtnNcy test2.dir > Failed setxattr operation: Input/output error > > Is this a misconfiguration on my part, a known limitation, or a bug? > As far as I know, it should work. I only use UFS, but my understanding is that ZFS always supports NFSv4 ACLs. If you capture a packet trace from before you do the NFSv4 mount, I can take a look and see what the server is saying. (Basically, at mount time a reply to a Getattr should including the supported attributes and that should include the ACL bit. Then the setfacl becomes a Setattr of the ACL attribute.) # tcpdump -s 0 -w acl.pcap host - run on the client should do it If you want to look at it, use wireshark. If you want me to look, just email acl.pcap as an attachment. rick ps: Although I suspect it is the server that isn't behaving, please use the FreeBSD client for the above. pss: I've cc'd trasz@ in case he can spot some reason why it wouldn't work. 
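For a quick sanity check of the capture before opening it in wireshark, something like

# tcpdump -r acl.pcap -n port 2049 | head

confirms that NFS traffic was actually caught (it just re-reads the saved file). In wireshark, the display filter "nfs" narrows the trace to the NFS operations; the interesting bits are the supported attributes in the Getattr reply at mount time (FATTR4_ACL should be listed there) and the status the server returns for the Setattr carrying the ACL.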
> More details: > > > zfs get version tank01/ngs > NAME PROPERTY VALUE SOURCE > tank01/ngs version 5 - > > zpool get version tank01 > NAME PROPERTY VALUE SOURCE > tank01 version 28 default > > zfs get all tank01/ngs > NAME PROPERTY VALUE SOURCE > tank01/ngs type filesystem - > tank01/ngs creation Tue May 1 16:15 2012 - > tank01/ngs used 61.6G - > tank01/ngs available 4.47T - > tank01/ngs referenced 33.8G - > tank01/ngs compressratio 4.23x - > tank01/ngs mounted yes - > tank01/ngs quota none default > tank01/ngs reservation none default > tank01/ngs recordsize 128K default > tank01/ngs mountpoint /tank01/ngs default > tank01/ngs sharenfs off default > tank01/ngs checksum on default > tank01/ngs compression gzip local > tank01/ngs atime on default > tank01/ngs devices on default > tank01/ngs exec on default > tank01/ngs setuid off inherited from tank01 > tank01/ngs readonly off default > tank01/ngs jailed off default > tank01/ngs snapdir hidden default > tank01/ngs aclmode passthrough local > tank01/ngs aclinherit passthrough-x local > tank01/ngs canmount on default > tank01/ngs xattr off temporary > tank01/ngs copies 1 default > tank01/ngs version 5 - > tank01/ngs utf8only off - > tank01/ngs normalization none - > tank01/ngs casesensitivity sensitive - > tank01/ngs vscan off default > tank01/ngs nbmand off default > tank01/ngs sharesmb off default > tank01/ngs refquota none default > tank01/ngs refreservation none default > tank01/ngs primarycache all default > tank01/ngs secondarycache all default > tank01/ngs usedbysnapshots 27.8G - > tank01/ngs usedbydataset 33.8G - > tank01/ngs usedbychildren 0 - > tank01/ngs usedbyrefreservation 0 - > tank01/ngs logbias latency default > tank01/ngs dedup off default > tank01/ngs mlslabel - > tank01/ngs sync standard default > tank01/ngs refcompressratio 4.14x - > > egrep 'nfs|zfs' /etc/rc.conf.local > nfscbd_enable="YES" > nfs_client_enable="YES" > nfsuserd_enable="YES" > nfsv4_server_enable="YES" > nfs_server_enable="YES" > zfs_enable="YES" > > Thanks, > Andy > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Thu May 10 21:23:13 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id CFD881065678 for ; Thu, 10 May 2012 21:23:13 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 74A2D8FC12 for ; Thu, 10 May 2012 21:23:13 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap8EACIxrE+DaFvO/2dsb2JhbABEhXavMIIVAQEBAwEBAQEgKyALGw4KAgINGQIpAQkmBggHBAEcBIdoBQuoSpJ9gS+JYxQFBIRogRgEk0+CLoERjy+DBYE6AQgR X-IronPort-AV: E=Sophos;i="4.75,567,1330923600"; d="scan'208";a="168772960" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 10 May 2012 17:23:12 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 92330B40D6; Thu, 10 May 2012 17:23:12 -0400 (EDT) Date: Thu, 10 May 2012 17:23:12 -0400 (EDT) From: Rick Macklem To: Andrew Leonard Message-ID: <353146957.236642.1336684992583.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: 
<1446179418.236280.1336684418582.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org Subject: Re: Unable to set ACLs on ZFS file system over NFSv4? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 May 2012 21:23:13 -0000 I wrote: > Andrew Leonard wrote: > > I have a ZFS file system on which I can successfully manipulate ACLs > > locally, but am unable to do so when it is mounted remotely using > > NFSv4 on both FreeBSD and Linux (CentOS 5) clients. > > > > The system in question is running 8-STABLE: > > > > FreeBSD zfs07.example.com 8.2-STABLE FreeBSD 8.2-STABLE #0: Thu Nov > > 17 > > 17:46:00 PST 2011 > > root@zfs07.example.com:/usr/obj/usr/src/sys/GENERIC amd64 > > > > ACLs can be successfully manipulated locally; e.g. the following > > returns no error and works as expected: > > > > > setfacl -m g:group2:rwxpDaRWcs:fd:allow /tank01/ngs/test.dir > > > > The file system is exported as follows in /etc/exports: > > > > /tank01/ngs -sec=sys > > V4: /tank01 -sec=sys > > > > On the FreeBSD client, it is mounted using NFSv4, and behaves as > > follows under the same user (sanitized to "user1", who is in > > "group1"): > > > > > whoami > > user1 > > > groups > > group1 [...] > > > mount | grep /mnt > > zfs07b:/ngs on /mnt (newnfs, nfsv4acls) > > > getfacl /mnt/test2.dir > > # file: /mnt/test2.dir > > # owner: user1 > > # group: group1 > > group:group1:rwxpDdaARWcCo-:fd----:allow > > owner@:rwxp--aARWcCo-:------:allow > > group@:r-x---a-R-c---:------:allow > > everyone@:r-x---a-R-c---:------:allow > > > setfacl -m g:group2:rwxpDaRWcs:fd:allow /mnt/test2.dir > > setfacl: /mnt/test2.dir: acl_set_file() failed: Input/output error > > > > In all other respects, ACLs appear to be honored over NFSv4 - the > > user > > can access, create, modify and delete files as expected, and ACLs > > are > > appropriately inherited - the ACLs just cannot be manipulated. > > > > Linux client behavior is functionally identical: > > > > > mount | grep /mnt > > zfs07b:/ngs on /mnt type nfs4 (rw,addr=192.168.x.y) > > > nfs4_setfacl -a A:gfd:group2:rwxaDdtnNcy test2.dir > > Failed setxattr operation: Input/output error > > > > Is this a misconfiguration on my part, a known limitation, or a bug? > > > As far as I know, it should work. I only use UFS, but my understanding > is that ZFS always supports NFSv4 ACLs. > > If you capture a packet trace from before you do the NFSv4 mount, I > can > take a look and see what the server is saying. (Basically, at mount > time > a reply to a Getattr should including the supported attributes and > that > should include the ACL bit. Then the setfacl becomes a Setattr of the > ACL > attribute.) > # tcpdump -s 0 -w acl.pcap host > - run on the client should do it > > If you want to look at it, use wireshark. If you want me to look, just > email acl.pcap as an attachment. > > rick > ps: Although I suspect it is the server that isn't behaving, please > use > the FreeBSD client for the above. > pss: I've cc'd trasz@ in case he can spot some reason why it wouldn't > work. > Oh, and make sure "user1" isn't in more than 16 groups, because that is the limit for AUTH_SYS. 
(I'm not sure what the effect of user1 being in more than 16 groups would be, but might as well eliminate it as a cause.) > > More details: > > > > > zfs get version tank01/ngs > > NAME PROPERTY VALUE SOURCE > > tank01/ngs version 5 - > > > zpool get version tank01 > > NAME PROPERTY VALUE SOURCE > > tank01 version 28 default > > > zfs get all tank01/ngs > > NAME PROPERTY VALUE SOURCE > > tank01/ngs type filesystem - > > tank01/ngs creation Tue May 1 16:15 2012 - > > tank01/ngs used 61.6G - > > tank01/ngs available 4.47T - > > tank01/ngs referenced 33.8G - > > tank01/ngs compressratio 4.23x - > > tank01/ngs mounted yes - > > tank01/ngs quota none default > > tank01/ngs reservation none default > > tank01/ngs recordsize 128K default > > tank01/ngs mountpoint /tank01/ngs default > > tank01/ngs sharenfs off default > > tank01/ngs checksum on default > > tank01/ngs compression gzip local > > tank01/ngs atime on default > > tank01/ngs devices on default > > tank01/ngs exec on default > > tank01/ngs setuid off inherited from tank01 > > tank01/ngs readonly off default > > tank01/ngs jailed off default > > tank01/ngs snapdir hidden default > > tank01/ngs aclmode passthrough local > > tank01/ngs aclinherit passthrough-x local > > tank01/ngs canmount on default > > tank01/ngs xattr off temporary > > tank01/ngs copies 1 default > > tank01/ngs version 5 - > > tank01/ngs utf8only off - > > tank01/ngs normalization none - > > tank01/ngs casesensitivity sensitive - > > tank01/ngs vscan off default > > tank01/ngs nbmand off default > > tank01/ngs sharesmb off default > > tank01/ngs refquota none default > > tank01/ngs refreservation none default > > tank01/ngs primarycache all default > > tank01/ngs secondarycache all default > > tank01/ngs usedbysnapshots 27.8G - > > tank01/ngs usedbydataset 33.8G - > > tank01/ngs usedbychildren 0 - > > tank01/ngs usedbyrefreservation 0 - > > tank01/ngs logbias latency default > > tank01/ngs dedup off default > > tank01/ngs mlslabel - > > tank01/ngs sync standard default > > tank01/ngs refcompressratio 4.14x - > > > egrep 'nfs|zfs' /etc/rc.conf.local > > nfscbd_enable="YES" > > nfs_client_enable="YES" > > nfsuserd_enable="YES" > > nfsv4_server_enable="YES" > > nfs_server_enable="YES" > > zfs_enable="YES" > > > > Thanks, > > Andy > > _______________________________________________ > > freebsd-fs@freebsd.org mailing list > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > > To unsubscribe, send any mail to > > "freebsd-fs-unsubscribe@freebsd.org" > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Fri May 11 08:25:56 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F07D11065673 for ; Fri, 11 May 2012 08:25:56 +0000 (UTC) (envelope-from karl.oulmi@ibl.fr) Received: from marisse.ibl.fr (marisse.ibl.fr [193.49.178.19]) by mx1.freebsd.org (Postfix) with ESMTP id 939178FC1C for ; Fri, 11 May 2012 08:25:56 +0000 (UTC) X-Virus-Scanned: amavisd-new at ibl.fr Message-ID: <4FACCAEB.8040401@ibl.fr> Date: Fri, 11 May 2012 10:16:43 +0200 From: Karl Oulmi MIME-Version: 1.0 To: freebsd-fs@freebsd.org Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha1; boundary="------------ms040105020606070700040406" X-Content-Filtered-By: 
Mailman/MimeDel 2.1.5 Subject: Best practice for shared volume with iscsi Dell MD3200i ? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 May 2012 08:25:57 -0000 This is a cryptographically signed message in MIME format. --------------ms040105020606070700040406 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: quoted-printable Hi all,

I am trying to run two FreeBSD 9 boxes with a 3.7 TB shared iSCSI volume on an MD3200i. The goal is to run a "master" and a "slave" dovecot IMAP server with a shared /home.

I created the shared partition like this:

gpart create -s gpt /dev/da0
gpart add -t freebsd-ufs /dev/da0
newfs /dev/da0p1

Everything is working great on the "master" server, but when I'm trying to mount the volume from the "slave" one, I have the following error:

mount: /dev/da0p1 : Operation not permitted

The only way I have to successfully mount the share on the "slave" server is to run a fsck -t ufs /dev/da0p1 and then do the mount.

Could anyone tell me what's wrong?

Regards,

Karl --------------ms040105020606070700040406-- From owner-freebsd-fs@FreeBSD.ORG Fri May 11 12:20:23 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id ED8B5106564A for ; Fri, 11 May 2012 12:20:22 +0000 (UTC) (envelope-from simon@comsys.ntu-kpi.kiev.ua) Received: from comsys.kpi.ua (comsys.kpi.ua [77.47.192.42]) by mx1.freebsd.org (Postfix) with ESMTP id 64E7C8FC0C for ; Fri, 11 May 2012 12:20:22 +0000 (UTC) Received: from pm513-1.comsys.kpi.ua ([10.18.52.101] helo=pm513-1.comsys.ntu-kpi.kiev.ua) by comsys.kpi.ua with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.63) (envelope-from ) id 1SSopY-0006BP-2p; Fri, 11 May 2012 15:20:20 +0300 Received: by pm513-1.comsys.ntu-kpi.kiev.ua (Postfix, from userid 1001) id A3AF91CC34; Fri, 11 May 2012 15:20:20 +0300 (EEST) Date: Fri, 11 May 2012 15:20:20 +0300 From: Andrey Simonenko To: Rick Macklem Message-ID: <20120511122020.GA13906@pm513-1.comsys.ntu-kpi.kiev.ua> References: <20120510130731.GA72837@pm513-1.comsys.ntu-kpi.kiev.ua> <901330725.234130.1336682055896.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <901330725.234130.1336682055896.JavaMail.root@erie.cs.uoguelph.ca> User-Agent: Mutt/1.5.21 (2010-09-15) X-Authenticated-User: simon@comsys.ntu-kpi.kiev.ua X-Authenticator: plain X-Sender-Verify: SUCCEEDED (sender exists & accepts mail) X-Exim-Version: 4.63 (build at 28-Apr-2011 07:11:12) X-Date: 2012-05-11 15:20:20 X-Connected-IP: 10.18.52.101:52027 X-Message-Linecount: 78 X-Body-Linecount: 62 X-Message-Size: 3530 X-Body-Size: 2804 Cc: freebsd-fs@freebsd.org Subject: Re: NFSv4 Questions X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 May 2012 12:20:23 -0000 On Thu, May 10, 2012 at 04:34:15PM -0400, Rick Macklem wrote: > Andrey Simonenko wrote: > > > in RFC 3530. (It may be implied by the fact that NFSv4 uses a "user" > > > based > > > security model and not a "host" based one.) > > > > > > As such, the client should never need to "waste" a reserved port# on > > > a NFSv4 > > > connection.
> > Since AUTH_SYS can be used in NFSv4 as well and according to RFC 3530
> > AUTH_SYS in NFSv4 has the same logic as in NFSv2/3, then
> >
> > 1. Does "user" based security model mean RPCSEC_GSS?
> >
> > 2. Does "host" based security model mean AUTH_SYS?
> >
> My guess is that AUTH_SYS is not considered a security model at all,
> but the "authenticators" refer to users.

Probably I asked the question poorly. I did not mean that some security flavor (e.g. AUTH_SYS) is a security model. I meant that NFSv4 also allows the AUTH_SYS security flavor, and with AUTH_SYS the user credentials are supplied as-is by the client machine, so the NFSv4 server needs some form of control based on the client's IP address if a client uses, and is allowed to use, AUTH_SYS. This is actually covered in "16. Security Considerations" of RFC 3530, where AUTH_SYS in NFSv4 is called the << "classic" model of machine authentication via IP address checking >>.

What do you think about the following configuration idea?

1. For NFSv2/3 clients, the NFS server allows the administrator to specify whether their MOUNT MNT, UMNT and UMNTALL RPC requests have to come from reserved ports.

2. For NFSv2/3/4 clients, the NFS server allows the administrator to specify whether their NFS RPC calls:
   a) do not have to come from reserved ports;
   b) always have to come from reserved ports;
   c) have to come from reserved ports only if the client uses AUTH_SYS.

3. By default, reserved ports are not required for MOUNT RPC or NFS RPC calls.

The corresponding options could apply to an entire file system and/or to a single address specification. The first item is obviously checked in user space; the second is checked in the NFS server somewhere after VFS_CHECKEXP(), when the server decides which security flavor to use. NetBSD already has -noresvmnt and -noresvport options in its exports(5).

> Personally, I agree with the working group and have always thought requiring
> a client to use a "reserved port#" was meaningless. However, I already noted
> that I don't mind enabling it, with a comment that it should not be required
> for NFSv4.

If the client machine is trusted, then reserved ports can guarantee that requests come from privileged processes rather than from user space, where a client could fill in arbitrary credentials in AUTH_SYS. If the client machine is not trusted, this of course does not help.

BTW, mountd requires a reserved port by default, while the NFS server does not.
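For reference, the NetBSD exports(5) options mentioned above look roughly like this in use (a sketch from memory; the network is made up, and neither option exists in FreeBSD's exports(5) today):

/export -network 192.168.1.0 -mask 255.255.255.0 -noresvport -noresvmnt

i.e. NetBSD requires reserved ports by default and these options waive the requirement per export, while the proposal above makes "not required" the default and adds options to turn the check back on.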
From owner-freebsd-fs@FreeBSD.ORG Fri May 11 15:52:28 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4E55B1065672 for ; Fri, 11 May 2012 15:52:28 +0000 (UTC) (envelope-from gpalmer@freebsd.org) Received: from noop.in-addr.com (mail.in-addr.com [IPv6:2001:470:8:162::1]) by mx1.freebsd.org (Postfix) with ESMTP id 1AF4C8FC0C for ; Fri, 11 May 2012 15:52:28 +0000 (UTC) Received: from gjp by noop.in-addr.com with local (Exim 4.77 (FreeBSD)) (envelope-from ) id 1SSs8I-000GNF-1r; Fri, 11 May 2012 11:51:54 -0400 Date: Fri, 11 May 2012 11:51:53 -0400 From: Gary Palmer To: Karl Oulmi Message-ID: <20120511155153.GA31698@in-addr.com> References: <4FACCAEB.8040401@ibl.fr> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4FACCAEB.8040401@ibl.fr> X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: gpalmer@freebsd.org X-SA-Exim-Scanned: No (on noop.in-addr.com); SAEximRunCond expanded to false Cc: freebsd-fs@freebsd.org Subject: Re: Best practice for shared volume with iscsi Dell MD3200i ? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 May 2012 15:52:28 -0000 On Fri, May 11, 2012 at 10:16:43AM +0200, Karl Oulmi wrote: > Hi all, > > I am trying to run two freebsd9 boxes with a 3.7 TO shared iscsi volume > on a MD3200i. > > The goal is to run a "master" and a "slave" dovecot IMAP server with a > shared /home. > > I created the shared partition like this : > gpart create -s gpt /dev/da0 > gpart add -t freebsd-ufs /dev/da0 > newfs /dev/da0p1 > > Everything is working great on the "master" server, but when I'm trying > to mount the volume from the "slave" one, I have the following error : > mount: /dev/da0p1 : Operation not permitted > > The only way I have to successfully mount the share on the "slave" > server is to run a fsck -t ufs /dev/da0p1 and then do the mount. > > Could anyone tell me what's wrong ? UFS is not a cluster-aware filesystem. You cannot mount it in multiple places at the same time. The best you can hope for in that situation, short of developing a cluster-aware filesystem, is to only mount the volume on the slave if the master fails. 
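A common way to get the effect you're after without a cluster-aware filesystem is to keep the LUN mounted on the master only and export the mounted filesystem to the slave over NFS. A rough sketch (host names made up, options kept minimal):

master# echo '/home slave.example.com' >> /etc/exports
master# /etc/rc.d/mountd reload
slave# mount -t nfs master.example.com:/home /home

If the master dies, the slave can then take over the iSCSI LUN directly (running fsck first, as you already found), but the two machines never have the UFS volume mounted at the same time.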
Regards, Gary From owner-freebsd-fs@FreeBSD.ORG Fri May 11 21:20:45 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E60731065670 for ; Fri, 11 May 2012 21:20:45 +0000 (UTC) (envelope-from lists@hurricane-ridge.com) Received: from mail-pz0-f54.google.com (mail-pz0-f54.google.com [209.85.210.54]) by mx1.freebsd.org (Postfix) with ESMTP id B48038FC08 for ; Fri, 11 May 2012 21:20:45 +0000 (UTC) Received: by dadv36 with SMTP id v36so4144504dad.13 for ; Fri, 11 May 2012 14:20:45 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:x-originating-ip:in-reply-to:references:date :message-id:subject:from:to:cc:content-type:x-gm-message-state; bh=MCU4wzuib4VSpaCG4fjW420lc6gh9aPEqN5H5WOH5NQ=; b=mGE2rAQyJbD4Tur8L6mhcSWbolGv9w5GT8gp2aIqgMoUv+dMr01onVgSkLtyYg7Ujg RrAUw2AmpxDrViKLxj+CmKb/qeWOAJdsip+qQC6o0+e5rNGBf0sYKbgjFhaqOKRESRPP gmbU89FPn5kMhji7a38GPm8ZZqIcZ2ygu9hpx8h5UhFb8jZ04xdU729MQ0SsPqhv706n WH6iZVWla9ojtBTbp5MdKYSsmg6ykzDhpCUPqcFOt11t0auLBe8sPmWfFeGeY5JCCuBs qygtWpNMfsC5Q/+pS5dIOubeWRFoSgTIsozz9n3fslcTDgxsEezWikxUNMPoXUy49SDd fO2Q== MIME-Version: 1.0 Received: by 10.68.231.195 with SMTP id ti3mr34901066pbc.96.1336771245287; Fri, 11 May 2012 14:20:45 -0700 (PDT) Received: by 10.68.195.166 with HTTP; Fri, 11 May 2012 14:20:45 -0700 (PDT) X-Originating-IP: [209.124.184.194] In-Reply-To: <353146957.236642.1336684992583.JavaMail.root@erie.cs.uoguelph.ca> References: <1446179418.236280.1336684418582.JavaMail.root@erie.cs.uoguelph.ca> <353146957.236642.1336684992583.JavaMail.root@erie.cs.uoguelph.ca> Date: Fri, 11 May 2012 14:20:45 -0700 Message-ID: From: Andrew Leonard To: Rick Macklem Content-Type: text/plain; charset=ISO-8859-1 X-Gm-Message-State: ALoCoQnHvOhbcOnxHx0EAENoHxOlS0H4JyfF1nnvGoAdjtifQ7tb8u3uTRcTm0KpbP+vJOwkaMu5 Cc: freebsd-fs@freebsd.org Subject: Re: Unable to set ACLs on ZFS file system over NFSv4? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 May 2012 21:20:46 -0000 On Thu, May 10, 2012 at 2:23 PM, Rick Macklem wrote: > I wrote: >> If you capture a packet trace from before you do the NFSv4 mount, I >> can >> take a look and see what the server is saying. (Basically, at mount >> time >> a reply to a Getattr should including the supported attributes and >> that >> should include the ACL bit. Then the setfacl becomes a Setattr of the >> ACL >> attribute.) >> # tcpdump -s 0 -w acl.pcap host >> - run on the client should do it >> >> If you want to look at it, use wireshark. If you want me to look, just >> email acl.pcap as an attachment. >> >> rick >> ps: Although I suspect it is the server that isn't behaving, please >> use >> the FreeBSD client for the above. >> pss: I've cc'd trasz@ in case he can spot some reason why it wouldn't >> work. >> > Oh, and make sure "user1" isn't in more than 16 groups, because that is the > limit for AUTH_SYS. (I'm not sure what the effect of user1 being in more > than 16 groups would be, but might as well eliminate it as a cause.) Thanks, Rick - I'll send the pcap over private email, as I'm sure $DAYJOB would consider it somewhat sensitive. Looking in wireshark, if I'm reading it correctly, I don't see anything for FATTR4_ACL in any replies. 
On the final connection, I do see NFS4ERR_IO set as the status for the reply to the setattr - but from Googling, my understanding is that response is supposed to indicate a hard error, such as a hardware problem. Also, I have verified that "user1" is not a member of more than 16 groups, so we can rule that out - that user is in only three groups. -Andy From owner-freebsd-fs@FreeBSD.ORG Fri May 11 21:50:17 2012 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7FCB4106564A for ; Fri, 11 May 2012 21:50:17 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 5F7E78FC1D for ; Fri, 11 May 2012 21:50:17 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q4BLoHbK097624 for ; Fri, 11 May 2012 21:50:17 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q4BLoHUD097623; Fri, 11 May 2012 21:50:17 GMT (envelope-from gnats) Date: Fri, 11 May 2012 21:50:17 GMT Message-Id: <201205112150.q4BLoHUD097623@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: Jeff Kletsky Cc: Subject: Re: kern/167685: [zfs] ZFS on USB drive prevents shutdown / reboot X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Jeff Kletsky List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 May 2012 21:50:17 -0000 The following reply was made to PR kern/167685; it has been noted by GNATS. From: Jeff Kletsky To: bug-followup@FreeBSD.org Cc: Subject: Re: kern/167685: [zfs] ZFS on USB drive prevents shutdown / reboot Date: Fri, 11 May 2012 14:41:03 -0700 This is a multi-part message in MIME format. --------------020209050805030409070009 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit

Problem can be replicated by booting off a "memstick" (with a "spare" USB stick as /dev/da1) and then executing

# dd if=/dev/zero of=/dev/da1 bs=64k
# zpool create stick /dev/da1
# reboot

Problem has been reliably reproduced on the Atom 330 previously mentioned, as well as on an AMD A8-3870 with A75 chipset. It also can be replicated using VirtualBox running under Ubuntu on the AMD A8-3870 system. It does not seem specific to one "flavor" of USB controller or driver.

Using /usr/src/release/generate_release.sh and bisection, I have confirmed that

* r227445 does not exhibit the behavior ("Copy stable/9 to releng/9.0 as part of the FreeBSD 9.0-RELEASE release cycle")
* r229097 does not exhibit the behavior
* r229281 -- FAIL by not rebooting under the conditions described above.

Based on these results, I am suspicious of

r229100 | hselasky | 2011-12-31 06:33:15 -0800 (Sat, 31 Dec 2011) | 6 lines

 MFC r228709, r228711 and r228723:
 - Add missing unlock of USB controller's lock, when
 doing shutdown, suspend and resume.
 - Add code to wait for USB shutdown to be executed at system shutdown.
 - Add sysctl which can be used to skip this waiting.

as being what brought the issue to the forefront.

I am presently building r229099 and r229100 to confirm this suspicion.
A potential, though untested workaround would be # sysctl hw.usb.no_shutdown_wait=1
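For reference, a cleaned-up copy of the reproduction recipe quoted above (the original's /dev/zer is presumably a typo for /dev/zero; da1 is the throwaway USB stick and is wiped completely):

# dd if=/dev/zero of=/dev/da1 bs=64k    (zeroes the whole stick; all data on da1 is lost)
# zpool create stick /dev/da1
# reboot                                (hangs on the affected revisions)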
--------------020209050805030409070009-- From owner-freebsd-fs@FreeBSD.ORG Fri May 11 22:32:16 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 10AEF106564A; Fri, 11 May 2012 22:32:16 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 160C48FC0A; Fri, 11 May 2012 22:32:14 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id BAA24015; Sat, 12 May 2012 01:32:12 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1SSyNf-00056b-Lu; Sat, 12 May 2012 01:32:11 +0300 Message-ID: <4FAD9368.5010008@FreeBSD.org> Date: Sat, 12 May 2012 01:32:08 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:12.0) Gecko/20120503 Thunderbird/12.0.1 MIME-Version: 1.0 To: freebsd-hackers@FreeBSD.org, freebsd-fs@FreeBSD.org References: <4F8999D2.1080902@FreeBSD.org> <4F8E820B.6080400@FreeBSD.org> In-Reply-To: <4F8E820B.6080400@FreeBSD.org> X-Enigmail-Version: 1.5pre Content-Type: text/plain; charset=x-viet-vps Content-Transfer-Encoding: 7bit Cc: Subject: Re: [review request] zfsboot/zfsloader: support accessing filesystems within a pool X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 May 2012 22:32:16 -0000 After all the preparatory changes are committed, this is a final[*] notice/warning that I am going to start committing the following patchset really soon now[**[: http://people.freebsd.org/~avg/zfsboot.patches.9.diff [*] unless circumstances change [**] maybe next hour, even -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Sat May 12 01:45:17 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A55B5106566B for ; Sat, 12 May 2012 01:45:17 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 5D9B08FC14 for ; Sat, 12 May 2012 01:45:17 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap4EAI2/rU+DaFvO/2dsb2JhbABEhXmufoIVAQEEASNWBRYOCgICDRkCWQaIHAWoRJJLgS+JaIRwgRgElX2QQIMF X-IronPort-AV: E=Sophos;i="4.75,574,1330923600"; d="scan'208";a="168933545" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 11 May 2012 21:45:11 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 32772B3EFE; Fri, 11 May 2012 21:45:11 -0400 (EDT) Date: Fri, 11 May 2012 21:45:11 -0400 (EDT) From: Rick Macklem To: Andrey Simonenko Message-ID: <1493074817.296570.1336787111152.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20120511122020.GA13906@pm513-1.comsys.ntu-kpi.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org Subject: Re: NFSv4 Questions X-BeenThere: 
freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 May 2012 01:45:17 -0000 Andrey Simonenko wrote: > On Thu, May 10, 2012 at 04:34:15PM -0400, Rick Macklem wrote: > > Andrey Simonenko wrote: > > > > > in RFC 3530. (It may be implied by the fact that NFSv4 uses a > > > > "user" > > > > based > > > > security model and not a "host" based one.) > > > > > > > > As such, the client should never need to "waste" a reserved > > > > port# on > > > > a NFSv4 > > > > connection. > > > > > > Since AUTH_SYS can be used in NFSv4 as well and according to RFC > > > 3530 > > > AUTH_SYS in NFSv4 has the same logic as in NFSv2/3, then > > > > > > 1. Does "user" based security model mean RPCSEC_GSS? > > > > > > 2. Does "host" based security model mean AUTH_SYS? > > > > > My guess is that AUTH_SYS is not considered a security model at all, > > but the "authenticators" refer to users. > > Probably I wrongly asked the question. I did not mean that some > security > flavor (eg. AUTH_SYS) is a security model. I wanted to say that NFSv4 > allows to use AUTH_SYS security flavor and user credentials are given > as is by client's machine, so some form of control by client's IP > address > is required by the NFSv4 server if a client uses and is allowed to use > AUTH_SYS security flavor. Actually this is specified in "16. Security > Considerations" from RFC 3530 and AUTH_SYS in NFSv4 is called << > "classic" > model of machine authentication via IP address checking >>. > > What do you think about the following idea about configuration? > > 1. The NFS server for NFSv2/3 clients allows to specify whether their > MOUNT MNT, UMNT and UMNTALL RPC requests have to or do not have to > come > from reserved ports. > > 2. The NFS server for NFSv2/3/4 clients allows to specify whether > their > NFS RPC calls: > a) do not have to come from reserved ports > b) always have to come from reserved ports > c) have to come from reserved ports if clients use AUTH_SYS. > > 3. By default reserved ports are not required for MOUNT RPC and > NFS RPC calls. Corresponding options can be used for entire file > system and/or for single address specification. > > First item obviously is checked in a user space and second item is > checked > in the NFS server somewhere after VFS_CHECKEXP() when the server > decides > which security flavor to use. > One problem with this is that some NFSv4 operations do not have any file handle and, as such, cannot be associated with any exported file system. (I suppose you could add the export option for resvport to the V4: line like I did with "-sec" for these operations, but it will get messy.) > NetBSD already has -noresvmnt and -noresvport options in their > exports(5). > I'll let others comment w.r.t. whether they have a need for this. To me, unless others are saying "we need this", I don't see any reason to change what is already there, except maybe optionally require a reserved port# for NFSv4 mounts via a sysctl. I comment on this further down. > > Personally, I agree with the working group and have always thought > > requiring > > a client to use a "reserved port#" was meaningless. However, I > > already noted > > that I don't mind enabling it, with a comment that it should not be > > required > > for NFSv4. 
> > If a client machine is trusted, then reserved ports can guaranty that > requests come from privileged processes and not from user space where > client can fill any credentials in AUTH_SYS. If client machine is not > trusted, then this will not work of course. BTW mountd requires > reserved > port and NFS server does not required reserved port by default. Well, I agree that, if you have a client machine where "root" is secure (no root kit vunerabilities, etc) but non-root users on this machine would potentially run their own bogus userland NFS client, then requiring a reserved port# does subvert the use of such a bogus NFS client. (My concern is that some people will think that requiring a reserved port# makes NFS secure for other cases, like users with their own laptops/desktops.) Personally, I think the above case is rare and that having another sysctl vfs.nfsd.nfsv4_privport (similar to vfs.nfsd.nfs_privport) is sufficient, but I'll let others comment on this, since it is not my decision. rick From owner-freebsd-fs@FreeBSD.ORG Sat May 12 02:30:57 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id C192C106566C; Sat, 12 May 2012 02:30:57 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 629888FC0A; Sat, 12 May 2012 02:30:57 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap4EAPzKrU+DaFvO/2dsb2JhbABEhXmufoIOBwEBBAEjVgUWDgoRGQIEVQYThUkHgjkFqEySSIsXFIRcgRgEjneHBpBAgwWBOwg X-IronPort-AV: E=Sophos;i="4.75,574,1330923600"; d="scan'208";a="171638749" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 11 May 2012 22:30:51 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 22663B3F86; Fri, 11 May 2012 22:30:51 -0400 (EDT) Date: Fri, 11 May 2012 22:30:51 -0400 (EDT) From: Rick Macklem To: Andrew Leonard Message-ID: <1831201709.296992.1336789851115.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_Part_296991_491013469.1336789851113" X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org Subject: Re: Unable to set ACLs on ZFS file system over NFSv4? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 May 2012 02:30:57 -0000 ------=_Part_296991_491013469.1336789851113 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Andrew Leonard wrote: > On Thu, May 10, 2012 at 2:23 PM, Rick Macklem > wrote: > > > I wrote: > > >> If you capture a packet trace from before you do the NFSv4 mount, I > >> can > >> take a look and see what the server is saying. (Basically, at mount > >> time > >> a reply to a Getattr should including the supported attributes and > >> that > >> should include the ACL bit. Then the setfacl becomes a Setattr of > >> the > >> ACL > >> attribute.) > >> # tcpdump -s 0 -w acl.pcap host > >> - run on the client should do it > >> > >> If you want to look at it, use wireshark. If you want me to look, > >> just > >> email acl.pcap as an attachment. 
> >> > >> rick > >> ps: Although I suspect it is the server that isn't behaving, please > >> use > >> the FreeBSD client for the above. > >> pss: I've cc'd trasz@ in case he can spot some reason why it > >> wouldn't > >> work. > >> > > Oh, and make sure "user1" isn't in more than 16 groups, because that > > is the > > limit for AUTH_SYS. (I'm not sure what the effect of user1 being in > > more > > than 16 groups would be, but might as well eliminate it as a cause.) > > Thanks, Rick - I'll send the pcap over private email, as I'm sure > $DAYJOB would consider it somewhat sensitive. > > Looking in wireshark, if I'm reading it correctly, I don't see > anything for FATTR4_ACL in any replies. On the final connection, I do > see NFS4ERR_IO set as the status for the reply to the setattr - but > from Googling, my understanding is that response is supposed to > indicate a hard error, such as a hardware problem. > Yep, it appears that ZFS returned an error that isn't in the list of replies for getattr, so it got mapped to EIO (the catch all for error codes not known to NFS). I took a quick look at the ZFS code and the problem looks pretty obvious. ZFS replies EOPNOTSUPP to the VOP_ACLCHECK() and that's as far as it gets. Please try the attached patch in the server (untested, but all it does is go ahead and try the VOP_SETACL() for the case where VOP_ACLCHECK() replies EOPNOTSUPP) and let me know if it helps. Thanks for reporting this and sending the packet trace, rick > Also, I have verified that "user1" is not a member of more than 16 > groups, so we can rule that out - that user is in only three groups. > > -Andy ------=_Part_296991_491013469.1336789851113 Content-Type: text/x-patch; name=zfs-acl.patch Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename=zfs-acl.patch LS0tIGZzL25mcy9uZnNfY29tbW9uYWNsLmMub3JpZwkyMDEyLTA1LTExIDIyOjE5OjMyLjAwMDAw MDAwMCAtMDQwMAorKysgZnMvbmZzL25mc19jb21tb25hY2wuYwkyMDEyLTA1LTExIDIyOjIwOjA5 LjAwMDAwMDAwMCAtMDQwMApAQCAtNDY5LDcgKzQ2OSw3IEBAIG5mc3J2X3NldGFjbCh2bm9kZV90 IHZwLCBORlNBQ0xfVCAqYWNscCwKIAkJZ290byBvdXQ7CiAJfQogCWVycm9yID0gVk9QX0FDTENI RUNLKHZwLCBBQ0xfVFlQRV9ORlM0LCBhY2xwLCBjcmVkLCBwKTsKLQlpZiAoIWVycm9yKQorCWlm IChlcnJvciA9PSAwIHx8IGVycm9yID09IEVPUE5PVFNVUFApCiAJCWVycm9yID0gVk9QX1NFVEFD TCh2cCwgQUNMX1RZUEVfTkZTNCwgYWNscCwgY3JlZCwgcCk7CiAKIG91dDoK ------=_Part_296991_491013469.1336789851113-- From owner-freebsd-fs@FreeBSD.ORG Sat May 12 09:22:35 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AA1AD106566C for ; Sat, 12 May 2012 09:22:35 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id EB4EA8FC08 for ; Sat, 12 May 2012 09:22:34 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id MAA29703; Sat, 12 May 2012 12:22:26 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1ST8Ww-00099F-If; Sat, 12 May 2012 12:22:26 +0300 Message-ID: <4FAE2BD1.9060002@FreeBSD.org> Date: Sat, 12 May 2012 12:22:25 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:12.0) Gecko/20120503 Thunderbird/12.0.1 MIME-Version: 1.0 To: Florian Wagner References: <20111015214347.09f68e4e@naclador.mos32.de> 
<4E9ACA9F.5090308@FreeBSD.org> <20111019082139.1661868e@auedv3.syscomp.de> <4E9EEF45.9020404@FreeBSD.org> <20111019182130.27446750@naclador.mos32.de> <4EB98E05.4070900@FreeBSD.org> <20111119211921.7ffa9953@naclador.mos32.de> <4EC8CD14.4040600@FreeBSD.org> <20111120121248.5e9773c8@naclador.mos32.de> <4EC91B36.7060107@FreeBSD.org> <20111120191018.1aa4e882@naclador.mos32.de> <4ECA2DBD.5040701@FreeBSD.org> <20111121201332.03ecadf1@naclador.mos32.de> <4ECAC272.5080500@FreeBSD.org> <4ECEBD44.6090900@FreeBSD.org> <20111125224722.6cf3a299@naclador.mos32.de> <4ED0CFF9.4030503@FreeBSD.org> <20111126134927.60fe5097@naclador.mos32.de> <4ED35326.80402@FreeBSD.org> <20120109122011.0ae6ad70@naclador.mos32.de> In-Reply-To: <20120109122011.0ae6ad70@naclador.mos32.de> X-Enigmail-Version: 1.5pre Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@FreeBSD.org Subject: Re: Extending zfsboot.c to allow selecting filesystem from boot.config X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 May 2012 09:22:35 -0000 on 09/01/2012 13:20 Florian Wagner said the following: > > Do you currently have any plans to merge any of that into stable-9 or > stable-8? I have just committed the code to head. MFC timer is set to 1 month. -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Sat May 12 10:06:18 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 979C3106566C for ; Sat, 12 May 2012 10:06:18 +0000 (UTC) (envelope-from ronald-freebsd8@klop.yi.org) Received: from cpsmtpb-ews09.kpnxchange.com (cpsmtpb-ews09.kpnxchange.com [213.75.39.14]) by mx1.freebsd.org (Postfix) with ESMTP id 229218FC08 for ; Sat, 12 May 2012 10:06:17 +0000 (UTC) Received: from cpsps-ews08.kpnxchange.com ([10.94.84.175]) by cpsmtpb-ews09.kpnxchange.com with Microsoft SMTPSVC(6.0.3790.4675); Sat, 12 May 2012 12:05:10 +0200 Received: from CPSMTPM-TLF103.kpnxchange.com ([195.121.3.6]) by cpsps-ews08.kpnxchange.com with Microsoft SMTPSVC(7.5.7601.17514); Sat, 12 May 2012 12:05:10 +0200 Received: from sjakie.klop.ws ([212.182.167.131]) by CPSMTPM-TLF103.kpnxchange.com with Microsoft SMTPSVC(7.5.7601.17514); Sat, 12 May 2012 12:05:09 +0200 Received: from 212-182-167-131.ip.telfort.nl (localhost [127.0.0.1]) by sjakie.klop.ws (Postfix) with ESMTP id 9C2191157A for ; Sat, 12 May 2012 12:05:09 +0200 (CEST) Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes To: freebsd-fs@freebsd.org References: <4FACCAEB.8040401@ibl.fr> <20120511155153.GA31698@in-addr.com> Date: Sat, 12 May 2012 12:05:09 +0200 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit From: "Ronald Klop" Message-ID: In-Reply-To: <20120511155153.GA31698@in-addr.com> User-Agent: Opera Mail/11.62 (FreeBSD) X-OriginalArrivalTime: 12 May 2012 10:05:09.0797 (UTC) FILETIME=[B6D5D550:01CD3026] X-RcptDomain: freebsd.org Subject: Re: Best practice for shared volume with iscsi Dell MD3200i ? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 May 2012 10:06:18 -0000 On Fri, 11 May 2012 17:51:53 +0200, Gary Palmer wrote: > On Fri, May 11, 2012 at 10:16:43AM +0200, Karl Oulmi wrote: >> Hi all, >> >> I am trying to run two freebsd9 boxes with a 3.7 TO shared iscsi volume >> on a MD3200i. >> >> The goal is to run a "master" and a "slave" dovecot IMAP server with a >> shared /home. >> >> I created the shared partition like this : >> gpart create -s gpt /dev/da0 >> gpart add -t freebsd-ufs /dev/da0 >> newfs /dev/da0p1 >> >> Everything is working great on the "master" server, but when I'm trying >> to mount the volume from the "slave" one, I have the following error : >> mount: /dev/da0p1 : Operation not permitted >> >> The only way I have to successfully mount the share on the "slave" >> server is to run a fsck -t ufs /dev/da0p1 and then do the mount. >> >> Could anyone tell me what's wrong ? > > UFS is not a cluster-aware filesystem. You cannot mount it in > multiple places at the same time. The best you can hope for in > that situation, short of developing a cluster-aware filesystem, is > to only mount the volume on the slave if the master fails. > > Regards, > > Gary Or use NFS. From owner-freebsd-fs@FreeBSD.ORG Sat May 12 10:08:28 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CE434106564A for ; Sat, 12 May 2012 10:08:28 +0000 (UTC) (envelope-from ronald-freebsd8@klop.yi.org) Received: from cpsmtpb-ews02.kpnxchange.com (cpsmtpb-ews02.kpnxchange.com [213.75.39.5]) by mx1.freebsd.org (Postfix) with ESMTP id 597AD8FC0C for ; Sat, 12 May 2012 10:08:28 +0000 (UTC) Received: from cpsps-ews12.kpnxchange.com ([10.94.84.179]) by cpsmtpb-ews02.kpnxchange.com with Microsoft SMTPSVC(6.0.3790.4675); Sat, 12 May 2012 12:07:21 +0200 Received: from CPSMTPM-TLF103.kpnxchange.com ([195.121.3.6]) by cpsps-ews12.kpnxchange.com with Microsoft SMTPSVC(7.5.7601.17514); Sat, 12 May 2012 12:07:21 +0200 Received: from sjakie.klop.ws ([212.182.167.131]) by CPSMTPM-TLF103.kpnxchange.com with Microsoft SMTPSVC(7.5.7601.17514); Sat, 12 May 2012 12:07:20 +0200 Received: from 212-182-167-131.ip.telfort.nl (localhost [127.0.0.1]) by sjakie.klop.ws (Postfix) with ESMTP id AB4371157F for ; Sat, 12 May 2012 12:07:20 +0200 (CEST) Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes To: freebsd-fs@freebsd.org References: <201205112150.q4BLoHUD097623@freefall.freebsd.org> Date: Sat, 12 May 2012 12:07:20 +0200 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit From: "Ronald Klop" Message-ID: In-Reply-To: <201205112150.q4BLoHUD097623@freefall.freebsd.org> User-Agent: Opera Mail/11.62 (FreeBSD) X-OriginalArrivalTime: 12 May 2012 10:07:20.0751 (UTC) FILETIME=[04E3D3F0:01CD3027] X-RcptDomain: freebsd.org Subject: Re: kern/167685: [zfs] ZFS on USB drive prevents shutdown / reboot X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 May 2012 10:08:28 -0000 On Fri, 11 May 2012 23:50:17 +0200, Jeff Kletsky wrote: > The following reply was made to PR kern/167685; it has been noted by > GNATS. 
> > From: Jeff Kletsky > To: bug-followup@FreeBSD.org > Cc: > Subject: Re: kern/167685: [zfs] ZFS on USB drive prevents shutdown / > reboot > Date: Fri, 11 May 2012 14:41:03 -0700 > > This is a multi-part message in MIME format. > --------------020209050805030409070009 > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > Content-Transfer-Encoding: 7bit > Problem can be replicated by booting of a "memstick" (with a "spare" USB > stick as /dev/da1) and then executing > # dd if=/dev/zer of=/dev/da1 bs=64k > # zpool create stick /dev/da1 > # reboot > Problem has been reliably reproduced on the Atom 330 previously > mentioned, as well as on an AMD A8-3870 with A75 chipset. It also can be > replicated using VirtualBox running under Ubuntu on the AMD A8-3870 > system. It does not seem specific to one "flavor" of USB controller or > driver. > Using /usr/src/release/generate_release.sh and bisection, I have > confirmed that > * r227445 does not exhibit the behavior ("Copy stable/9 to releng/9.0 as > part of the FreeBSD 9.0-RELEASE release cycle) > * r229097 does not exhibit the behavior > * r229281 -- FAIL by not rebooting under the conditions described above. > Based on these results, I am suspicious of > r229100 | hselasky | 2011-12-31 06:33:15 -0800 (Sat, 31 Dec 2011) | 6 > lines > MFC r228709, r228711 and r228723: > - Add missing unlock of USB controller's lock, when > doing shutdown, suspend and resume. > - Add code to wait for USB shutdown to be executed at system shutdown. > - Add sysctl which can be used to skip this waiting. > as being what brought the issue to the forefront. > I am presently building r229099 and r229100 to confirm this suspicion. > A potential, though untested workaround would be > # sysctl hw.usb.no_shutdown_wait=1 I had/have the same problem with ZFS on my external USB backup-disk. I use that sysctl since and can confirm that it works. 
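Since this is a backup disk that comes and goes, the usual ZFS precaution applies as well: export the pool before the disk is detached or the box goes down. A minimal sketch, where the pool name "backup" is only an assumption:

# zpool export backup    (flushes and unmounts the pool; the USB disk can then be detached)
# zpool import backup    (once the disk is attached again)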
From owner-freebsd-fs@FreeBSD.ORG Sat May 12 11:18:48 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CCEC1106566B for ; Sat, 12 May 2012 11:18:48 +0000 (UTC) (envelope-from araujobsdport@gmail.com) Received: from mail-ob0-f182.google.com (mail-ob0-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id 8E7128FC08 for ; Sat, 12 May 2012 11:18:48 +0000 (UTC) Received: by obcni5 with SMTP id ni5so6022772obc.13 for ; Sat, 12 May 2012 04:18:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=TasOHjdSopeb8mAf5CEQ1u22xi+5MlW/ZSKeaLVmJVQ=; b=s0boO1FQbR6SZMcybhXSdIZJwaOTW4Acz4a5IzeacePsY5Rs6dpRdSeNG6ctHIaWna C5BqKffpNi1RCHVkyn81dWEClEJ6MOmmzmf8nU574kf4GGW1xWXiIhQZ6R+rqLY+EVWd x/laNtzwfYcQeh/PRB7piXEH4drifHmPHucPCB8VhHAXJqow0Mg1n4tn/mKTxLzyb+ha RcHltylSHmda5EQW8JKadSzeKPmdyES5IdaA59ifHVL9fD3S7gZm0CT486gisd0vTenH 1HeV/H1akE1AvjP5Khrvn9WtUFn5Z7ItaD4xYgrVH+VMtvhFEPkFby/Si4tIr2SxRg7t h9vw== MIME-Version: 1.0 Received: by 10.50.191.233 with SMTP id hb9mr674988igc.44.1336821528129; Sat, 12 May 2012 04:18:48 -0700 (PDT) Received: by 10.231.31.196 with HTTP; Sat, 12 May 2012 04:18:48 -0700 (PDT) In-Reply-To: References: <4FACCAEB.8040401@ibl.fr> <20120511155153.GA31698@in-addr.com> Date: Sat, 12 May 2012 19:18:48 +0800 Message-ID: From: Marcelo Araujo To: Ronald Klop Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-fs@freebsd.org Subject: Re: Best practice for shared volume with iscsi Dell MD3200i ? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: araujo@FreeBSD.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 May 2012 11:18:48 -0000 2012/5/12 Ronald Klop > On Fri, 11 May 2012 17:51:53 +0200, Gary Palmer > wrote: > > On Fri, May 11, 2012 at 10:16:43AM +0200, Karl Oulmi wrote: >> >>> Hi all, >>> >>> I am trying to run two freebsd9 boxes with a 3.7 TO shared iscsi volume >>> on a MD3200i. >>> >>> The goal is to run a "master" and a "slave" dovecot IMAP server with a >>> shared /home. >>> >>> I created the shared partition like this : >>> gpart create -s gpt /dev/da0 >>> gpart add -t freebsd-ufs /dev/da0 >>> newfs /dev/da0p1 >>> >>> Everything is working great on the "master" server, but when I'm trying >>> to mount the volume from the "slave" one, I have the following error : >>> mount: /dev/da0p1 : Operation not permitted >>> >>> The only way I have to successfully mount the share on the "slave" >>> server is to run a fsck -t ufs /dev/da0p1 and then do the mount. >>> >>> Could anyone tell me what's wrong ? >>> >> >> UFS is not a cluster-aware filesystem. You cannot mount it in >> multiple places at the same time. The best you can hope for in >> that situation, short of developing a cluster-aware filesystem, is >> to only mount the volume on the slave if the master fails. >> >> Regards, >> >> Gary >> > > Just some questions! Both machines share access to the same DISKS? I mean, both machines can see all disks? If yes, you could use DEVD to detect some kind of fail like CARP or something else and than, do some action like mount the disks on slave and so on. Currently I have this solution and works pretty well. 
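A rough sketch of the failover action described above, with the device and mount point taken from Karl's setup; the devd/CARP trigger wiring itself is left out because the exact event strings differ between releases (see devd.conf(5) and carp(4)). As Gary points out, the UFS volume must never be mounted on both nodes at once, so something like this may only run on the slave once the master is known to be down:

#!/bin/sh
# Illustrative failover hook for the slave node only.
# /dev/da0p1 is the shared iSCSI LUN, /home its mount point.
fsck -p -t ufs /dev/da0p1 && mount /dev/da0p1 /home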
Best Regards, -- Marcelo Araujo araujo@FreeBSD.org From owner-freebsd-fs@FreeBSD.ORG Sat May 12 12:10:12 2012 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 9D21F106564A for ; Sat, 12 May 2012 12:10:12 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 875498FC17 for ; Sat, 12 May 2012 12:10:12 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q4CCACpf043067 for ; Sat, 12 May 2012 12:10:12 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q4CCACtL043066; Sat, 12 May 2012 12:10:12 GMT (envelope-from gnats) Date: Sat, 12 May 2012 12:10:12 GMT Message-Id: <201205121210.q4CCACtL043066@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: dfilter@FreeBSD.ORG (dfilter service) Cc: Subject: Re: kern/165923: commit references a PR X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: dfilter service List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 May 2012 12:10:12 -0000 The following reply was made to PR kern/165923; it has been noted by GNATS. From: dfilter@FreeBSD.ORG (dfilter service) To: bug-followup@FreeBSD.org Cc: Subject: Re: kern/165923: commit references a PR Date: Sat, 12 May 2012 12:03:08 +0000 (UTC) Author: rmacklem Date: Sat May 12 12:02:51 2012 New Revision: 235332 URL: http://svn.freebsd.org/changeset/base/235332 Log: PR# 165923 reported intermittent write failures for dirty memory mapped pages being written back on an NFS mount. Since any thread can call VOP_PUTPAGES() to write back a dirty page, the credentials of that thread may not have write access to the file on an NFS server. (Often the uid is 0, which may be mapped to "nobody" in the NFS server.) Although there is no completely correct fix for this (NFS servers check access on every write RPC instead of at open/mmap time), this patch avoids the common cases by holding onto a credential that recently opened the file for writing and uses that credential for the write RPCs being done by VOP_PUTPAGES() for both NFS clients. Tested by: Joel Ray Holveck (joelh at juniper.net) PR: kern/165923 Reviewed by: kib MFC after: 2 weeks Modified: head/sys/fs/nfsclient/nfs_clbio.c head/sys/fs/nfsclient/nfs_clnode.c head/sys/fs/nfsclient/nfs_clvnops.c head/sys/fs/nfsclient/nfsnode.h head/sys/nfsclient/nfs_bio.c head/sys/nfsclient/nfs_node.c head/sys/nfsclient/nfs_vnops.c head/sys/nfsclient/nfsnode.h Modified: head/sys/fs/nfsclient/nfs_clbio.c ============================================================================== --- head/sys/fs/nfsclient/nfs_clbio.c Sat May 12 10:53:49 2012 (r235331) +++ head/sys/fs/nfsclient/nfs_clbio.c Sat May 12 12:02:51 2012 (r235332) @@ -281,7 +281,11 @@ ncl_putpages(struct vop_putpages_args *a vp = ap->a_vp; np = VTONFS(vp); td = curthread; /* XXX */ - cred = curthread->td_ucred; /* XXX */ + /* Set the cred to n_writecred for the write rpcs. 
*/ + if (np->n_writecred != NULL) + cred = crhold(np->n_writecred); + else + cred = crhold(curthread->td_ucred); /* XXX */ nmp = VFSTONFS(vp->v_mount); pages = ap->a_m; count = ap->a_count; @@ -345,6 +349,7 @@ ncl_putpages(struct vop_putpages_args *a iomode = NFSWRITE_FILESYNC; error = ncl_writerpc(vp, &uio, cred, &iomode, &must_commit, 0); + crfree(cred); pmap_qremove(kva, npages); relpbuf(bp, &ncl_pbuf_freecnt); Modified: head/sys/fs/nfsclient/nfs_clnode.c ============================================================================== --- head/sys/fs/nfsclient/nfs_clnode.c Sat May 12 10:53:49 2012 (r235331) +++ head/sys/fs/nfsclient/nfs_clnode.c Sat May 12 12:02:51 2012 (r235332) @@ -300,6 +300,8 @@ ncl_reclaim(struct vop_reclaim_args *ap) FREE((caddr_t)dp2, M_NFSDIROFF); } } + if (np->n_writecred != NULL) + crfree(np->n_writecred); FREE((caddr_t)np->n_fhp, M_NFSFH); if (np->n_v4 != NULL) FREE((caddr_t)np->n_v4, M_NFSV4NODE); Modified: head/sys/fs/nfsclient/nfs_clvnops.c ============================================================================== --- head/sys/fs/nfsclient/nfs_clvnops.c Sat May 12 10:53:49 2012 (r235331) +++ head/sys/fs/nfsclient/nfs_clvnops.c Sat May 12 12:02:51 2012 (r235332) @@ -513,6 +513,7 @@ nfs_open(struct vop_open_args *ap) struct vattr vattr; int error; int fmode = ap->a_mode; + struct ucred *cred; if (vp->v_type != VREG && vp->v_type != VDIR && vp->v_type != VLNK) return (EOPNOTSUPP); @@ -604,7 +605,22 @@ nfs_open(struct vop_open_args *ap) } np->n_directio_opens++; } + + /* + * If this is an open for writing, capture a reference to the + * credentials, so they can be used by ncl_putpages(). Using + * these write credentials is preferable to the credentials of + * whatever thread happens to be doing the VOP_PUTPAGES() since + * the write RPCs are less likely to fail with EACCES. + */ + if ((fmode & FWRITE) != 0) { + cred = np->n_writecred; + np->n_writecred = crhold(ap->a_cred); + } else + cred = NULL; mtx_unlock(&np->n_mtx); + if (cred != NULL) + crfree(cred); vnode_create_vobject(vp, vattr.va_size, ap->a_td); return (0); } Modified: head/sys/fs/nfsclient/nfsnode.h ============================================================================== --- head/sys/fs/nfsclient/nfsnode.h Sat May 12 10:53:49 2012 (r235331) +++ head/sys/fs/nfsclient/nfsnode.h Sat May 12 12:02:51 2012 (r235332) @@ -123,6 +123,7 @@ struct nfsnode { int n_directio_asyncwr; u_int64_t n_change; /* old Change attribute */ struct nfsv4node *n_v4; /* extra V4 stuff */ + struct ucred *n_writecred; /* Cred. for putpages */ }; #define n_atim n_un1.nf_atim Modified: head/sys/nfsclient/nfs_bio.c ============================================================================== --- head/sys/nfsclient/nfs_bio.c Sat May 12 10:53:49 2012 (r235331) +++ head/sys/nfsclient/nfs_bio.c Sat May 12 12:02:51 2012 (r235332) @@ -275,7 +275,11 @@ nfs_putpages(struct vop_putpages_args *a vp = ap->a_vp; np = VTONFS(vp); td = curthread; /* XXX */ - cred = curthread->td_ucred; /* XXX */ + /* Set the cred to n_writecred for the write rpcs. 
*/ + if (np->n_writecred != NULL) + cred = crhold(np->n_writecred); + else + cred = crhold(curthread->td_ucred); /* XXX */ nmp = VFSTONFS(vp->v_mount); pages = ap->a_m; count = ap->a_count; @@ -339,6 +343,7 @@ nfs_putpages(struct vop_putpages_args *a iomode = NFSV3WRITE_FILESYNC; error = (nmp->nm_rpcops->nr_writerpc)(vp, &uio, cred, &iomode, &must_commit); + crfree(cred); pmap_qremove(kva, npages); relpbuf(bp, &nfs_pbuf_freecnt); Modified: head/sys/nfsclient/nfs_node.c ============================================================================== --- head/sys/nfsclient/nfs_node.c Sat May 12 10:53:49 2012 (r235331) +++ head/sys/nfsclient/nfs_node.c Sat May 12 12:02:51 2012 (r235332) @@ -270,6 +270,8 @@ nfs_reclaim(struct vop_reclaim_args *ap) free((caddr_t)dp2, M_NFSDIROFF); } } + if (np->n_writecred != NULL) + crfree(np->n_writecred); if (np->n_fhsize > NFS_SMALLFH) { free((caddr_t)np->n_fhp, M_NFSBIGFH); } Modified: head/sys/nfsclient/nfs_vnops.c ============================================================================== --- head/sys/nfsclient/nfs_vnops.c Sat May 12 10:53:49 2012 (r235331) +++ head/sys/nfsclient/nfs_vnops.c Sat May 12 12:02:51 2012 (r235332) @@ -507,6 +507,7 @@ nfs_open(struct vop_open_args *ap) struct vattr vattr; int error; int fmode = ap->a_mode; + struct ucred *cred; if (vp->v_type != VREG && vp->v_type != VDIR && vp->v_type != VLNK) return (EOPNOTSUPP); @@ -563,7 +564,22 @@ nfs_open(struct vop_open_args *ap) } np->n_directio_opens++; } + + /* + * If this is an open for writing, capture a reference to the + * credentials, so they can be used by nfs_putpages(). Using + * these write credentials is preferable to the credentials of + * whatever thread happens to be doing the VOP_PUTPAGES() since + * the write RPCs are less likely to fail with EACCES. + */ + if ((fmode & FWRITE) != 0) { + cred = np->n_writecred; + np->n_writecred = crhold(ap->a_cred); + } else + cred = NULL; mtx_unlock(&np->n_mtx); + if (cred != NULL) + crfree(cred); vnode_create_vobject(vp, vattr.va_size, ap->a_td); return (0); } Modified: head/sys/nfsclient/nfsnode.h ============================================================================== --- head/sys/nfsclient/nfsnode.h Sat May 12 10:53:49 2012 (r235331) +++ head/sys/nfsclient/nfsnode.h Sat May 12 12:02:51 2012 (r235332) @@ -128,6 +128,7 @@ struct nfsnode { uint32_t n_namelen; int n_directio_opens; int n_directio_asyncwr; + struct ucred *n_writecred; /* Cred. 
for putpages */ }; #define n_atim n_un1.nf_atim _______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Sat May 12 17:40:13 2012 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 0F477106564A for ; Sat, 12 May 2012 17:40:13 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id D4AEA8FC0A for ; Sat, 12 May 2012 17:40:12 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q4CHeC4b083501 for ; Sat, 12 May 2012 17:40:12 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q4CHeCRD083500; Sat, 12 May 2012 17:40:12 GMT (envelope-from gnats) Date: Sat, 12 May 2012 17:40:12 GMT Message-Id: <201205121740.q4CHeCRD083500@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: Jeff Kletsky Cc: Subject: Re: kern/167685: [zfs] ZFS on USB drive prevents shutdown / reboot X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Jeff Kletsky List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 12 May 2012 17:40:13 -0000 The following reply was made to PR kern/167685; it has been noted by GNATS. From: Jeff Kletsky To: bug-followup@FreeBSD.org Cc: Subject: Re: kern/167685: [zfs] ZFS on USB drive prevents shutdown / reboot Date: Sat, 12 May 2012 10:30:26 -0700 Not surprisingly: r229099 does *not* exhibit the symptom r229100 *does* exhibit the symptom # sysctl hw.usb.no_shutdown_wait=1 is confirmed as a workaround
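To keep that workaround in place across reboots it can also be set from the standard configuration file; a minimal sketch (the sysctl name is the one from the PR, everything else is stock FreeBSD):

# sysctl hw.usb.no_shutdown_wait=1                        (takes effect immediately)
# echo 'hw.usb.no_shutdown_wait=1' >> /etc/sysctl.conf    (applied at boot by /etc/rc.d/sysctl)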