From owner-freebsd-fs@FreeBSD.ORG Sun Sep 16 09:33:19 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 31EBF106566C for ; Sun, 16 Sep 2012 09:33:19 +0000 (UTC) (envelope-from c.kworr@gmail.com) Received: from mail-bk0-f54.google.com (mail-bk0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id A4A238FC14 for ; Sun, 16 Sep 2012 09:33:18 +0000 (UTC) Received: by bkcje9 with SMTP id je9so1837378bkc.13 for ; Sun, 16 Sep 2012 02:33:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=f2QpA+Z4v22Sdk0RmRT1MR2+WlYRw05fnzr4d7NPqFY=; b=Zc01qlzM7Wajge04pP3tks9nI9cNvmVGXAboihMvV4YHNKztXbcF0TItGwJmk3Gdys 68Vh0jUovmwi5NnDX8qZ5LqEOWtlKaOWdVkdv0+Cx4MuIiMjDyuhalmMMJA2yEKqgsK6 dhmT1YfKkRNJB8Uf2t3l4b9dERXcCHQkSaAq6DpA5jae7Mppl0IRBD0hSyGJGHhFhHp9 ZjEBiUwLGCM7I6V+tV3GITMfDBxu+VwHD4xoGkKmTf9BD5mFp225Xl4F5zbMmVV6qert tWhNB5BjhxGrzF/w64JmnXY/BFt8GYtpANgLLaBwVs6b04gP6QeIhi5inXL0HRLzobWH zSyA== Received: by 10.204.130.209 with SMTP id u17mr3312910bks.35.1347787997703; Sun, 16 Sep 2012 02:33:17 -0700 (PDT) Received: from limbo.xim.bz ([46.150.100.6]) by mx.google.com with ESMTPS id 25sm3354060bkx.9.2012.09.16.02.33.14 (version=SSLv3 cipher=OTHER); Sun, 16 Sep 2012 02:33:16 -0700 (PDT) Message-ID: <50559CD8.1070700@gmail.com> Date: Sun, 16 Sep 2012 12:33:12 +0300 From: Volodymyr Kostyrko User-Agent: Mozilla/5.0 (X11; FreeBSD i386; rv:15.0) Gecko/20120911 Thunderbird/15.0.1 MIME-Version: 1.0 To: =?UTF-8?B?IlRob21hcyBHw7ZsbG5lciAoTmV3c2xldHRlciki?= References: <001a01cd900d$bcfcc870$36f65950$@goelli.de> <504F282D.8030808@gmail.com> <000a01cd90aa$0a277310$1e765930$@goelli.de> <5050461A.9050608@gmail.com> <000001cd9239$ed734c80$c859e580$@goelli.de> <5052EC5D.4060403@gmail.com> 
<000a01cd9274$0aa0bba0$1fe232e0$@goelli.de> <505322C9.70200@gmail.com> <000001cd9377$e9e9b010$bdbd1030$@goelli.de>
In-Reply-To: <000001cd9377$e9e9b010$bdbd1030$@goelli.de>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Cc: freebsd-fs@freebsd.org
Subject: Re: AW: AW: AW: AW: ZFS: Corrupted pool metadata after adding vdev to a pool - no opportunity to rescue data from healthy vdevs? Remove a vdev? Rewrite metadata?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 16 Sep 2012 09:33:19 -0000

15.09.2012 22:25, Thomas Göllner (Newsletter) wrote:
>>>>> I also think there is no way to write new or edit the labels of the discs?
>>>>
>>>> This idea is called Block Pointer Rewrite and is not implemented yet. I have found no code to do that.
>>>
>>> I thought it may come to this -.- Because during my last reading I learned that I have to find the "root block pointer" to recover the possibly overwritten labels... As it changes place and content with each copy-on-write process (each txg?) it will be a search for a needle in a haystack...
>>
>> Not at all, what you are referring to is the MOS, and one is contained in each UberBlock.
>
> So as this thing is so far beyond my skills, I am sad to point out that I have to give up here. Without someone who will take me by the hand and say what to do step by step I think recovering/rewriting the right labels of my discs is something I will not be able to do within one year or so. It's a pity that ZFS still has no tools for recovering metadata built in. This would be a task to think of in the future.
>
> Thanks again for your help Volodymyr. It is a bit of consolation that at least I know now that I have done everything I could.

If you can afford to put your drives aside, you can try to wait until some tool eventually emerges.
I will not promise anything, but I'm slowly making some progress with my script. I'm motivated to do so, as I have a broken pool with photos. Trying to import that pool causes a core dump on every system I have tested, such as OpenSolaris, Illumos or SystemRescueCD.

-- 
Sphinx of black quartz judge my vow.

From owner-freebsd-fs@FreeBSD.ORG Sun Sep 16 21:41:33 2012
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 845C6106566B for ; Sun, 16 Sep 2012 21:41:33 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 435628FC16 for ; Sun, 16 Sep 2012 21:41:32 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: Ap4EAO5GVlCDaFvO/2dsb2JhbAA+BxaFcbcSgkqBCwINGQJfiBMLmSeOQ5F0gSGKIYU1gRIDlWKBFI8NgwKBPiIb
X-IronPort-AV: E=Sophos;i="4.80,432,1344225600"; d="scan'208";a="179287787"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 16 Sep 2012 17:41:25 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id EDA5579451 for ; Sun, 16 Sep 2012 17:41:25 -0400 (EDT)
Date: Sun, 16 Sep 2012 17:41:25 -0400 (EDT)
From: Rick Macklem
To: FS List
Message-ID: <1531430179.669311.1347831685957.JavaMail.root@erie.cs.uoguelph.ca>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.201]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692)
Subject: testing/review of atomic export update patch
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 16 Sep 2012 21:41:33
-0000 Hi, There is a simple patch at: http://people.freebsd.org/~rmacklem/atomic-export.patch that can be applied to a kernel + mountd, so that the new nfsd can be suspended by mountd while the exports are being reloaded. It adds a new "-S" flag to mountd to enable this. (This avoids the long standing bug where clients receive ESTALE replies to RPCs while mountd is reloading exports.) I am emailing to request testing and/or review of this patch by anyone who is interested. (One site has reported that the patch worked well for them and another is testing it as I type this.) Thanks in advance for any comments, rick From owner-freebsd-fs@FreeBSD.ORG Mon Sep 17 07:17:16 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9E61D106566B for ; Mon, 17 Sep 2012 07:17:16 +0000 (UTC) (envelope-from freebsd@pki2.com) Received: from btw.pki2.com (btw.pki2.com [IPv6:2001:470:a:6fd::2]) by mx1.freebsd.org (Postfix) with ESMTP id 6827A8FC08 for ; Mon, 17 Sep 2012 07:17:16 +0000 (UTC) Received: from [127.0.0.1] (localhost [127.0.0.1]) by btw.pki2.com (8.14.5/8.14.5) with ESMTP id q8H7H6EU010887 for ; Mon, 17 Sep 2012 00:17:06 -0700 (PDT) (envelope-from freebsd@pki2.com) From: Dennis Glatting To: freebsd-fs@freebsd.org Content-Type: text/plain; charset="ISO-8859-1" Date: Mon, 17 Sep 2012 00:17:06 -0700 Message-ID: <1347866226.5619.1.camel@btw.pki2.com> Mime-Version: 1.0 X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port Content-Transfer-Encoding: 7bit X-yoursite-MailScanner-Information: Dennis Glatting X-yoursite-MailScanner-ID: q8H7H6EU010887 X-yoursite-MailScanner: Found to be clean X-MailScanner-From: freebsd@pki2.com Subject: How to clear this ZFS error? 
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 17 Sep 2012 07:17:16 -0000

There was a system failure when I replaced the disk. How do I rid myself of 15107145887069556078?

config:

        NAME                        STATE     READ WRITE CKSUM
        disk-2                      DEGRADED     0     0     0
          raidz2-0                  DEGRADED     0     0     0
            replacing-0             DEGRADED     0     0     0
              da5                   ONLINE       0     0     0
              15107145887069556078  OFFLINE      0     0     0  was /dev/da1.nop
            da0                     ONLINE       0     0     0
            da4                     ONLINE       0     0     0
            da1                     ONLINE       0     0     0
            da2                     ONLINE       0     0     0
        cache
          ada3                      ONLINE       0     0     0

errors: No known data errors

From owner-freebsd-fs@FreeBSD.ORG Mon Sep 17 07:21:22 2012
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 87A5F1065670 for ; Mon, 17 Sep 2012 07:21:22 +0000 (UTC) (envelope-from edho@myconan.net)
Received: from mail-wg0-f42.google.com (mail-wg0-f42.google.com [74.125.82.42]) by mx1.freebsd.org (Postfix) with ESMTP id 101398FC0A for ; Mon, 17 Sep 2012 07:21:21 +0000 (UTC)
Received: by wgbfm10 with SMTP id fm10so1948340wgb.1 for ; Mon, 17 Sep 2012 00:21:20 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=myconan.net; s=myconan; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=j3aqvucq86q1edr7tiBf3X0UkHotxw1ctGH2nUC0v9o=; b=gQX4k7ExLKPt6o315faNZ203OLeGY5yVvVcuI+UFb6Df2N5xwURvex6U6Ptxqz0qWG fu11fQyRZix0df4oBdJjgGB1UxSi5hsVYX9h3BEW4673hFiZZzdDhaButRUt0b4SQdXt pKeqa4qposngEYEjyMNXp39YQ+bBkISyDsIeo=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type:x-gm-message-state; bh=j3aqvucq86q1edr7tiBf3X0UkHotxw1ctGH2nUC0v9o=; b=MoKz01zLEADxdt1F46/qPqphPWVpgbzcFKds45UIiDAm2YVaR5p7DYWkggf7bS4qet JKrxktZ1r5XvVF0rRx1ZfKQITw6s3Mi/9x2jBgykxh1s0JDY9+iwaerFNoVhhJNs9cC6
2R54evLWMVdhvPsgnwumzpkNOvNXTt04L97AKWuhG7VgMkzBzc0rGIfMzc5OgvZKh562 tW9IzoAkIsAxwQjBEuPzXS0O1JJNSWQ6d3YVuFXYb4fDJXN7+cOSlbVz4RKaUbhpFcLA xlxrlZCeHf2AKgDfTfMLw1dwwByu90lCepbyjF90mvSkIN3TqhhjFzWI5Zq818imA4qu qK+Q==
Received: by 10.180.102.136 with SMTP id fo8mr14047541wib.19.1347866480775; Mon, 17 Sep 2012 00:21:20 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.194.63.238 with HTTP; Mon, 17 Sep 2012 00:20:50 -0700 (PDT)
In-Reply-To: <1347866226.5619.1.camel@btw.pki2.com>
References: <1347866226.5619.1.camel@btw.pki2.com>
From: Edho Arief
Date: Mon, 17 Sep 2012 14:20:50 +0700
Message-ID:
To: Dennis Glatting
Content-Type: text/plain; charset=UTF-8
X-Gm-Message-State: ALoCoQl2PaOzuIEGbk65ksg/J7sxjvf2+q9vJ5N6tl1k1GadrpNyKO08ob/FImwGSkydWJ8TeL3A
Cc: freebsd-fs@freebsd.org
Subject: Re: How to clear this ZFS error?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 17 Sep 2012 07:21:22 -0000

On Mon, Sep 17, 2012 at 2:17 PM, Dennis Glatting wrote:
> There was a system failure when I replaced the disk. How do I rid
> myself of 15107145887069556078?
>

have you tried `zpool detach`?
From owner-freebsd-fs@FreeBSD.ORG Mon Sep 17 07:27:12 2012
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D7C5A106566B for ; Mon, 17 Sep 2012 07:27:12 +0000 (UTC) (envelope-from freebsd@penx.com)
Received: from btw.pki2.com (btw.pki2.com [IPv6:2001:470:a:6fd::2]) by mx1.freebsd.org (Postfix) with ESMTP id A32138FC08 for ; Mon, 17 Sep 2012 07:27:12 +0000 (UTC)
Received: from [127.0.0.1] (localhost [127.0.0.1]) by btw.pki2.com (8.14.5/8.14.5) with ESMTP id q8H7R5i1046512; Mon, 17 Sep 2012 00:27:05 -0700 (PDT) (envelope-from freebsd@penx.com)
From: Dennis Glatting
To: Edho Arief
In-Reply-To:
References: <1347866226.5619.1.camel@btw.pki2.com>
Content-Type: text/plain; charset="us-ascii"
Date: Mon, 17 Sep 2012 00:27:05 -0700
Message-ID: <1347866825.7373.2.camel@btw.pki2.com>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port
Content-Transfer-Encoding: 7bit
X-yoursite-MailScanner-Information: Dennis Glatting
X-yoursite-MailScanner-ID: q8H7R5i1046512
X-yoursite-MailScanner: Found to be clean
X-MailScanner-From: freebsd@penx.com
Cc: freebsd-fs@freebsd.org
Subject: Re: How to clear this ZFS error?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 17 Sep 2012 07:27:12 -0000

On Mon, 2012-09-17 at 14:20 +0700, Edho Arief wrote:
> On Mon, Sep 17, 2012 at 2:17 PM, Dennis Glatting wrote:
> > There was a system failure when I replaced the disk. How do I rid
> > myself of 15107145887069556078?
> >
>
> have you tried `zpool detach`?

Extremely helpful. Worked. Thanks.
From owner-freebsd-fs@FreeBSD.ORG Mon Sep 17 08:13:44 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id D9FCC106566B for ; Mon, 17 Sep 2012 08:13:44 +0000 (UTC) (envelope-from mattblists@icritical.com) Received: from mail1.icritical.com (mail1.icritical.com [93.95.13.41]) by mx1.freebsd.org (Postfix) with SMTP id 322418FC15 for ; Mon, 17 Sep 2012 08:13:43 +0000 (UTC) Received: (qmail 23348 invoked from network); 17 Sep 2012 08:00:22 -0000 Received: from localhost (127.0.0.1) by mail1.icritical.com with SMTP; 17 Sep 2012 08:00:22 -0000 Received: (qmail 23339 invoked by uid 599); 17 Sep 2012 08:00:21 -0000 Received: from unknown (HELO PDC002.icritical.int) (212.57.254.146) by mail1.icritical.com (qpsmtpd/0.28) with ESMTP; Mon, 17 Sep 2012 09:00:21 +0100 Message-ID: <5056D896.8060607@icritical.com> Date: Mon, 17 Sep 2012 09:00:22 +0100 From: Matt Burke User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:15.0) Gecko/20120906 Thunderbird/15.0 MIME-Version: 1.0 To: Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit X-TLS-Incoming: YES X-Virus-Scanned: by iCritical at mail1.icritical.com Cc: Subject: [patch] DTrace disk IO X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Sep 2012 08:13:45 -0000 I've recently been trying to get a grip on measuring disk IO latency (per transaction), and have found it to be rather difficult given the asynchronous nature of the beast, and also I can't find a way of translating the bio start of transaction timestamps to anything I can use in DTrace when pulling them out. So I knocked up this little patch against releng/9.1 to put a couple of DTrace probes in the right places to pick up crucial data like the now+then timestamps while they're present. 
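For reference, the now/then timestamps mentioned above are struct bintime values (whole seconds plus a 64-bit binary fraction of a second), so turning a pair of them into a latency means subtracting with a borrow and then scaling the fraction. The following is only a minimal userland sketch in C under stated assumptions: `struct bt` is a local stand-in defined here so the snippet builds outside the kernel tree, and `bt_diff_ns` is a hypothetical helper name, not something taken from the patch or from sys/time.h.

```c
#include <stdint.h>

/*
 * Local stand-in for a bintime-style timestamp (assumption: whole seconds
 * plus a 64-bit binary fraction of a second). Defined here so the sketch
 * compiles without kernel headers.
 */
struct bt {
	int64_t  sec;
	uint64_t frac;
};

/*
 * Hypothetical helper: return (now - then) in nanoseconds.
 * Assumes now >= then.
 */
static uint64_t
bt_diff_ns(struct bt now, struct bt then)
{
	int64_t sec = now.sec - then.sec;
	/* Unsigned subtraction wraps modulo 2^64, which is what we want. */
	uint64_t frac = now.frac - then.frac;

	/* Borrow one second when the fraction wrapped around. */
	if (now.frac < then.frac)
		sec--;

	/* Scale the top 32 bits of the fraction to nanoseconds. */
	return ((uint64_t)sec * 1000000000ULL +
	    ((1000000000ULL * (frac >> 32)) >> 32));
}
```

With then = 10.75 s ({ 10, 3ULL << 62 }) and now = 11.25 s ({ 11, 1ULL << 62 }), bt_diff_ns(now, then) returns 500000000. Looking at the fraction alone only gives the right answer once the seconds field is taken into account as well, which is why the borrow step is there.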
The predicate on the probe is needed to pick up the right firing. For reasons I've not been able to fathom (gstat et al give correct data), devstat_end_transaction is called multiple times for a given operation - from g_disk_done(), then g_io_deliver() - without anything useful in the bio struct (device name, number, etc).

There also seem to be a lot of firings coming from the following path which I don't understand, again without anything useful in the bio:

    kernel`devstat_end_transaction+0x13b
    kernel`g_io_deliver+0x1b0
    kernel`g_io_schedule_up+0xa6
    kernel`g_up_procbody+0x5c
    kernel`fork_exit+0x11f
    kernel`0xffffffff80c1c3fe

Catching flushes is also proving problematic. It would seem that devstat_end_transaction_bio() is called, but the bio and devstat structs are virtually empty. bp->bio_dev, bp->bio_disk, ds->device_name, ds->device_number and ds->unit_number are all null/empty, so I know that a disk has flushed, and I know how long it took, but I can't find out which disk it was.

Thoughts?
Index: sys/kern/subr_devstat.c
===================================================================
--- sys/kern/subr_devstat.c	(revision 240481)
+++ sys/kern/subr_devstat.c	(working copy)
@@ -29,6 +29,7 @@
 #include 
 __FBSDID("$FreeBSD$");
 
+#include "opt_kdtrace.h"
 #include 
 #include 
 #include 
@@ -41,9 +42,22 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
+SDT_PROVIDER_DEFINE(devstat);
+SDT_PROBE_DEFINE(devstat, subr_devstat, devstat_end_transaction, stat, stat);
+SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction, stat, 0, "struct devstat *");
+SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction, stat, 1, "uint32_t");
+SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction, stat, 2, "struct bintime *");
+SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction, stat, 3, "struct bintime *");
+SDT_PROBE_DEFINE(devstat, subr_devstat, devstat_end_transaction_bio, stat, stat);
+SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction_bio, stat, 0, "struct devstat *");
+SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction_bio, stat, 1, "struct bio *");
+
+
+
 static int devstat_num_devs;
 static long devstat_generation = 1;
 static int devstat_version = DEVSTAT_VERSION;
@@ -312,6 +326,8 @@
 
 	ds->end_count++;
 	atomic_add_rel_int(&ds->sequence0, 1);
+
+	SDT_PROBE(devstat, subr_devstat, devstat_end_transaction, stat, ds, bytes, now, then, 0);
 }
 
 void
@@ -332,6 +348,8 @@
 	else
 		flg = DEVSTAT_NO_DATA;
 
+	SDT_PROBE(devstat, subr_devstat, devstat_end_transaction_bio, stat, ds, bp, 0, 0, 0);
+
 	devstat_end_transaction(ds, bp->bio_bcount - bp->bio_resid,
 				DEVSTAT_TAG_SIMPLE, flg, NULL, &bp->bio_t0);
 }


Sample dtrace script:
=====================

BEGIN
{
	bio_cmds[1] = "READ";
	bio_cmds[2] = "WRITE";
	bio_cmds[4] = "DELETE";
	bio_cmds[8] = "GETATTR";
	bio_cmds[16] = "FLUSH";
}

devstat::devstat_end_transaction_bio:
{
	self->bio = args[1];
}

devstat::devstat_end_transaction:
/self->bio && args[0]->device_number/
{
	diff_frac = args[2]->frac - args[3]->frac;
	diff_ufrac = (diff_frac < 0) ? (args[3]->frac - args[2]->frac) : diff_frac;
	diff = (1000000000*(diff_ufrac>>32))>>32;
	printf("%d\t%s%d\t%s\t%d\t0.%09d\n",
	    timestamp,
	    args[0]->device_name,
	    args[0]->unit_number,
	    bio_cmds[self->bio->bio_cmd],
	    args[1],
	    diff
	);
}

-- 
The information contained in this message is confidential and intended for the addressee only. If you have received this message in error, or there are any problems with its content, please contact the sender.

iCritical is a trading name of Critical Software Ltd. Registered in England: 04909220.
Registered Office: IC2, Keele Science Park, Keele, Staffordshire, ST5 5NH.

This message has been scanned for security threats by iCritical. www.icritical.com

From owner-freebsd-fs@FreeBSD.ORG Mon Sep 17 11:07:05 2012
Return-Path:
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 28C081065689 for ; Mon, 17 Sep 2012 11:07:05 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 110628FC15 for ; Mon, 17 Sep 2012 11:07:05 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q8HB751c004430 for ; Mon, 17 Sep 2012 11:07:05 GMT (envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q8HB74uD004428 for freebsd-fs@FreeBSD.org; Mon, 17 Sep 2012 11:07:04 GMT (envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 17 Sep 2012 11:07:04 GMT
Message-Id: <201209171107.q8HB74uD004428@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster
To: freebsd-fs@FreeBSD.org
Cc:
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
X-BeenThere:
freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 17 Sep 2012 11:07:05 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/171415  fs         [zfs] zfs recv fails with "cannot receive incremental
o kern/170945  fs         [gpt] disk layout not portable between direct connect
o kern/170914  fs         [zfs] [patch] Import patchs related with issues 3090 a
o kern/170912  fs         [zfs] [patch] unnecessarily setting DS_FLAG_INCONSISTE
o bin/170778   fs         [zfs] [panic] FreeBSD panics randomly
o kern/170680  fs         [nfs] Multiple NFS Client bug in the FreeBSD 7.4-RELEA
o kern/170497  fs         [xfs][panic] kernel will panic whenever I ls a mounted
o kern/170238  fs         [zfs] [panic] Panic when deleting data
o kern/169945  fs         [zfs] [panic] Kernel panic while importing zpool (afte
o kern/169480  fs         [zfs] ZFS stalls on heavy I/O
o kern/169398  fs         [zfs] Can't remove file with permanent error
o kern/169339  fs         panic while " : > /etc/123"
o kern/169319  fs         [zfs] zfs resilver can't complete
o kern/168947  fs         [nfs] [zfs] .zfs/snapshot directory is messed up when
o kern/168942  fs         [nfs] [hang] nfsd hangs after being restarted (not -HU
o kern/168158  fs         [zfs] incorrect parsing of sharenfs options in zfs (fs
o kern/167979  fs         [ufs] DIOCGDINFO ioctl does not work on 8.2 file syste
o kern/167977  fs         [smbfs] mount_smbfs results are differ when utf-8 or U
o kern/167688  fs         [fusefs] Incorrect signal handling with direct_io
o kern/167685  fs         [zfs] ZFS on USB drive prevents shutdown / reboot
o kern/167612  fs         [portalfs] The portal file system gets stuck inside po
o kern/167272  fs         [zfs] ZFS Disks reordering causes ZFS to pick the wron
o kern/167260  fs         [msdosfs] msdosfs disk was mounted the second time whe
o kern/167109  fs         [zfs] [panic] zfs diff kernel panic Fatal trap 9: gene
o kern/167105  fs         [nfs] mount_nfs can not handle source exports wiht mor
o kern/167067  fs         [zfs] [panic] ZFS panics the server
o kern/167066  fs         [zfs] ZVOLs not appearing in /dev/zvol
o kern/167065  fs         [zfs] boot fails when a spare is the boot disk
o kern/167048  fs         [nfs] [patch] RELEASE-9 crash when using ZFS+NULLFS+NF
o kern/166912  fs         [ufs] [panic] Panic after converting Softupdates to jo
o kern/166851  fs         [zfs] [hang] Copying directory from the mounted UFS di
o kern/166477  fs         [nfs] NFS data corruption.
o kern/165950  fs         [ffs] SU+J and fsck problem
o kern/165923  fs         [nfs] Writing to NFS-backed mmapped files fails if flu
o kern/165521  fs         [zfs] [hang] livelock on 1 Gig of RAM with zfs when 31
o kern/165392  fs         Multiple mkdir/rmdir fails with errno 31
o kern/165087  fs         [unionfs] lock violation in unionfs
o kern/164472  fs         [ufs] fsck -B panics on particular data inconsistency
o kern/164370  fs         [zfs] zfs destroy for snapshot fails on i386 and sparc
o kern/164261  fs         [nullfs] [patch] fix panic with NFS served from NULLFS
o kern/164256  fs         [zfs] device entry for volume is not created after zfs
o kern/164184  fs         [ufs] [panic] Kernel panic with ufs_makeinode
o kern/163801  fs         [md] [request] allow mfsBSD legacy installed in 'swap'
o kern/163770  fs         [zfs] [hang] LOR between zfs&syncer + vnlru leading to
o kern/163501  fs         [nfs] NFS exporting a dir and a subdir in that dir to
o kern/162944  fs         [coda] Coda file system module looks broken in 9.0
o kern/162860  fs         [zfs] Cannot share ZFS filesystem to hosts with a hyph
o kern/162751  fs         [zfs] [panic] kernel panics during file operations
o kern/162591  fs         [nullfs] cross-filesystem nullfs does not work as expe
o kern/162519  fs         [zfs] "zpool import" relies on buggy realpath() behavi
o kern/162362  fs         [snapshots] [panic] ufs with snapshot(s) panics when g
o kern/161968  fs         [zfs] [hang] renaming snapshot with -r including a zvo
p kern/161897  fs         [zfs] [patch] zfs partition probing causing long delay
o kern/161864  fs         [ufs] removing journaling from UFS partition fails on
o bin/161807   fs         [patch] add option for explicitly specifying metadata
o kern/161579  fs         [smbfs] FreeBSD sometimes panics when an smb share is
o kern/161533  fs         [zfs] [panic] zfs receive panic: system ioctl returnin
o kern/161438  fs         [zfs] [panic] recursed on non-recursive spa_namespace_
o kern/161424  fs         [nullfs] __getcwd() calls fail when used on nullfs mou
o kern/161280  fs         [zfs] Stack overflow in gptzfsboot
o kern/161205  fs         [nfs] [pfsync] [regression] [build] Bug report freebsd
o kern/161169  fs         [zfs] [panic] ZFS causes kernel panic in dbuf_dirty
o kern/161112  fs         [ufs] [lor] filesystem LOR in FreeBSD 9.0-BETA3
o kern/160893  fs         [zfs] [panic] 9.0-BETA2 kernel panic
o kern/160860  fs         [ufs] Random UFS root filesystem corruption with SU+J
o kern/160801  fs         [zfs] zfsboot on 8.2-RELEASE fails to boot from root-o
o kern/160790  fs         [fusefs] [panic] VPUTX: negative ref count with FUSE
o kern/160777  fs         [zfs] [hang] RAID-Z3 causes fatal hang upon scrub/impo
o kern/160706  fs         [zfs] zfs bootloader fails when a non-root vdev exists
o kern/160591  fs         [zfs] Fail to boot on zfs root with degraded raidz2 [r
o kern/160410  fs         [smbfs] [hang] smbfs hangs when transferring large fil
o kern/160283  fs         [zfs] [patch] 'zfs list' does abort in make_dataset_ha
o kern/159930  fs         [ufs] [panic] kernel core
o kern/159402  fs         [zfs][loader] symlinks cause I/O errors
o kern/159357  fs         [zfs] ZFS MAXNAMELEN macro has confusing name (off-by-
o kern/159356  fs         [zfs] [patch] ZFS NAME_ERR_DISKLIKE check is Solaris-s
o kern/159351  fs         [nfs] [patch] - divide by zero in mountnfs()
o kern/159251  fs         [zfs] [request]: add FLETCHER4 as DEDUP hash option
o kern/159077  fs         [zfs] Can't cd .. with latest zfs version
o kern/159048  fs         [smbfs] smb mount corrupts large files
o kern/159045  fs         [zfs] [hang] ZFS scrub freezes system
o kern/158839  fs         [zfs] ZFS Bootloader Fails if there is a Dead Disk
o kern/158802  fs         amd(8) ICMP storm and unkillable process.
o kern/158231  fs         [nullfs] panic on unmounting nullfs mounted over ufs o
f kern/157929  fs         [nfs] NFS slow read
o kern/157399  fs         [zfs] trouble with: mdconfig force delete && zfs strip
o kern/157179  fs         [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov
o kern/156797  fs         [zfs] [panic] Double panic with FreeBSD 9-CURRENT and
o kern/156781  fs         [zfs] zfs is losing the snapshot directory,
p kern/156545  fs         [ufs] mv could break UFS on SMP systems
o kern/156193  fs         [ufs] [hang] UFS snapshot hangs && deadlocks processes
o kern/156039  fs         [nullfs] [unionfs] nullfs + unionfs do not compose, re
o kern/155615  fs         [zfs] zfs v28 broken on sparc64 -current
o kern/155587  fs         [zfs] [panic] kernel panic with zfs
p kern/155411  fs         [regression] [8.2-release] [tmpfs]: mount: tmpfs : No
o kern/155199  fs         [ext2fs] ext3fs mounted as ext2fs gives I/O errors
o bin/155104   fs         [zfs][patch] use /dev prefix by default when importing
o kern/154930  fs         [zfs] cannot delete/unlink file from full volume -> EN
o kern/154828  fs         [msdosfs] Unable to create directories on external USB
o kern/154491  fs         [smbfs] smb_co_lock: recursive lock for object 1
p kern/154228  fs         [md] md getting stuck in wdrain state
o kern/153996  fs         [zfs] zfs root mount error while kernel is not located
o kern/153753  fs         [zfs] ZFS v15 - grammatical error when attempting to u
o kern/153716  fs         [zfs] zpool scrub time remaining is incorrect
o kern/153695  fs         [patch] [zfs] Booting from zpool created on 4k-sector
o kern/153680  fs         [xfs] 8.1 failing to mount XFS partitions
o kern/153520  fs         [zfs] Boot from GPT ZFS root on HP BL460c G1 unstable
o kern/153418  fs         [zfs] [panic] Kernel Panic occurred writing to zfs vol
o kern/153351  fs         [zfs] locking directories/files in ZFS
o bin/153258   fs         [patch][zfs] creating ZVOLs requires `refreservation'
s kern/153173  fs         [zfs] booting from a gzip-compressed dataset doesn't w
o bin/153142   fs         [zfs] ls -l outputs `ls: ./.zfs: Operation not support
o kern/153126  fs         [zfs] vdev failure, zpool=peegel type=vdev.too_small
o kern/152022  fs         [nfs] nfs service hangs with linux client [regression]
o kern/151942  fs         [zfs] panic during ls(1) zfs snapshot directory
o kern/151905  fs         [zfs] page fault under load in /sbin/zfs
o bin/151713   fs         [patch] Bug in growfs(8) with respect to 32-bit overfl
o kern/151648  fs         [zfs] disk wait bug
o kern/151629  fs         [fs] [patch] Skip empty directory entries during name
o kern/151330  fs         [zfs] will unshare all zfs filesystem after execute a
o kern/151326  fs         [nfs] nfs exports fail if netgroups contain duplicate
o kern/151251  fs         [ufs] Can not create files on filesystem with heavy us
o kern/151226  fs         [zfs] can't delete zfs snapshot
o kern/151111  fs         [zfs] vnodes leakage during zfs unmount
o kern/150503  fs         [zfs] ZFS disks are UNAVAIL and corrupted after reboot
o kern/150501  fs         [zfs] ZFS vdev failure vdev.bad_label on amd64
o kern/150390  fs         [zfs] zfs deadlock when arcmsr reports drive faulted
o kern/150336  fs         [nfs] mountd/nfsd became confused; refused to reload n
o kern/149208  fs         mksnap_ffs(8) hang/deadlock
o kern/149173  fs         [patch] [zfs] make OpenSolaris installa
o kern/149015  fs         [zfs] [patch] misc fixes for ZFS code to build on Glib
o kern/149014  fs         [zfs] [patch] declarations in ZFS libraries/utilities
o kern/149013  fs         [zfs] [patch] make ZFS makefiles use the libraries fro
o kern/148504  fs         [zfs] ZFS' zpool does not allow replacing drives to be
o kern/148490  fs         [zfs]: zpool attach - resilver bidirectionally, and re
o kern/148368  fs         [zfs] ZFS hanging forever on 8.1-PRERELEASE
o kern/148138  fs         [zfs] zfs raidz pool commands freeze
o kern/147903  fs         [zfs] [panic] Kernel panics on faulty zfs device
o kern/147881  fs         [zfs] [patch] ZFS "sharenfs" doesn't allow different "
p kern/147560  fs         [zfs] [boot] Booting 8.1-PRERELEASE raidz system take
o kern/147420  fs         [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt
o kern/146941  fs         [zfs] [panic] Kernel Double Fault - Happens constantly
o kern/146786  fs         [zfs] zpool import hangs with checksum errors
o kern/146708  fs         [ufs] [panic] Kernel panic in softdep_disk_write_compl
o kern/146528  fs         [zfs] Severe memory leak in ZFS on i386
o kern/146502  fs         [nfs] FreeBSD 8 NFS Client Connection to Server
s kern/145712  fs         [zfs] cannot offline two drives in a raidz2 configurat
o kern/145411  fs         [xfs] [panic] Kernel panics shortly after mounting an
f bin/145309   fs         bsdlabel: Editing disk label invalidates the whole dev
o kern/145272  fs         [zfs] [panic] Panic during boot when accessing zfs on
o kern/145246  fs         [ufs] dirhash in 7.3 gratuitously frees hashes when it
o kern/145238  fs         [zfs] [panic] kernel panic on zpool clear tank
o kern/145229  fs         [zfs] Vast differences in ZFS ARC behavior between 8.0
o kern/145189  fs         [nfs] nfsd performs abysmally under load
o kern/144929  fs         [ufs] [lor] vfs_bio.c + ufs_dirhash.c
p kern/144447  fs         [zfs] sharenfs fsunshare() & fsshare_main() non functi
o kern/144416  fs         [panic] Kernel panic on online filesystem optimization
s kern/144415  fs         [zfs] [panic] kernel panics on boot after zfs crash
o kern/144234  fs         [zfs] Cannot boot machine with recent gptzfsboot code
o kern/143825  fs         [nfs] [panic] Kernel panic on NFS client
o bin/143572   fs         [zfs] zpool(1): [patch] The verbose output from iostat
o kern/143212  fs         [nfs] NFSv4 client strange work ...
o kern/143184  fs         [zfs] [lor] zfs/bufwait LOR
o kern/142878  fs         [zfs] [vfs] lock order reversal
o kern/142597  fs         [ext2fs] ext2fs does not work on filesystems with real
o kern/142489  fs         [zfs] [lor] allproc/zfs LOR
o kern/142466  fs         Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re
o kern/142306  fs         [zfs] [panic] ZFS drive (from OSX Leopard) causes two
o kern/142068  fs         [ufs] BSD labels are got deleted spontaneously
o kern/141897  fs         [msdosfs] [panic] Kernel panic. msdofs: file name leng
o kern/141463  fs         [nfs] [panic] Frequent kernel panics after upgrade fro
o kern/141305  fs         [zfs] FreeBSD ZFS+sendfile severe performance issues (
o kern/141091  fs         [patch] [nullfs] fix panics with DIAGNOSTIC enabled
o kern/141086  fs         [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS
o kern/141010  fs         [zfs] "zfs scrub" fails when backed by files in UFS2
o kern/140888  fs         [zfs] boot fail from zfs root while the pool resilveri
o kern/140661  fs         [zfs] [patch] /boot/loader fails to work on a GPT/ZFS-
o kern/140640  fs         [zfs] snapshot crash
o kern/140068  fs         [smbfs] [patch] smbfs does not allow semicolon in file
o kern/139725  fs         [zfs] zdb(1) dumps core on i386 when examining zpool c
o kern/139715  fs         [zfs] vfs.numvnodes leak on busy zfs
p bin/139651   fs         [nfs] mount(8): read-only remount of NFS volume does n
o kern/139407  fs         [smbfs] [panic] smb mount causes system crash if remot
o kern/138662  fs         [panic] ffs_blkfree: freeing free block
o kern/138421  fs         [ufs] [patch] remove UFS label limitations
o kern/138202  fs         mount_msdosfs(1) see only 2Gb
o kern/136968  fs         [ufs] [lor] ufs/bufwait/ufs (open)
o kern/136945  fs         [ufs] [lor] filedesc structure/ufs (poll)
o kern/136944  fs         [ffs] [lor] bufwait/snaplk (fsync)
o kern/136873  fs         [ntfs] Missing directories/files on NTFS volume
o kern/136865  fs         [nfs] [patch] NFS exports atomic and on-the-fly atomic
p kern/136470  fs         [nfs] Cannot mount / in read-only, over NFS
o kern/135546  fs         [zfs] zfs.ko module doesn't ignore zpool.cache filenam
o kern/135469  fs         [ufs] [panic] kernel crash on md operation in ufs_dirb
o kern/135050  fs         [zfs] ZFS clears/hides disk errors on reboot
o kern/134491  fs         [zfs] Hot spares are rather cold...
o kern/133676  fs         [smbfs] [panic] umount -f'ing a vnode-based memory dis
o kern/132960  fs         [ufs] [panic] panic:ffs_blkfree: freeing free frag
o kern/132397  fs         reboot causes filesystem corruption (failure to sync b
o kern/132331  fs         [ufs] [lor] LOR ufs and syncer
o kern/132237  fs         [msdosfs] msdosfs has problems to read MSDOS Floppy
o kern/132145  fs         [panic] File System Hard Crashes
o kern/131441  fs         [unionfs] [nullfs] unionfs and/or nullfs not combineab
o kern/131360  fs         [nfs] poor scaling behavior of the NFS server under lo
o kern/131342  fs         [nfs] mounting/unmounting of disks causes NFS to fail
o bin/131341   fs         makefs: error "Bad file descriptor" on the mount poin
o kern/130920  fs         [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130210  fs         [nullfs] Error by check nullfs
o kern/129760  fs         [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129488  fs         [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c:
o kern/129231  fs         [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129152  fs         [panic] non-userfriendly panic when trying to mount(8)
o kern/127787  fs         [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs
o bin/127270   fs         fsck_msdosfs(8) may crash if BytesPerSec is zero
o kern/127029  fs         [panic] mount(8): trying to mount a write protected zi
o kern/126287  fs         [ufs] [panic] Kernel panics while mounting an UFS file
o kern/125895  fs         [ffs] [panic] kernel: panic: ffs_blkfree: freeing free
s kern/125738  fs         [zfs] [request] SHA256 acceleration in ZFS
o kern/123939  fs         [msdosfs] corrupts new files
o kern/122380  fs         [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash
o bin/122172   fs         [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121898   fs         [nullfs] pwd(1)/getcwd(2) fails with Permission denied
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha
o kern/120483  fs         [ntfs] [patch] NTFS filesystem locking changes
o kern/120482  fs         [ntfs] [patch] Sync style changes between NetBSD and F
o kern/118912  fs         [2tb] disk sizing/geometry problem with large array
o kern/118713  fs         [minidump] [patch] Display media size required for a k
o kern/118318  fs         [nfs] NFS server hangs under special circumstances
o bin/118249   fs         [ufs] mv(1): moving a directory changes its mtime
o kern/118126  fs         [nfs] [patch] Poor NFS server write performance
o kern/118107  fs         [ntfs] [panic] Kernel panic when accessing a file at N
o kern/117954  fs         [ufs] dirhash on very large directories blocks the mac
o bin/117315   fs         [smbfs] mount_smbfs(8) and related options can't mount
o kern/117158  fs         [zfs] zpool scrub causes panic if geli vdevs detach on
o bin/116980   fs         [msdosfs] [patch] mount_msdosfs(8) resets some flags f
o conf/116931  fs         lack of fsck_cd9660 prevents mounting iso images with
o kern/116583  fs         [ffs] [hang] System freezes for short time when using
o bin/115361   fs         [zfs] mount(8) gets into a state where it won't set/un
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o kern/113852  fs         [smbfs] smbfs does not properly implement DFS referral
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/111843  fs         [msdosfs] Long Names of files are incorrectly created
o kern/111782  fs         [ufs] dump(8) fails horribly for large filesystems
s bin/111146   fs         [2tb] fsck(8) fails on 6T filesystem
o bin/107829   fs         [2TB] fdisk(8): invalid boundary checking in fdisk / w
o kern/106107  fs         [ufs] left-over fsck_snapshot after unfinished backgro
o kern/104406  fs         [ufs] Processes get stuck in "ufs" state under persist
o kern/104133  fs         [ext2fs] EXT2FS module corrupts EXT2/3 filesystems
o kern/103035  fs         [ntfs] Directories in NTFS mounted disc images appear
o kern/101324
fs [smbfs] smbfs sometimes not case sensitive when it's s o kern/99290 fs [ntfs] mount_ntfs ignorant of cluster sizes s bin/97498 fs [request] newfs(8) has no option to clear the first 12 o kern/97377 fs [ntfs] [patch] syntax cleanup for ntfs_ihash.c o kern/95222 fs [cd9660] File sections on ISO9660 level 3 CDs ignored o kern/94849 fs [ufs] rename on UFS filesystem is not atomic o bin/94810 fs fsck(8) incorrectly reports 'file system marked clean' o kern/94769 fs [ufs] Multiple file deletions on multi-snapshotted fil o kern/94733 fs [smbfs] smbfs may cause double unlock o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/92272 fs [ffs] [hang] Filling a filesystem while creating a sna o kern/91134 fs [smbfs] [patch] Preserve access and modification time a kern/90815 fs [smbfs] [patch] SMBFS with character conversions somet o kern/88657 fs [smbfs] windows client hang when browsing a samba shar o kern/88555 fs [panic] ffs_blkfree: freeing free frag on AMD 64 o kern/88266 fs [smbfs] smbfs does not implement UIO_NOCOPY and sendfi o bin/87966 fs [patch] newfs(8): introduce -A flag for newfs to enabl o kern/87859 fs [smbfs] System reboot while umount smbfs. o kern/86587 fs [msdosfs] rm -r /PATH fails with lots of small files o bin/85494 fs fsck_ffs: unchecked use of cg_inosused macro etc. 
o kern/80088 fs [smbfs] Incorrect file time setting on NTFS mounted vi o bin/74779 fs Background-fsck checks one filesystem twice and omits o kern/73484 fs [ntfs] Kernel panic when doing `ls` from the client si o bin/73019 fs [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino o kern/71774 fs [ntfs] NTFS cannot "see" files on a WinXP filesystem o bin/70600 fs fsck(8) throws files away when it can't grow lost+foun o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po o kern/65920 fs [nwfs] Mounted Netware filesystem behaves strange o kern/65901 fs [smbfs] [patch] smbfs fails fsx write/truncate-down/tr o kern/61503 fs [smbfs] mount_smbfs does not work as non-root o kern/55617 fs [smbfs] Accessing an nsmb-mounted drive via a smb expo o kern/51685 fs [hang] Unbounded inode allocation causes kernel to loc o kern/36566 fs [smbfs] System reboot with dead smb mount and umount o bin/27687 fs fsck(8) wrapper is not properly passing options to fsc o kern/18874 fs [2TB] 32bit NFS servers export wrong negative values t 289 problems total. 
From owner-freebsd-fs@FreeBSD.ORG Mon Sep 17 12:23:34 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 267F21065670 for ; Mon, 17 Sep 2012 12:23:34 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 935128FC08 for ; Mon, 17 Sep 2012 12:23:32 +0000 (UTC) Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q8HCNcqi057349; Mon, 17 Sep 2012 15:23:38 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q8HCNP15037012; Mon, 17 Sep 2012 15:23:25 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q8HCNPGK037011; Mon, 17 Sep 2012 15:23:25 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 17 Sep 2012 15:23:25 +0300 From: Konstantin Belousov To: Rick Macklem Message-ID: <20120917122325.GR37286@deviant.kiev.zoral.com.ua> References: <1531430179.669311.1347831685957.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="vk2EvGhio7iZz8DU" Content-Disposition: inline In-Reply-To: <1531430179.669311.1347831685957.JavaMail.root@erie.cs.uoguelph.ca> User-Agent: Mutt/1.5.21 (2010-09-15) X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: FS List Subject: Re: 
testing/review of atomic export update patch X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Sep 2012 12:23:34 -0000

--vk2EvGhio7iZz8DU
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sun, Sep 16, 2012 at 05:41:25PM -0400, Rick Macklem wrote:
> Hi,
>
> There is a simple patch at:
> http://people.freebsd.org/~rmacklem/atomic-export.patch
> that can be applied to a kernel + mountd, so that the new
> nfsd can be suspended by mountd while the exports are being
> reloaded. It adds a new "-S" flag to mountd to enable this.
> (This avoids the long standing bug where clients receive ESTALE
> replies to RPCs while mountd is reloading exports.)

This looks simple, but also somewhat worrisome. What would happen
if mountd crashes after the nfsd suspension is requested, but before
the resume is performed?

Perhaps mountd should check for a suspended nfsd on start and
resume it, if some flag is specified?
--vk2EvGhio7iZz8DU Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (FreeBSD) iEYEARECAAYFAlBXFjwACgkQC3+MBN1Mb4h/OACeIEjMZo6AWDlO0dSHDCrkncG6 oZYAnjVapZW44ulwTmWudOhlwpCCFUEF =U8MR -----END PGP SIGNATURE----- --vk2EvGhio7iZz8DU-- From owner-freebsd-fs@FreeBSD.ORG Mon Sep 17 12:34:24 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 751C0106564A for ; Mon, 17 Sep 2012 12:34:24 +0000 (UTC) (envelope-from olivier@gid0.org) Received: from mail-lb0-f182.google.com (mail-lb0-f182.google.com [209.85.217.182]) by mx1.freebsd.org (Postfix) with ESMTP id EF0908FC08 for ; Mon, 17 Sep 2012 12:34:23 +0000 (UTC) Received: by lbbgg13 with SMTP id gg13so5094755lbb.13 for ; Mon, 17 Sep 2012 05:34:22 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type :x-gm-message-state; bh=tg2K/0LTxJQ0+88i4mopVB0VWnoKZIz7tbV6QfvNMBw=; b=A7kDnc7WIJlXUsYQQm50X+aUiug3aP64rWzuJ19swobFk3jo9YOzUqUMu+w9pztthr 5m7T9Au5dyOF37H5cjBQ7uhwnqUjnygnBtyQopGobEkuKGsbDoXOoh3DVOWQQcwjnd31 a+8o7KQ+epxnPBBfJdPIDEZZ8QQ/0ftp4ZUTeX5SyqylwuzkmKa4MYv0Ab4hTPkO91XU gxnfY7+bnEp4JoY4kkbeLmXhbJOSkiwQqJqCC4pxaB3D980KeA/MZTDaCB4MK74FlROX pqx673CgTd77dqUnv6CO3fxuqrgUGuibGREtb19XLS8psHDW2jdPE0h4Zte8gT7eNYD6 5zIA== MIME-Version: 1.0 Received: by 10.152.113.165 with SMTP id iz5mr4756173lab.48.1347885262612; Mon, 17 Sep 2012 05:34:22 -0700 (PDT) Received: by 10.112.2.36 with HTTP; Mon, 17 Sep 2012 05:34:22 -0700 (PDT) Date: Mon, 17 Sep 2012 14:34:22 +0200 Message-ID: From: Olivier Smedts To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Gm-Message-State: ALoCoQkMT4qu+qvyeE6UAJ0e+RFcr3axUalvUXwG0pMRFAdH9fN5P0KpjLOd9b9yLsBwvpxZVWcw Subject: zpool add log to root pool X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: 
Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Sep 2012 12:34:24 -0000

Hello ZFS folks,

Is there anyone here using a separate log device (or "ZIL") on a root pool ?

# zpool add tank log gpt/zil
cannot add to 'tank': root pool can not have multiple vdevs or separate logs

Under 9-STABLE, using zpool v28. This seems to be a limitation inherited
from OpenSolaris; FreeBSD, for example, supports booting from a
multiple-vdev root pool. I found that most people use the "unset
bootfs property, add vdev, set bootfs again" trick to have a working
multiple-vdev root pool under FreeBSD. I think I can do the same for
the log device but don't want to lose my data.

Is there anyone successfully using a log device / zil on a root pool
under FreeBSD ?

Thanks

-- 
Olivier Smedts                                                 _
                                        ASCII ribbon campaign ( )
e-mail: olivier@gid0.org        - against HTML email & vCards  X
www: http://www.gid0.org    - against proprietary attachments / \

  "There are only 10 kinds of people in the world:
  those who understand binary,
  and those who don't."
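
For reference, the "unset bootfs property, add vdev, set bootfs again" trick mentioned above can be sketched as a dry run. This is only a sketch under assumptions: the pool name tank and the device gpt/zil come from the message, while the dataset tank/root is a hypothetical bootfs value, and the run helper and DRY_RUN variable are illustration names. By default the script only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch: prints the zpool commands instead of executing them.
# Set DRY_RUN=0 only after reviewing the output and taking a backup.
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# add_log_to_root_pool POOL LOGDEV BOOTFS
# BOOTFS is the current bootfs value; read it beforehand with
# "zpool get bootfs POOL" (the VALUE column) so it can be restored.
add_log_to_root_pool() {
    pool="$1"; logdev="$2"; bootfs="$3"
    run zpool set "bootfs=" "$pool"          # clear bootfs so the add is accepted
    run zpool add "$pool" log "$logdev"      # add the separate log device
    run zpool set "bootfs=$bootfs" "$pool"   # restore bootfs even if the add failed
}

# tank/root is a hypothetical dataset name used for illustration.
add_log_to_root_pool tank gpt/zil tank/root
```

The restore step is issued unconditionally so that a failed add does not leave the pool without a bootfs property.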
From owner-freebsd-fs@FreeBSD.ORG Mon Sep 17 14:33:11 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 70987106566B for ; Mon, 17 Sep 2012 14:33:11 +0000 (UTC) (envelope-from c.kworr@gmail.com) Received: from mail-bk0-f54.google.com (mail-bk0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id EDFF58FC08 for ; Mon, 17 Sep 2012 14:33:10 +0000 (UTC) Received: by bkcje9 with SMTP id je9so2367918bkc.13 for ; Mon, 17 Sep 2012 07:33:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=fN2bMV5GyphMiQ/9HcUHSCtvxEHat8/7jG2jkyCEPDM=; b=vLsa3oT64EX94grrKS3RCbTdePJVGFuGgp/lwp+q8QTVToydVKmhRjL6pugf79PNSA 2tMEagmHHxCnKG1VnAdUIKmPDcn9ajnfdjvR4txemQkT8G2p9iF+EIkNR1T2bC+B41n2 M0MQdm3D4DzNbReWiPZb9BwvJj/YEK6Xs6ObZwxvGVz9an9r1uJBbVv1P8UBnvN0Cg8P a+kjYXW46jNAomDQJsnzY0Box/2GoBVwucy7p0QrN/pK/DsYFNNHf1JNIHFAqnRaVAEs M4K7W5N9dwH1WlHU+RaSZ/RSpQR8dMBIFBYXJSKzREcDfX4DgShty7OrwejcpcM06oJ5 E7mQ== Received: by 10.204.11.209 with SMTP id u17mr1977856bku.130.1347892389542; Mon, 17 Sep 2012 07:33:09 -0700 (PDT) Received: from green.local (227-7-132-95.pool.ukrtel.net. 
[95.132.7.227]) by mx.google.com with ESMTPS id a17sm2555331bkw.5.2012.09.17.07.33.07 (version=SSLv3 cipher=OTHER); Mon, 17 Sep 2012 07:33:08 -0700 (PDT) Message-ID: <505734A1.9000501@gmail.com> Date: Mon, 17 Sep 2012 17:33:05 +0300 From: Volodymyr Kostyrko User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:15.0) Gecko/20120911 Thunderbird/15.0.1 MIME-Version: 1.0 To: Olivier Smedts References: In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: zpool add log to root pool X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Sep 2012 14:33:11 -0000

17.09.2012 15:34, Olivier Smedts wrote:
> Is there anyone here using a separate log device (or "ZIL") on a root pool ?

# zpool set bootfs= tank

> # zpool add tank log gpt/zil
> cannot add to 'tank': root pool can not have multiple vdevs or separate logs
>
> Under 9-STABLE, using zpool v28. This seems to be a limitation from
> OpenSolaris. For example, FreeBSD supports booting from a
> multiple-vdev root pool. I found that most people use the "unset
> bootfs property, add vdev, set bootfs again" trick to have a working
> multiple-vdev root pool under FreeBSD. I think I can do the same for
> the log device but don't want to lose my data.
>
> Is there anyone successfully using a log device / zil on a root pool
> under FreeBSD ?

Me.

# zpool iostat -v faz0
                                                   capacity     operations    bandwidth
pool                                            alloc   free   read  write   read  write
----------------------------------------------  -----  -----  -----  -----  -----  -----
faz0                                             121G   173G     22    149   130K   671K
  mirror                                         121G   173G     22    149   130K   659K
    gptid/b88daece-7a48-11df-8703-0018f36885d5      -      -     10     56   111K   660K
    gptid/23ddb9f0-7b04-11df-8867-0018f36885d5      -      -     10     56   111K   660K
logs                                                -      -      -      -      -      -
  gptid/3592d260-c98e-11e0-9ef6-0018f36885d5    1,86M  1014M      0      0      0  11,6K
cache                                               -      -      -      -      -      -
  gptid/3809bef7-c98e-11e0-9ef6-0018f36885d5    36,3G     8M     71      8   653K   374K
----------------------------------------------  -----  -----  -----  -----  -----  -----

# zpool get bootfs faz0
NAME  PROPERTY  VALUE  SOURCE
faz0  bootfs    -      default

-- 
Sphinx of black quartz judge my vow.

From owner-freebsd-fs@FreeBSD.ORG Mon Sep 17 16:29:45 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 853E710656B7 for ; Mon, 17 Sep 2012 16:29:45 +0000 (UTC) (envelope-from Newsletter@goelli.de) Received: from mo6-p05-ob.rzone.de (mo6-p05-ob.rzone.de [IPv6:2a01:238:20a:202:5305::1]) by mx1.freebsd.org (Postfix) with ESMTP id DA8F68FC29 for ; Mon, 17 Sep 2012 16:29:44 +0000 (UTC) X-RZG-CLASS-ID: mo05 X-RZG-AUTH: :ImkTZkytb+s5KUDumTG4i0mGDH1K4fweaf9O+/5rQT5pvsrb4VLk35Jv6Ak/ChY= Received: from goelliNotebook (dslb-094-219-102-167.pools.arcor-ip.net [94.219.102.167]) by smtp.strato.de (jored mo30) (RZmta 30.14 DYNA|AUTH) with ESMTPA id n00393o8HG55ru ; Mon, 17 Sep 2012 18:29:43 +0200 (CEST) From: =?utf-8?Q?Thomas_G=C3=B6llner_=28Newsletter=29?= To: "'Volodymyr Kostyrko'" References: <001a01cd900d$bcfcc870$36f65950$@goelli.de> <504F282D.8030808@gmail.com> <000a01cd90aa$0a277310$1e765930$@goelli.de> <5050461A.9050608@gmail.com> <000001cd9239$ed734c80$c859e580$@goelli.de> <5052EC5D.4060403@gmail.com> <000a01cd9274$0aa0bba0$1fe232e0$@goelli.de> <505322C9.70200@gmail.com> <000001cd9377$e9e9b010$bdbd1030$@goelli.de> <50559CD8.1070700@gmail.com> In-Reply-To:
<50559CD8.1070700@gmail.com> Date: Mon, 17 Sep 2012 18:29:42 +0200 Message-ID: <000001cd94f1$a4157030$ec405090$@goelli.de> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Mailer: Microsoft Outlook 14.0 Thread-Index: AQJeoooVG2eNN70r/NZmey5KTcQ1/AH9GaMbAUqasjACcrqh/gFBByGfA1g04pUCQLd3HAGGeVDSATGcaCEBYrMtgJXnIg+g Content-Language: de Cc: freebsd-fs@freebsd.org Subject: AW: AW: AW: AW: AW: ZFS: Corrupted pool metadata after adding vdev to a pool - no opportunity to rescue data from healthy vdevs? Remove a vdev? Rewrite metadata? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Sep 2012 16:29:45 -0000

> If you can afford putting your drives aside you can try to wait before some tool occasionally emerges. I will not promise anything
> but I'm slowly making some progress with my script. I'm motivated about that as I have broken pool with photos. Trying to import
> that pool is causing a core dump on any system I tested like OpenSolaris, Illumos or SystemRescueCD.

It would be great if your script were able to deal with pools with broken labels. I will put the three 3TB disks aside and use the old 1.5TB disks instead. Then, if your script makes progress or someone else writes a tool for restoring labels or reading data from broken pools, perhaps I can get some data back. I think it will take some time to fill this fresh 3TB pool ;-)

This would also solve the next problem I discovered...
These 1.5TB disks have 512-byte sectors. I have one spare. My first thought was that once a second disk fails, I would replace it with a 4TB disk, and so on until all of them are replaced, so that I could expand the pool. But from what I have read, that is not possible, is it? The 4TB drives would have 4k sectors.
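
For reference, the sector-size question above comes down to the per-vdev ashift value, which is the base-2 logarithm of the smallest block size the vdev was created with. The sketch below only does that arithmetic; the function names ashift_for and fits_vdev are hypothetical, and the comparison assumes the logical sector size the drive reports, so a 4k drive that emulates 512-byte sectors would be judged by its emulated size. Treat it as an illustration, not as a statement of what a particular ZFS version will accept.

```shell
#!/bin/sh
# ashift_for SECTOR_BYTES: print log2 of a power-of-two sector size.
ashift_for() {
    size="$1"
    n=0
    while [ "$size" -gt 1 ]; do
        size=$((size / 2))
        n=$((n + 1))
    done
    echo "$n"
}

# fits_vdev VDEV_ASHIFT DISK_SECTOR_BYTES
# Prints "ok" when the disk's sector size is no larger than the block
# size implied by the vdev's ashift, "mismatch" otherwise.
fits_vdev() {
    vdev_ashift="$1"
    disk_ashift="$(ashift_for "$2")"
    if [ "$disk_ashift" -le "$vdev_ashift" ]; then
        echo "ok"
    else
        echo "mismatch"
    fi
}

ashift_for 512      # 512-byte sectors
ashift_for 4096     # 4k sectors
fits_vdev 9 4096    # 4k-sector disk into a vdev built on 512-byte disks
fits_vdev 12 512    # 512-byte disk into a vdev built for 4k sectors
```

On FreeBSD the real values can be inspected with "zdb -C <pool>" (look for ashift) and "diskinfo -v <device>" (sectorsize and stripesize).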
From owner-freebsd-fs@FreeBSD.ORG Mon Sep 17 20:09:34 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C72C3106564A for ; Mon, 17 Sep 2012 20:09:34 +0000 (UTC) (envelope-from spork@bway.net) Received: from xena.bway.net (xena.bway.net [216.220.96.26]) by mx1.freebsd.org (Postfix) with ESMTP id 5E6CA8FC15 for ; Mon, 17 Sep 2012 20:09:34 +0000 (UTC) Received: (qmail 89294 invoked by uid 0); 17 Sep 2012 20:02:53 -0000 Received: from smtp.bway.net (216.220.96.25) by xena.bway.net with ESMTPS (DHE-RSA-AES256-SHA encrypted); 17 Sep 2012 20:02:53 -0000 Received: (qmail 89283 invoked by uid 90); 17 Sep 2012 20:02:52 -0000 Received: from unknown (HELO frankentosh.sporklab.com) (spork@96.57.144.66) by smtp.bway.net with ESMTPA; 17 Sep 2012 20:02:52 -0000 Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: Charles Sprickman In-Reply-To: Date: Mon, 17 Sep 2012 16:02:51 -0400 Content-Transfer-Encoding: quoted-printable Message-Id: <9A49FDEB-325E-4A25-8FB9-C4FF8F9BAF67@bway.net> References: To: Olivier Smedts X-Mailer: Apple Mail (2.1084) Cc: freebsd-fs@freebsd.org Subject: Re: zpool add log to root pool X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Sep 2012 20:09:34 -0000

On Sep 17, 2012, at 8:34 AM, Olivier Smedts wrote:

> Hello ZFS folks,
>
> Is there anyone here using a separate log device (or "ZIL") on a root pool ?
>
> # zpool add tank log gpt/zil
> cannot add to 'tank': root pool can not have multiple vdevs or separate logs
>
> Under 9-STABLE, using zpool v28. This seems to be a limitation from
> OpenSolaris. For example, FreeBSD supports booting from a
> multiple-vdev root pool. I found that most people use the "unset
> bootfs property, add vdev, set bootfs again" trick to have a working
> multiple-vdev root pool under FreeBSD.

I did that and it seems to work.

  pool: zroot
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Tue Aug 21 01:28:27 2012
config:

        NAME             STATE     READ WRITE CKSUM
        zroot            ONLINE       0     0     0
          mirror-0       ONLINE       0     0     0
            ada1p3       ONLINE       0     0     0
            ada2p3       ONLINE       0     0     0
          mirror-1       ONLINE       0     0     0
            ada3p3       ONLINE       0     0     0
            ada0p3       ONLINE       0     0     0
        logs
          mirror-2       ONLINE       0     0     0
            gpt/zil-a    ONLINE       0     0     0
            gpt/zil-b    ONLINE       0     0     0
        cache
          gpt/l2arc-a    ONLINE       0     0     0
          gpt/l2arc-b    ONLINE       0     0     0

errors: No known data errors

Lost my /dev/gpt entries for the existing mirror slices in the process though...

> I think I can do the same for
> the log device but don't want to lose my data.
>
> Is there anyone successfully using a log device / zil on a root pool
> under FreeBSD ?

Mine works, but I've never been able to find any official confirmation from the -fs folks as to how "proper" or supported the configuration is. Not too many other options on 1U boxes though really...

Charles

> Thanks
>
> -- 
> Olivier Smedts                                                 _
>                                         ASCII ribbon campaign ( )
> e-mail: olivier@gid0.org        - against HTML email & vCards  X
> www: http://www.gid0.org    - against proprietary attachments / \
>
>   "There are only 10 kinds of people in the world:
>   those who understand binary,
>   and those who don't."
> _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Mon Sep 17 20:36:22 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id AF4FC106566B for ; Mon, 17 Sep 2012 20:36:22 +0000 (UTC) (envelope-from fjwcash@gmail.com) Received: from mail-lpp01m010-f54.google.com (mail-lpp01m010-f54.google.com [209.85.215.54]) by mx1.freebsd.org (Postfix) with ESMTP id 30B0A8FC14 for ; Mon, 17 Sep 2012 20:36:21 +0000 (UTC) Received: by lage12 with SMTP id e12so5504761lag.13 for ; Mon, 17 Sep 2012 13:36:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=g9s7g5Kt3InPSjhFv+koR36b8pu+Tr0j6FZYjBpvNaY=; b=NibPrDQuuRBSNiwYhelp/o2GcM0mwYxY5Z0x7aXJk73U43Dl+4TfvVkoIB1BmfnZMA UNav0me5I77kQB1ZtNqL91rL/ppuDRAmeVhKo5BAlLoHnUY+o4NkKle/pX7Md9VEvDmx tLvnU+hAcuvyUxGI6y4jFpJ47LcglPbwM0uyEhfaHPvHGOO06Tkn2TkVrHCPAohbXp89 0cDmwFreektfpUFvdia9nLSNxuH5fQOGmewCiJrf3hwFKPIQj6PZRv9ExcIL7eG6fr7o pSPewPwPPBG5QmPxC+ZAMq3n+xSlRoKn221NOjT7DnCJ3aMEOhsd8pFFoJF0TbtyXdP2 dffw== MIME-Version: 1.0 Received: by 10.152.110.9 with SMTP id hw9mr10638390lab.55.1347914180412; Mon, 17 Sep 2012 13:36:20 -0700 (PDT) Received: by 10.114.23.230 with HTTP; Mon, 17 Sep 2012 13:36:20 -0700 (PDT) In-Reply-To: <9A49FDEB-325E-4A25-8FB9-C4FF8F9BAF67@bway.net> References: <9A49FDEB-325E-4A25-8FB9-C4FF8F9BAF67@bway.net> Date: Mon, 17 Sep 2012 13:36:20 -0700 Message-ID: From: Freddie Cash To: Charles Sprickman Content-Type: text/plain; charset=UTF-8 Cc: FreeBSD Filesystems Subject: Re: zpool add log to root pool X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Sep 2012 20:36:22 -0000

On Mon, Sep 17, 2012 at 1:02 PM, Charles Sprickman wrote:
> I did that and it seems to work.
>
>   pool: zroot
>  state: ONLINE
>   scan: scrub repaired 0 in 0h0m with 0 errors on Tue Aug 21 01:28:27 2012
> config:
>
>         NAME             STATE     READ WRITE CKSUM
>         zroot            ONLINE       0     0     0
>           mirror-0       ONLINE       0     0     0
>             ada1p3       ONLINE       0     0     0
>             ada2p3       ONLINE       0     0     0
>           mirror-1       ONLINE       0     0     0
>             ada3p3       ONLINE       0     0     0
>             ada0p3       ONLINE       0     0     0
>         logs
>           mirror-2       ONLINE       0     0     0
>             gpt/zil-a    ONLINE       0     0     0
>             gpt/zil-b    ONLINE       0     0     0
>         cache
>           gpt/l2arc-a    ONLINE       0     0     0
>           gpt/l2arc-b    ONLINE       0     0     0
>
> errors: No known data errors
>
> Lost my /dev/gpt entries for the existing mirror slices in the process though...

Have you tried booting from a LiveCD (like Frenzy or the 9.0
installer), doing an import of the pool, an export of the pool, and
then a "zpool import -d /dev/gpt zroot"? You may be able to skip the
initial import/export.
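
The sequence suggested above can be sketched as a dry run. This is a sketch under assumptions: the pool name zroot and the "-d /dev/gpt" search directory come from the thread, while the "-R /mnt" alternate root (to keep a root pool from mounting over the live system) and the names run, DRY_RUN and ALTROOT are additions made for illustration. By default the script only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch: prints the zpool commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"
ALTROOT="${ALTROOT:-/mnt}"   # assumed alternate root for the LiveCD session

# run CMD...: echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# reimport_by_gpt_label POOL
# Import and export once, then import again while searching only
# /dev/gpt so the vdevs are recorded under their GPT label names.
reimport_by_gpt_label() {
    pool="$1"
    run zpool import -R "$ALTROOT" "$pool"
    run zpool export "$pool"
    run zpool import -d /dev/gpt -R "$ALTROOT" "$pool"
}

reimport_by_gpt_label zroot
```

As noted in the message, the first import/export pair may turn out to be unnecessary.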
-- Freddie Cash fjwcash@gmail.com From owner-freebsd-fs@FreeBSD.ORG Mon Sep 17 21:33:55 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0053A1065673 for ; Mon, 17 Sep 2012 21:33:54 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id D796E8FC15 for ; Mon, 17 Sep 2012 21:33:53 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap8EAECWV1CDaFvO/2dsb2JhbAA+BxaFcbchgiABAQUjBFIbDgoCAg0ZAlkGiBMLp1SSc4EhigAhhTWBEgOVYoEUjw2DAoE+Ihs X-IronPort-AV: E=Sophos;i="4.80,439,1344225600"; d="scan'208";a="182014531" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 17 Sep 2012 17:32:44 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id C46C3B3EFE; Mon, 17 Sep 2012 17:32:44 -0400 (EDT) Date: Mon, 17 Sep 2012 17:32:44 -0400 (EDT) From: Rick Macklem To: Konstantin Belousov Message-ID: <1777840817.743780.1347917564789.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20120917122325.GR37286@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: FS List Subject: Re: testing/review of atomic export update patch X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 17 Sep 2012 21:33:55 -0000 Konstantin Belousov wrote: > On Sun, Sep 16, 2012 at 05:41:25PM -0400, Rick Macklem wrote: > > Hi, > > > > There is a simple patch at: > > 
http://people.freebsd.org/~rmacklem/atomic-export.patch
> > that can be applied to a kernel + mountd, so that the new
> > nfsd can be suspended by mountd while the exports are being
> > reloaded. It adds a new "-S" flag to mountd to enable this.
> > (This avoids the long standing bug where clients receive ESTALE
> > replies to RPCs while mountd is reloading exports.)
>
> This looks simple, but also somewhat worrisome. What would happen
> if the mountd crashes after nfsd suspension is requested, but before
> resume was performed ?
>
> Might be, mountd should check for suspended nfsd on start and
> unsuspend
> it, if some flag is specified ?

Well, I think that happens with the patch as it stands.

The suspend is done only if the "-S" option is specified, and it is a
no-op if the nfsd is already suspended. The resume is done no matter
what flags are provided, so mountd will always try to do a "resume".
--> get_exportlist() is always called when mountd is started up and it
    does the resume unconditionally when it completes.

If mountd repeatedly crashes before completing get_exportlist() when it
is started up, the exports will be all messed up, so having the nfsd
threads suspended doesn't seem so bad for this case (which hopefully
never happens ;-).

Both suspend and resume are just no-ops for unpatched kernels.

Maybe the comment in front of "resume" should explicitly explain this,
instead of saying resume is harmless to do under all conditions?

Thanks for looking at it, rick

From owner-freebsd-fs@FreeBSD.ORG Tue Sep 18 01:08:12 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 821) id 66E1A106566C; Tue, 18 Sep 2012 01:08:12 +0000 (UTC) Date: Tue, 18 Sep 2012 01:08:12 +0000 From: John To: FreeBSD FS Message-ID: <20120918010812.GA71005@FreeBSD.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.1i Subject: XFS/istgt backed 8TB xfs filesystem configuration?
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Sep 2012 01:08:12 -0000

Hi Folks,

I've been asked to export an 8TB volume (via istgt) and was curious
if anyone has any experience with optimal xfs configurations in this
area.

Typically, volumes for linux are created similar to:

zfs create -b 32768 -V $lunsize $physname

On a dual 10g data backbone network, we use mpio with 2 channels per
net:

[PortalGroup4]
  Comment "Two networks - Two ports"
  Portal DA1 10.59.10.10:5000
  Portal DA2 10.60.10.10:5000
  Portal DA3 10.59.10.10:5001
  Portal DA4 10.60.10.10:5001
  Comment "END: PortalGroup4"

which typically seems to give the best performance. The luns are
being brought together on the linux side (RHEL 6.1) with multipath.

I've google'd around a bit and don't see much about zfs filesystems
on top of iscsi exported zfs volumes :-)

Anyone have any experience in this area? Suggestions? I'm told the
data patterns will be mostly database reads, minimal writes.
Thanks, John From owner-freebsd-fs@FreeBSD.ORG Tue Sep 18 06:10:01 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 33A74106566C for ; Tue, 18 Sep 2012 06:10:01 +0000 (UTC) (envelope-from c.kworr@gmail.com) Received: from mail-bk0-f54.google.com (mail-bk0-f54.google.com [209.85.214.54]) by mx1.freebsd.org (Postfix) with ESMTP id A54618FC0A for ; Tue, 18 Sep 2012 06:10:00 +0000 (UTC) Received: by bkcje9 with SMTP id je9so2721386bkc.13 for ; Mon, 17 Sep 2012 23:09:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=B2Bp3D2dEOgVe6RlSyz9Ivmpn52RqYb/RdnIJaW9e5U=; b=zvAxJh+Fg6W1UCxQrPSyoMDc3r3n0QNNGdiffQIGWbkmji3dIh0Y0hdyKCwYAGs3Bz As4zfdn+jLpKSMgpXSLokQdbxvsnRfTNRrgbuE2XK9DWIXnygKLyKjFw49cqybNB6vF0 Ve1xrper8GjuQ3n4elu90NYPkUPxdosDkk5SMSrdQQEkGDl0dHcHOtxJyzmhssxEg+AD fKZdWUillqIgeb3i0uW4LXmrVyOrmGfm0YOmj8bRl7OMdQoZ9JWImvqpsWb5NjUuebct ifhiv7nOQQWNFVZUDjc9a5fvwNYXRdPCP9HilQ16LTsr3nRgGf9T9mFPzvBzf3N6Nndl i/Fg== Received: by 10.204.156.18 with SMTP id u18mr1825406bkw.131.1347948599526; Mon, 17 Sep 2012 23:09:59 -0700 (PDT) Received: from limbo.xim.bz ([46.150.100.6]) by mx.google.com with ESMTPS id t23sm6800049bks.4.2012.09.17.23.09.57 (version=SSLv3 cipher=OTHER); Mon, 17 Sep 2012 23:09:58 -0700 (PDT) Message-ID: <50581033.4040102@gmail.com> Date: Tue, 18 Sep 2012 09:09:55 +0300 From: Volodymyr Kostyrko User-Agent: Mozilla/5.0 (X11; FreeBSD i386; rv:15.0) Gecko/20120911 Thunderbird/15.0.1 MIME-Version: 1.0 To: =?UTF-8?B?IlRob21hcyBHw7ZsbG5lciAoTmV3c2xldHRlciki?= References: <001a01cd900d$bcfcc870$36f65950$@goelli.de> <504F282D.8030808@gmail.com> <000a01cd90aa$0a277310$1e765930$@goelli.de> <5050461A.9050608@gmail.com> <000001cd9239$ed734c80$c859e580$@goelli.de> 
<5052EC5D.4060403@gmail.com> <000a01cd9274$0aa0bba0$1fe232e0$@goelli.de> <505322C9.70200@gmail.com> <000001cd9377$e9e9b010$bdbd1030$@goelli.de> <50559CD8.1070700@gmail.com> <000001cd94f1$a4157030$ec405090$@goelli.de> In-Reply-To: <000001cd94f1$a4157030$ec405090$@goelli.de> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Cc: freebsd-fs@freebsd.org Subject: Re: AW: AW: AW: AW: AW: ZFS: Corrupted pool metadata after adding vdev to a pool - no opportunity to rescue data from healthy vdevs? Remove a vdev? Rewrite metadata? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Sep 2012 06:10:01 -0000 17.09.2012 19:29, Thomas Göllner (Newsletter) write: >> If you can afford putting your drives aside you can try to wait before some tool occasionally emerges. I will not promise anything >> but I'm slowly making some progress with my script. I'm motivated about that as I have broken pool with photos. Trying to import >> that pool is causing a core dump on any system I tested like OpenSolaris, Illumos or SystemRescueCD. > > It would be great if you script would be able to deal with pools with broken labels. I will put the three 3TB disks aside and use the old 1.5TB disks instead. So if there is some progress in your script or someone else is gonna write some tool for restoring labels or reading data of broken pools, perhaps I can get some data back. I think it would take some time to get this fresh 3TB pool full ;-) > > This would also solve the next problem I discovered... > These 1.5TB disks have 512byte sectors. I have one spare. If the second disk falls out, first I thought, I will replace it with a 4TB disk and so on until I have replaced all of them. So I can expand the pool. But as I read now, this is not possible, isn't it? Because the 4TB drives would have 4k sectors. 
From my point of view all the hype about moving to 4k sectors is largely irrelevant to ZFS and current products on the market. 1. ZFS tends to use big recordsize for storing any data. This means most files on your drives are already stored in 128k sectors. Storing small tails in 512b or 4k sectors shouldn't make a big difference. 2. For older drives each drive should be partitioned with respect to 4k sectors. This is what the -a option of gpart does: it aligns created partitions to 4k sector bounds. But half a year ago I already found some drives that can auto-shift all disk transactions to optimize read and write performance. Courtesy of Microsoft Windows, an OS that does not care about anything not written in the license terms, same as its users, so using these drives would be more straightforward and would not cause IT staff much pain about realigning partitions; it would just work. -- Sphinx of black quartz judge my vow. From owner-freebsd-fs@FreeBSD.ORG Tue Sep 18 08:59:46 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 58BB6106566C for ; Tue, 18 Sep 2012 08:59:46 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id E327F8FC0C for ; Tue, 18 Sep 2012 08:59:45 +0000 (UTC) Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q8I8xrUp084443; Tue, 18 Sep 2012 11:59:54 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q8I8xfmZ043766; Tue, 18 Sep 2012 11:59:41 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q8I8xfWK043765; Tue, 18 Sep 2012 11:59:41 +0300 (EEST) 
(envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 18 Sep 2012 11:59:41 +0300 From: Konstantin Belousov To: Rick Macklem Message-ID: <20120918085941.GZ37286@deviant.kiev.zoral.com.ua> References: <20120917122325.GR37286@deviant.kiev.zoral.com.ua> <1777840817.743780.1347917564789.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="hZpDuTGHUtM8eGVR" Content-Disposition: inline In-Reply-To: <1777840817.743780.1347917564789.JavaMail.root@erie.cs.uoguelph.ca> User-Agent: Mutt/1.5.21 (2010-09-15) X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: FS List Subject: Re: testing/review of atomic export update patch X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Sep 2012 08:59:46 -0000 --hZpDuTGHUtM8eGVR Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Sep 17, 2012 at 05:32:44PM -0400, Rick Macklem wrote: > Konstantin Belousov wrote: > > On Sun, Sep 16, 2012 at 05:41:25PM -0400, Rick Macklem wrote: > > > Hi, > > > > > > There is a simple patch at: > > > http://people.freebsd.org/~rmacklem/atomic-export.patch > > > that can be applied to a kernel + mountd, so that the new > > > nfsd can be suspended by mountd while the exports are being > > > reloaded. It adds a new "-S" flag to mountd to enable this. > > > (This avoids the long standing bug where clients receive ESTALE > > > replies to RPCs while mountd is reloading exports.) 
> > > > This looks simple, but also somewhat worrisome. What would happen > > if the mountd crashes after nfsd suspension is requested, but before > > resume was performed ? > > > > Might be, mountd should check for suspended nfsd on start and > > unsuspend > > it, if some flag is specified ? > Well, I think that happens with the patch as it stands. > > suspend is done if the "-S" option is specified, but that is a no op > if it is already suspended. The resume is done no matter what flags > are provided, so mountd will always try and do a "resume". > --> get_exportlist() is always called when mountd is started up and > it does the resume unconditionally when it completes. > If mountd repeatedly crashes before completing get_exportlist() > when it is started up, the exports will be all messed up, so > having the nfsd threads suspended doesn't seem so bad for this > case (which hopefully never happens;-). > > Both suspend and resume are just no ops for unpatched kernels. > > Maybe the comment in front of "resume" should explicitly explain > this, instead of saying resume is harmless to do under all conditions? > > Thanks for looking at it, rick I see. My other note is that there is no protection against parallel instances of suspend/resume running. For instance, one thread could set suspend_nfsd = 1 and be descheduled, while another executes the resume code sequence in the meantime. It would then see suspend_nfsd != 0 while nfsv4rootfs_lock is not held, and try to unlock it. It seems that nfsv4_unlock would silently exit. The suspending thread then resumes and obtains the lock. You end up with suspend_nfsd == 0 but the lock held. 
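The interleaving described above can be replayed step by step. What follows is a minimal Python sketch, not the kernel code: suspend_nfsd is the flag named in the message, while lock_held and the three helper functions are hypothetical stand-ins for the patched code paths, and the "unlock does nothing when the lock is not held" behaviour is only the assumption the message itself makes about nfsv4_unlock.

```python
from dataclasses import dataclass


@dataclass
class NfsdState:
    suspend_nfsd: int = 0    # flag named in the message
    lock_held: bool = False  # stand-in for nfsv4rootfs_lock being held


def suspend_set_flag(state):
    """First half of suspend: mark nfsd as suspended."""
    state.suspend_nfsd = 1


def suspend_take_lock(state):
    """Second half of suspend: acquire the lock."""
    state.lock_held = True


def resume(state):
    """Resume path: unlock if the flag is set, then clear the flag."""
    if state.suspend_nfsd != 0:
        # Assumed: unlocking a lock that is not held silently does nothing.
        state.lock_held = False
        state.suspend_nfsd = 0


def replay_race():
    state = NfsdState()
    suspend_set_flag(state)   # thread A sets the flag, then is descheduled
    resume(state)             # thread B runs the whole resume path
    suspend_take_lock(state)  # thread A continues and obtains the lock
    return state


final = replay_race()
print(final)
```

If the two halves of suspend ran without interruption, resume would find the lock held and release it; the sketch only shows that nothing in this ordering prevents the flag and the lock from disagreeing.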
--hZpDuTGHUtM8eGVR Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (FreeBSD) iEYEARECAAYFAlBYN/0ACgkQC3+MBN1Mb4iPGgCeM/a6BN9tZLpmw3fstmO+Gd1Q mKEAniRaUuIkellq4m3LLYRfLo8MzYvE =Kqj8 -----END PGP SIGNATURE----- --hZpDuTGHUtM8eGVR-- From owner-freebsd-fs@FreeBSD.ORG Tue Sep 18 10:37:56 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 356CD1065675 for ; Tue, 18 Sep 2012 10:37:56 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230]) by mx1.freebsd.org (Postfix) with ESMTP id A828A8FC14 for ; Tue, 18 Sep 2012 10:37:55 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.5/8.14.5) with ESMTP id q8IASHPS013435 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Tue, 18 Sep 2012 13:28:18 +0300 (EEST) (envelope-from daniel@digsys.bg) Message-ID: <50584CC1.3030300@digsys.bg> Date: Tue, 18 Sep 2012 13:28:17 +0300 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:10.0.6esrpre) Gecko/20120728 Thunderbird/10.0.6 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <001a01cd900d$bcfcc870$36f65950$@goelli.de> <504F282D.8030808@gmail.com> <000a01cd90aa$0a277310$1e765930$@goelli.de> <5050461A.9050608@gmail.com> <000001cd9239$ed734c80$c859e580$@goelli.de> <5052EC5D.4060403@gmail.com> <000a01cd9274$0aa0bba0$1fe232e0$@goelli.de> <505322C9.70200@gmail.com> <000001cd9377$e9e9b010$bdbd1030$@goelli.de> <50559CD8.1070700@gmail.com> <000001cd94f1$a4157030$ec405090$@goelli.de> <50581033.4040102@gmail.com> In-Reply-To: <50581033.4040102@gmail.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: AW: AW: AW: AW: AW: ZFS: Corrupted pool metadata after adding vdev to a pool - no opportunity to rescue data 
from healthy vdevs? Remove a vdev? Rewrite metadata? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Sep 2012 10:37:56 -0000 On 18.09.12 09:09, Volodymyr Kostyrko wrote: > > From my point of view all hype about moving to 4k sectors is highly > irrelevant to ZFS and current products on the market. > > 1. ZFS tends to use big recordsize for storing any data. This means > most files on your drives are already stored in 128k sectors. Storing > small tails in 512b or 4k sectors shouldn't give big difference. The truth is, ZFS will write blocks ranging in size from your media's sector size up to 128K. The problem is that ZFS writes these records (even 128K) aligned to the sector size. So, once you write some data that is under 4k, your pool will become misaligned. There are two problems with the 4k drives: - many of these drives lie about their sector size. You must instruct your software to treat them as 4k sector drives, otherwise the performance penalty (mostly for writing) is very significant. - new drives you buy will inevitably come with 4k sectors (or more) and if you need to replace a drive in a large zpool you will start having abysmal write performance. > > 2. For older drives each drive should be partitioned with respect to > 4k sectors. This is what -a option of gpart does: it aligns created > partitions to 4k sector bounds. But half a year ago I already found > some drives that can auto-shift all disk transactions to optimize read > and write performance. Courtesy of Microsoft Windows, OS that does not > care about anything not written in license terms, same as the users > do, so using this drives would be more straightforward and would not > cause decent pain to IT stuff about realigning partitions the way it > would just work. > This is only hype. There is no way any disk firmware can shift any transactions. 
All these drives do when you write 512 bytes in any 4k sector is read the 4k sector, replace 512 bytes of it and write it back. Best you could hope is that sector is already in the disk cache, which of course is rare. The problem is not Windows itself, but the old MBR concept, that first partition starts at sector 63. Today, it is wise to always make sure new zpools are created with ashift=12. Daniel From owner-freebsd-fs@FreeBSD.ORG Tue Sep 18 11:20:50 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 63658106564A; Tue, 18 Sep 2012 11:20:50 +0000 (UTC) (envelope-from feld@feld.me) Received: from feld.me (unknown [IPv6:2607:f4e0:100:300::2]) by mx1.freebsd.org (Postfix) with ESMTP id 2D3B18FC1A; Tue, 18 Sep 2012 11:20:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=feld.me; s=blargle; h=In-Reply-To:Message-Id:From:Mime-Version:Date:References:Subject:To:Content-Type; bh=3q/ZkSyRRVX+/nhhCMO2adRsEqVNgkSQzlqQWUL+7Js=; b=dCzH0W+jfnGLBdzlSHtfiUbpWtwbMGKW8zQiKN/Piug2Wyz+Q2I9qWJmQTMRh1SFAcja8sQ4b8dkjWCDiP1Iee8XCcRMGlN6U0zDkIPl1Kz53nkRYJag0XOqzp8PoPww; Received: from localhost ([127.0.0.1] helo=mwi1.coffeenet.org) by feld.me with esmtp (Exim 4.80 (FreeBSD)) (envelope-from ) id 1TDvr7-0009oV-C6; Tue, 18 Sep 2012 06:20:47 -0500 Received: from feld@feld.me by mwi1.coffeenet.org (Archiveopteryx 3.1.4) with esmtpa id 1347967235-3100-3099/5/74; Tue, 18 Sep 2012 11:20:35 +0000 Content-Type: text/plain; charset=utf-8; format=flowed; delsp=yes To: FreeBSD FS , John References: <20120918010812.GA71005@FreeBSD.org> Date: Tue, 18 Sep 2012 06:20:35 -0500 Mime-Version: 1.0 From: Mark Felder Message-Id: In-Reply-To: <20120918010812.GA71005@FreeBSD.org> User-Agent: Opera Mail/12.02 (FreeBSD) X-SA-Report: ALL_TRUSTED=-1, KHOP_THREADED=-0.5 X-SA-Score: -1.5 Cc: Subject: Re: XFS/istgt backed 8TB xfs filesystem configuration? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Sep 2012 11:20:50 -0000 Am I correct that you're proposing the following configuration: FreeBSD Server -> zpool -> zvol -> istgt -> Linux server -> iscsi initiator -> XFS filesystem If so then yes, I am using a setup like this. It works quite well. :-) From owner-freebsd-fs@FreeBSD.ORG Tue Sep 18 11:24:04 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 1D01F106566B for ; Tue, 18 Sep 2012 11:24:04 +0000 (UTC) (envelope-from gpalmer@freebsd.org) Received: from noop.in-addr.com (mail.in-addr.com [IPv6:2001:470:8:162::1]) by mx1.freebsd.org (Postfix) with ESMTP id D79348FC0A for ; Tue, 18 Sep 2012 11:24:03 +0000 (UTC) Received: from gjp by noop.in-addr.com with local (Exim 4.80 (FreeBSD)) (envelope-from ) id 1TDvuF-000PYY-Sa; Tue, 18 Sep 2012 07:23:55 -0400 Date: Tue, 18 Sep 2012 07:23:55 -0400 From: Gary Palmer To: Volodymyr Kostyrko Message-ID: <20120918112355.GB77784@in-addr.com> References: <000a01cd90aa$0a277310$1e765930$@goelli.de> <5050461A.9050608@gmail.com> <000001cd9239$ed734c80$c859e580$@goelli.de> <5052EC5D.4060403@gmail.com> <000a01cd9274$0aa0bba0$1fe232e0$@goelli.de> <505322C9.70200@gmail.com> <000001cd9377$e9e9b010$bdbd1030$@goelli.de> <50559CD8.1070700@gmail.com> <000001cd94f1$a4157030$ec405090$@goelli.de> <50581033.4040102@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <50581033.4040102@gmail.com> X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: gpalmer@freebsd.org X-SA-Exim-Scanned: No (on noop.in-addr.com); SAEximRunCond expanded to false Cc: freebsd-fs@freebsd.org, "\"Thomas G??llner \(Newsletter\)\"" Subject: Re: AW: AW: AW: AW: AW: ZFS: Corrupted pool metadata after adding vdev 
to a pool - no opportunity to rescue data from healthy vdevs? Remove a vdev? Rewrite metadata? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Sep 2012 11:24:04 -0000 On Tue, Sep 18, 2012 at 09:09:55AM +0300, Volodymyr Kostyrko wrote: > 17.09.2012 19:29, Thomas Göllner (Newsletter) write: > >> If you can afford putting your drives aside you can try to wait before some tool occasionally emerges. I will not promise anything > >> but I'm slowly making some progress with my script. I'm motivated about that as I have broken pool with photos. Trying to import > >> that pool is causing a core dump on any system I tested like OpenSolaris, Illumos or SystemRescueCD. > > > > It would be great if you script would be able to deal with pools with broken labels. I will put the three 3TB disks aside and use the old 1.5TB disks instead. So if there is some progress in your script or someone else is gonna write some tool for restoring labels or reading data of broken pools, perhaps I can get some data back. I think it would take some time to get this fresh 3TB pool full ;-) > > > > This would also solve the next problem I discovered... > > These 1.5TB disks have 512byte sectors. I have one spare. If the second disk falls out, first I thought, I will replace it with a 4TB disk and so on until I have replaced all of them. So I can expand the pool. But as I read now, this is not possible, isn't it? Because the 4TB drives would have 4k sectors. > > From my point of view all hype about moving to 4k sectors is highly > irrelevant to ZFS and current products on the market. > > 1. ZFS tends to use big recordsize for storing any data. This means most > files on your drives are already stored in 128k sectors. Storing small > tails in 512b or 4k sectors shouldn't give big difference. 
Performance testing has shown that running "advanced format" (aka 4kilobyte sector disks) with 512 byte alignment with ZFS seriously degrades performance compared to running with 4 kilobyte alignment. Regards, Gary From owner-freebsd-fs@FreeBSD.ORG Tue Sep 18 11:25:03 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9ECA31065673 for ; Tue, 18 Sep 2012 11:25:03 +0000 (UTC) (envelope-from olivier@gid0.org) Received: from mail-lpp01m010-f54.google.com (mail-lpp01m010-f54.google.com [209.85.215.54]) by mx1.freebsd.org (Postfix) with ESMTP id 16FA38FC19 for ; Tue, 18 Sep 2012 11:25:02 +0000 (UTC) Received: by lage12 with SMTP id e12so5987478lag.13 for ; Tue, 18 Sep 2012 04:24:55 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:x-gm-message-state; bh=r8/56poSFT0wgrIE/N9ERfflmxTP2deA7BPt0thPVIU=; b=YMeUpuDxFdMnYO+XF66hH2/5Hc3mWYMYejPa0q0nf5hDQ4G1Bvak+NT2KrQOarZd3d EbrOnEc1wom6DI3JQ/jmXWYEwnS2lYy55jH0hcfCj4sLD6tydecDCG+hmPBK+X4hZlsm Zx8H2JQ8y+UvzVQao4UzOwMwarNjMwPA7BcAhCdON64eZBhoOsgx74pVgqPuM0NnQNJZ oPEkjtJPeU025aB2d6T6xTRFKuipCGelrjN8X59dhbPRPvjjTpH7mBTBWYSr1Lnhg8nx 7KTX2E6CDpO8sYLXWLQu2UTehbObFjVgxdpRIfHZflFUy73vvJ/IfuK0PDr8xX+MnRp3 dIjw== MIME-Version: 1.0 Received: by 10.112.42.103 with SMTP id n7mr123699lbl.69.1347967495873; Tue, 18 Sep 2012 04:24:55 -0700 (PDT) Received: by 10.112.2.36 with HTTP; Tue, 18 Sep 2012 04:24:55 -0700 (PDT) In-Reply-To: <9A49FDEB-325E-4A25-8FB9-C4FF8F9BAF67@bway.net> References: <9A49FDEB-325E-4A25-8FB9-C4FF8F9BAF67@bway.net> Date: Tue, 18 Sep 2012 13:24:55 +0200 Message-ID: From: Olivier Smedts To: Charles Sprickman Content-Type: text/plain; charset=ISO-8859-1 X-Gm-Message-State: ALoCoQlSOjnxaUu3TNGrbh73sRD6vrLSnayvT6yVNfQ0Jx83a6nCRxInUpf8DAW0uf7Udgzdku8d Cc: freebsd-fs@freebsd.org 
Subject: Re: zpool add log to root pool X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Sep 2012 11:25:03 -0000 2012/9/17 Charles Sprickman : > On Sep 17, 2012, at 8:34 AM, Olivier Smedts wrote: >> Is there anyone successfuly using a log device / zil on a root pool >> under FreeBSD ? > > Mine works, but I've never been able to find any official confirmation from the -fs folks as to how "proper" or supported the configuration is. Not too many other options on 1U boxes though really... Thanks, I tried and it works for me : # zpool set bootfs= tank # zpool add tank log gpt/zil # zpool set bootfs=tank/freebsd tank Rebooted, seems to work... maybe the bootfs property check should be disabled in the code ? -- Olivier Smedts _ ASCII ribbon campaign ( ) e-mail: olivier@gid0.org - against HTML email & vCards X www: http://www.gid0.org - against proprietary attachments / \ "Il y a seulement 10 sortes de gens dans le monde : ceux qui comprennent le binaire, et ceux qui ne le comprennent pas." 
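The three steps reported in the message above can be wrapped in a small helper. This is only a sketch in Python that builds the command lines without running them; the function name log_device_commands and its parameters are hypothetical, and the zpool invocations are the ones quoted in the message, nothing more.

```python
def log_device_commands(pool, bootfs, log_dev):
    """Return the zpool command lines from the message, in order.

    Nothing is executed here; the caller decides whether to run them.
    """
    return [
        ["zpool", "set", "bootfs=", pool],           # clear bootfs first, as in the message
        ["zpool", "add", pool, "log", log_dev],      # attach the log device
        ["zpool", "set", f"bootfs={bootfs}", pool],  # put bootfs back
    ]


for argv in log_device_commands("tank", "tank/freebsd", "gpt/zil"):
    print(" ".join(argv))
```

Keeping the commands as argument lists rather than shell strings means they could be handed to subprocess.run later without any quoting concerns, but whether this sequence is a supported configuration is exactly the open question in this thread.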
From owner-freebsd-fs@FreeBSD.ORG Tue Sep 18 12:27:16 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 6F3E0106568A for ; Tue, 18 Sep 2012 12:27:16 +0000 (UTC) (envelope-from c.kworr@gmail.com) Received: from mail-ey0-f182.google.com (mail-ey0-f182.google.com [209.85.215.182]) by mx1.freebsd.org (Postfix) with ESMTP id F136F8FC08 for ; Tue, 18 Sep 2012 12:27:15 +0000 (UTC) Received: by eaak11 with SMTP id k11so2903418eaa.13 for ; Tue, 18 Sep 2012 05:27:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=nOrc22YorVIyf8PAEfpLbrbjtX3D/8bG/YNUes1eBVU=; b=pG9oOwLOTHOha/I83KnswFH7dufwinDBWExEEneVPNoow/bKuWPytX2Rg4W93zsYOh 6xbD02qhBpF3rbmzQ+NvIaMqXB/urZX4rMop7lYQdn9FwXpRjSaaSLTFsb9RBGMyV7bC 4sISl4U/pi8UtRjGuG0uYXLJQiAECtOajoNgHIGYTZVUNHhBo8B7umVhTcNOyAU+TbJO eqZo7tBGVOaAymkGlK81M1P+d66lluZIks7/qY/8Ouw7Ds5R5gLLdTXTG7sgEHDP4+rR rl3H5PhvUJ+VVkzYf8uEHXeihZVDGdaQi+6yMwlRgcjLra6Sk2b5IN1yT0Pd1WWhxGVZ E31w== Received: by 10.14.198.65 with SMTP id u41mr17451850een.22.1347971234791; Tue, 18 Sep 2012 05:27:14 -0700 (PDT) Received: from green.local (90-224-132-95.pool.ukrtel.net. 
[95.132.224.90]) by mx.google.com with ESMTPS id r45sm35929439eem.6.2012.09.18.05.27.11 (version=SSLv3 cipher=OTHER); Tue, 18 Sep 2012 05:27:13 -0700 (PDT) Message-ID: <5058689E.5060302@gmail.com> Date: Tue, 18 Sep 2012 15:27:10 +0300 From: Volodymyr Kostyrko User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:15.0) Gecko/20120911 Thunderbird/15.0.1 MIME-Version: 1.0 To: Olivier Smedts References: <9A49FDEB-325E-4A25-8FB9-C4FF8F9BAF67@bway.net> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: zpool add log to root pool X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Sep 2012 12:27:16 -0000 18.09.2012 14:24, Olivier Smedts wrote: > 2012/9/17 Charles Sprickman : >> On Sep 17, 2012, at 8:34 AM, Olivier Smedts wrote: >>> Is there anyone successfuly using a log device / zil on a root pool >>> under FreeBSD ? >> >> Mine works, but I've never been able to find any official confirmation from the -fs folks as to how "proper" or supported the configuration is. Not too many other options on 1U boxes though really... > > Thanks, I tried and it works for me : > # zpool set bootfs= tank > # zpool add tank log gpt/zil > # zpool set bootfs=tank/freebsd tank > Rebooted, seems to work... maybe the bootfs property check should be > disabled in the code ? There might be some uncertain areas, such as: 1. Does our bootcode support replaying/reconstructing the ZIL before booting? 2. Would the machine boot if the log device is missing? 3. How much data can be trashed when the log device fails? A UPS answers most of these questions for me, but my machine is a local test server, not a production one. -- Sphinx of black quartz judge my vow. 
From owner-freebsd-fs@FreeBSD.ORG Tue Sep 18 12:40:04 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CD92C106567D; Tue, 18 Sep 2012 12:40:04 +0000 (UTC) (envelope-from c.kworr@gmail.com) Received: from mail-ee0-f54.google.com (mail-ee0-f54.google.com [74.125.83.54]) by mx1.freebsd.org (Postfix) with ESMTP id 27EE68FC0A; Tue, 18 Sep 2012 12:40:03 +0000 (UTC) Received: by eeke52 with SMTP id e52so4141055eek.13 for ; Tue, 18 Sep 2012 05:39:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=IKyKxpCK8Is+gtF7J8z9yhWBhayYNEYFS/LOsPG4UJI=; b=j0TD+egvjteELxTI6V+QcRQIgl2Q+iVxiUfSZ3dsLlpJh54uYlv1XWjBpf1nJHhULH HQaSh/02ARsXc8lO5EiYooFhnLK3PMtM67mMQYCYK7P0TgY5P9knkpF3KWkJTk6egod6 /NJasnVVAGJBj0c5Cm8SaW+GI1k/NNROzKKIDuefDM6qIgAd9VbTVljtrd+ruIIca6xI TsgVmkgiD+6SAMkndwNM60tnykbdIK4AZwiC0Q9iDPKCk/D1oQLaYj8qdqEPbdbB49kw DXMmjnJTJr7IhBsyhURhgBjQvhi2lLQkMWgcxIC8ZctdyaK/3SSYshYh2AqHmyP2OELO GWrg== Received: by 10.14.213.137 with SMTP id a9mr17191823eep.38.1347971997284; Tue, 18 Sep 2012 05:39:57 -0700 (PDT) Received: from green.local (90-224-132-95.pool.ukrtel.net. 
[95.132.224.90]) by mx.google.com with ESMTPS id k49sm36024104een.4.2012.09.18.05.39.55 (version=SSLv3 cipher=OTHER); Tue, 18 Sep 2012 05:39:55 -0700 (PDT) Message-ID: <50586B99.40108@gmail.com> Date: Tue, 18 Sep 2012 15:39:53 +0300 From: Volodymyr Kostyrko User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:15.0) Gecko/20120911 Thunderbird/15.0.1 MIME-Version: 1.0 To: Gary Palmer References: <000a01cd90aa$0a277310$1e765930$@goelli.de> <5050461A.9050608@gmail.com> <000001cd9239$ed734c80$c859e580$@goelli.de> <5052EC5D.4060403@gmail.com> <000a01cd9274$0aa0bba0$1fe232e0$@goelli.de> <505322C9.70200@gmail.com> <000001cd9377$e9e9b010$bdbd1030$@goelli.de> <50559CD8.1070700@gmail.com> <000001cd94f1$a4157030$ec405090$@goelli.de> <50581033.4040102@gmail.com> <20120918112355.GB77784@in-addr.com> In-Reply-To: <20120918112355.GB77784@in-addr.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: AW: AW: AW: AW: AW: ZFS: Corrupted pool metadata after adding vdev to a pool - no opportunity to rescue data from healthy vdevs? Remove a vdev? Rewrite metadata? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Sep 2012 12:40:05 -0000 18.09.2012 14:23, Gary Palmer wrote: >> From my point of view all hype about moving to 4k sectors is highly >> irrelevant to ZFS and current products on the market. >> >> 1. ZFS tends to use big recordsize for storing any data. This means most >> files on your drives are already stored in 128k sectors. Storing small >> tails in 512b or 4k sectors shouldn't give big difference. > > Performance testing has shown that running "advanced format" (aka 4kilobyte > sector disks) with 512 byte alignment with ZFS seriously degrades performance > compared to running with 4 kilobyte alignment. 
Please understand me correctly, this is only my point of view on the problem as I never saw any tests that show difference between correct alignment of _partitions_ and alignment on _records_ on ZFS. This area is not thoroughly covered with test data. -- Sphinx of black quartz judge my vow. From owner-freebsd-fs@FreeBSD.ORG Tue Sep 18 12:42:14 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: by hub.freebsd.org (Postfix, from userid 821) id 6F83D106566C; Tue, 18 Sep 2012 12:42:14 +0000 (UTC) Date: Tue, 18 Sep 2012 12:42:14 +0000 From: John To: FreeBSD FS Message-ID: <20120918124214.GA79439@FreeBSD.org> References: <20120918010812.GA71005@FreeBSD.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.1i Cc: Mark Felder Subject: Re: XFS/istgt backed 8TB xfs filesystem configuration? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Sep 2012 12:42:14 -0000 ----- Mark Felder's Original Message ----- > Am I correct that you're proposing the following configuration: > > FreeBSD Server -> zpool -> zvol -> istgt -> Linux server -> iscsi > initiator -> XFS filesystem > > If so then yes, I am using a setup like this. It works quite well. :-) Yes. Since I'm using multipath, I might add: FreeBSD Server -> zpool -> zvol -> istgt -> Linux server -> iscsi(/dev/sd[bcde]) -> multipath(/dev/dm-X) Have you found any blocksizes, cache sizes, etc, that seem optimal? 
Thanks, John From owner-freebsd-fs@FreeBSD.ORG Tue Sep 18 12:45:45 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 2FC9C106564A; Tue, 18 Sep 2012 12:45:45 +0000 (UTC) (envelope-from feld@feld.me) Received: from feld.me (unknown [IPv6:2607:f4e0:100:300::2]) by mx1.freebsd.org (Postfix) with ESMTP id D91BC8FC14; Tue, 18 Sep 2012 12:45:44 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=feld.me; s=blargle; h=In-Reply-To:Message-Id:From:Mime-Version:Date:References:Subject:To:Content-Type; bh=BqCoZjhFi+dRO74BJUy/YLZbbB5ylu7v44fCy/3g9SY=; b=WOEJmZSSS6/EvmBwmPccIxBbYhkiYB9tC/P+E4xhJzF/1cWK1btJzOcSSxzJ1K/F3pbTZOlobnDKrEdZy39zK0PRaI/159EjnUT+pndzytvZFvIs70ZcQQFB6uxtGI4V; Received: from localhost ([127.0.0.1] helo=mwi1.coffeenet.org) by feld.me with esmtp (Exim 4.80 (FreeBSD)) (envelope-from ) id 1TDxBK-000Crf-Eo; Tue, 18 Sep 2012 07:45:44 -0500 Received: from feld@feld.me by mwi1.coffeenet.org (Archiveopteryx 3.1.4) with esmtpa id 1347972332-3100-3099/5/75; Tue, 18 Sep 2012 12:45:32 +0000 Content-Type: text/plain; charset=utf-8; format=flowed; delsp=yes To: FreeBSD FS , John References: <20120918010812.GA71005@FreeBSD.org> <20120918124214.GA79439@FreeBSD.org> Date: Tue, 18 Sep 2012 07:45:31 -0500 Mime-Version: 1.0 From: Mark Felder Message-Id: In-Reply-To: <20120918124214.GA79439@FreeBSD.org> User-Agent: Opera Mail/12.02 (FreeBSD) X-SA-Report: ALL_TRUSTED=-1, KHOP_THREADED=-0.5 X-SA-Score: -1.5 Cc: Subject: Re: XFS/istgt backed 8TB xfs filesystem configuration? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Sep 2012 12:45:45 -0000 On Tue, 18 Sep 2012 07:42:14 -0500, John wrote: > > Have you found any blocksizes, cache sizes, etc, that seem optimal? 
> I'm providing iSCSI for XEN and VMWare servers and currently have no optimal tunings. ZFS's variable block size seems to handle the load just fine, and I have plenty of SSDs doing work for me. Just make sure you have carefully calculated the number of Connections and Sessions for istgt -- your multipath and number of LUNs being shared will affect these numbers. From owner-freebsd-fs@FreeBSD.ORG Tue Sep 18 13:14:48 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 6B1C0106566B for ; Tue, 18 Sep 2012 13:14:48 +0000 (UTC) (envelope-from gpalmer@freebsd.org) Received: from noop.in-addr.com (mail.in-addr.com [IPv6:2001:470:8:162::1]) by mx1.freebsd.org (Postfix) with ESMTP id 24FDE8FC14 for ; Tue, 18 Sep 2012 13:14:48 +0000 (UTC) Received: from gjp by noop.in-addr.com with local (Exim 4.80 (FreeBSD)) (envelope-from ) id 1TDxdR-000PhC-GT; Tue, 18 Sep 2012 09:14:41 -0400 Date: Tue, 18 Sep 2012 09:14:41 -0400 From: Gary Palmer To: Volodymyr Kostyrko Message-ID: <20120918131441.GC77784@in-addr.com> References: <000001cd9239$ed734c80$c859e580$@goelli.de> <5052EC5D.4060403@gmail.com> <000a01cd9274$0aa0bba0$1fe232e0$@goelli.de> <505322C9.70200@gmail.com> <000001cd9377$e9e9b010$bdbd1030$@goelli.de> <50559CD8.1070700@gmail.com> <000001cd94f1$a4157030$ec405090$@goelli.de> <50581033.4040102@gmail.com> <20120918112355.GB77784@in-addr.com> <50586B99.40108@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <50586B99.40108@gmail.com> X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: gpalmer@freebsd.org X-SA-Exim-Scanned: No (on noop.in-addr.com); SAEximRunCond expanded to false Cc: freebsd-fs@freebsd.org Subject: Re: AW: AW: AW: AW: AW: ZFS: Corrupted pool metadata after adding vdev to a pool - no opportunity to rescue data from healthy vdevs? Remove a vdev? Rewrite metadata? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Sep 2012 13:14:48 -0000

On Tue, Sep 18, 2012 at 03:39:53PM +0300, Volodymyr Kostyrko wrote:
> 18.09.2012 14:23, Gary Palmer wrote:
> >> From my point of view all hype about moving to 4k sectors is highly
> >> irrelevant to ZFS and current products on the market.
> >>
> >> 1. ZFS tends to use big recordsize for storing any data. This means most
> >> files on your drives are already stored in 128k sectors. Storing small
> >> tails in 512b or 4k sectors shouldn't give big difference.
> >
> > Performance testing has shown that running "advanced format" (aka 4kilobyte
> > sector disks) with 512 byte alignment with ZFS seriously degrades performance
> > compared to running with 4 kilobyte alignment.
>
> Please understand me correctly, this is only my point of view on the
> problem as I never saw any tests that show difference between correct
> alignment of _partitions_ and alignment on _records_ on ZFS. This area
> is not thoroughly covered with test data.

I seem to recall that people made 4 kilobyte aligned partitions on
advanced format drives without doing the gnop trick and still
suffered worse performance than when they did the gnop trick to make
ashift=12. Check the list archives.

If you believe there is insufficient testing here and are saying that
conventional wisdom regarding this is wrong, it is reasonable to request
that you prove your position.
Gary From owner-freebsd-fs@FreeBSD.ORG Tue Sep 18 13:19:40 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E3444106566B for ; Tue, 18 Sep 2012 13:19:39 +0000 (UTC) (envelope-from c.kworr@gmail.com) Received: from mail-ee0-f54.google.com (mail-ee0-f54.google.com [74.125.83.54]) by mx1.freebsd.org (Postfix) with ESMTP id 6791A8FC08 for ; Tue, 18 Sep 2012 13:19:39 +0000 (UTC) Received: by eeke52 with SMTP id e52so4166540eek.13 for ; Tue, 18 Sep 2012 06:19:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=HQcrnhn5FbgWTPGKFJRDpujRWO14AEy0VhnD0ssuKb8=; b=D/xmSwUgtuP+qFM6slwzZhNLxL4vDw14leMCp5LZZRBdNAvOoLkA1ait6DdAIcK0b7 xoGpfzdRPB9N07p9SvF0E7e0n+1oXlm1ZHYdD/CYGa0ygBOhaXxJ0c+hqaoAzIHqTOmi 2TkN8uwK42aePMphLe2bwg/CMsMFIuusZjRkgzIlG4oCkZtJdko9G0EsYWDTfDleVbvD JYQJbgMViyV3FnBkfv1p+P4lq1wE/YxsQ6HY+TPHl6HbDckr/9y5Xz63AdiEHrr3YnNA RtNeqccKLwPIkOkAjr3KtU+io+PY5DYOnTknrNBFyJUlnmOyvdIzwToA8P2DCtueG2IJ AnLA== Received: by 10.14.4.198 with SMTP id 46mr215206eej.11.1347974378028; Tue, 18 Sep 2012 06:19:38 -0700 (PDT) Received: from green.local (90-224-132-95.pool.ukrtel.net. 
[95.132.224.90]) by mx.google.com with ESMTPS id r45sm36290476eem.6.2012.09.18.06.19.35 (version=SSLv3 cipher=OTHER); Tue, 18 Sep 2012 06:19:36 -0700 (PDT) Message-ID: <505874E6.2050109@gmail.com> Date: Tue, 18 Sep 2012 16:19:34 +0300 From: Volodymyr Kostyrko User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:15.0) Gecko/20120911 Thunderbird/15.0.1 MIME-Version: 1.0 To: Daniel Kalchev References: <001a01cd900d$bcfcc870$36f65950$@goelli.de> <504F282D.8030808@gmail.com> <000a01cd90aa$0a277310$1e765930$@goelli.de> <5050461A.9050608@gmail.com> <000001cd9239$ed734c80$c859e580$@goelli.de> <5052EC5D.4060403@gmail.com> <000a01cd9274$0aa0bba0$1fe232e0$@goelli.de> <505322C9.70200@gmail.com> <000001cd9377$e9e9b010$bdbd1030$@goelli.de> <50559CD8.1070700@gmail.com> <000001cd94f1$a4157030$ec405090$@goelli.de> <50581033.4040102@gmail.com> <50584CC1.3030300@digsys.bg> In-Reply-To: <50584CC1.3030300@digsys.bg> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: AW: AW: AW: AW: AW: ZFS: Corrupted pool metadata after adding vdev to a pool - no opportunity to rescue data from healthy vdevs? Remove a vdev? Rewrite metadata? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Sep 2012 13:19:40 -0000 18.09.2012 13:28, Daniel Kalchev wrote: >> From my point of view all hype about moving to 4k sectors is highly >> irrelevant to ZFS and current products on the market. >> >> 1. ZFS tends to use big recordsize for storing any data. This means >> most files on your drives are already stored in 128k sectors. Storing >> small tails in 512b or 4k sectors shouldn't give big difference. > > Truth is, ZFS will write blocks of size from your media sector size up > to 128K. > > The problem is that ZFS writes these records (even 128K) aligned to the > sector size. 
So, once you write some data that is under 4k, your pool
> will become misaligned.

Not exactly. https://blogs.oracle.com/bonwick/entry/space_maps

1. ZFS divides the space on each virtual device into a few hundred
metaslabs.

2. As metaslabs are quite big, it seems logical to align them to a high
ashift value (I lack documentation on whether this is true, but they
should at least be divisible by 128k, as this is the default recordsize).

3. In each metaslab all space allocation is done through space maps. I
have no documentation on this one either, but given the presence of gang
blocks in the ZFS specification, every new allocation should be aligned
to 128k if we are allocating a 128k block, to 64k if we are allocating a
64k block, and so on (yet again, I lack documentation on whether this is
true, but as far as I understand the Solaris way, it's more practical to
keep data aligned than to deal with misalignment later).

I'm bad at reading code, so I can't really say how allocations are
aligned in ZFS metaslabs, but the function dealing with metaslab
allocation takes an 'align' variable.

>> 2. For older drives each drive should be partitioned with respect to
>> 4k sectors. This is what -a option of gpart does: it aligns created
>> partitions to 4k sector bounds. But half a year ago I already found
>> some drives that can auto-shift all disk transactions to optimize read
>> and write performance. Courtesy of Microsoft Windows, OS that does not
>> care about anything not written in license terms, same as the users
>> do, so using this drives would be more straightforward and would not
>> cause decent pain to IT stuff about realigning partitions the way it
>> would just work.
>>
>
> This is only hype. There is no way any disk firmware can shift any
> transactions.

How about Seagate Smart Align? It's documented to do so. I haven't
touched any Seagate drives as I don't like them anyway...

--
Sphinx of black quartz judge my vow.
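
The disagreement in this exchange comes down to allocation granularity, which can be shown with a little arithmetic. The following is a minimal sketch only: toy_alloc() is a hypothetical bump allocator invented for illustration, and it is not how ZFS metaslabs or space maps actually allocate. It assumes only what the thread states, namely that the smallest allocation unit is 2^ashift bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t
roundup_pow2(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((x + align - 1) & ~(align - 1));
}

/* True if offset is a multiple of boundary (boundary is a power of two). */
static int
is_aligned(uint64_t offset, uint64_t boundary)
{
	return ((offset & (boundary - 1)) == 0);
}

/*
 * Hypothetical toy allocator, NOT the ZFS metaslab allocator.
 * It hands out space sequentially from *cursor, rounding every
 * request up to the allocation unit of (1 << ashift) bytes, and
 * returns the starting offset of the new allocation.
 */
static uint64_t
toy_alloc(uint64_t *cursor, uint64_t size, unsigned ashift)
{
	uint64_t start = *cursor;

	*cursor += roundup_pow2(size, UINT64_C(1) << ashift);
	return (start);
}
```

Starting from a cursor of 0 (a partition whose start is 4K-aligned), a 512-byte allocation followed by a 128 KiB allocation places the large record at offset 512 when ashift=9 and at offset 4096 when ashift=12. Under this toy model a correctly aligned partition does not by itself keep later records on 4K boundaries. Whether the real allocator behaves this way is exactly the open question in the thread.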
From owner-freebsd-fs@FreeBSD.ORG Tue Sep 18 13:25:38 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 8BF6F1065674; Tue, 18 Sep 2012 13:25:38 +0000 (UTC) (envelope-from c.kworr@gmail.com) Received: from mail-ee0-f54.google.com (mail-ee0-f54.google.com [74.125.83.54]) by mx1.freebsd.org (Postfix) with ESMTP id E020D8FC1C; Tue, 18 Sep 2012 13:25:37 +0000 (UTC) Received: by eeke52 with SMTP id e52so4170581eek.13 for ; Tue, 18 Sep 2012 06:25:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=gApzxRvsz0CRRU3haEdzzrw0frra1yMDOAJsEquEGeo=; b=qFwGWTwYsvS5QKhV7ZFkfRgfQoAjCAlKmPTr3KwN5bmiQKtSJMKBRDQgOPGAqIem7q stJkdOqtpnrgpFhtdGrQy5ihG4vZUqpM6NR2qrSxQtEvHgOYPQlUJUVt+IjlC6IpaSlA 2s23N3SvzhgjSlAb9hRzICqgnsH43kcv0KoDIKsEJ/2Qcw5BIA3s/gKmFaMyr0cb1LWO bjPZ1kMrwQnQvly394RcBz3dn9VwMEq2oPCpQX3NnLL6ODb/0HWfBNOgeTyiMixWFOkv btlHnoiTDB1/RcphrM4lpshmwDHVVIsQGvwKcFNQwf/CDGXxQ8qBVOCLD+By37F5LsDM OfgQ== Received: by 10.14.224.4 with SMTP id w4mr199891eep.21.1347974736902; Tue, 18 Sep 2012 06:25:36 -0700 (PDT) Received: from green.local (90-224-132-95.pool.ukrtel.net. 
[95.132.224.90]) by mx.google.com with ESMTPS id k49sm36338835een.4.2012.09.18.06.25.34 (version=SSLv3 cipher=OTHER); Tue, 18 Sep 2012 06:25:35 -0700 (PDT) Message-ID: <5058764D.1010403@gmail.com> Date: Tue, 18 Sep 2012 16:25:33 +0300 From: Volodymyr Kostyrko User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:15.0) Gecko/20120911 Thunderbird/15.0.1 MIME-Version: 1.0 To: Gary Palmer References: <000001cd9239$ed734c80$c859e580$@goelli.de> <5052EC5D.4060403@gmail.com> <000a01cd9274$0aa0bba0$1fe232e0$@goelli.de> <505322C9.70200@gmail.com> <000001cd9377$e9e9b010$bdbd1030$@goelli.de> <50559CD8.1070700@gmail.com> <000001cd94f1$a4157030$ec405090$@goelli.de> <50581033.4040102@gmail.com> <20120918112355.GB77784@in-addr.com> <50586B99.40108@gmail.com> <20120918131441.GC77784@in-addr.com> In-Reply-To: <20120918131441.GC77784@in-addr.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: AW: AW: AW: AW: AW: ZFS: Corrupted pool metadata after adding vdev to a pool - no opportunity to rescue data from healthy vdevs? Remove a vdev? Rewrite metadata? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Sep 2012 13:25:38 -0000 18.09.2012 16:14, Gary Palmer wrote: >> Please understand me correctly, this is only my point of view on the >> problem as I never saw any tests that show difference between correct >> alignment of _partitions_ and alignment on _records_ on ZFS. This area >> is not thoroughly covered with test data. > > I seem to recall that people made 4 kilobyte aligned partitions on > advanced format drives without doing the gnop trick and still > suffered worse performance than when they did the gnop trick to make > ashift=12. Check the list archives. 
> > If you believe there is insufficient testing here and are saying that > conventional wisdom regarding this is wrong, it is resonable to request > that you prove your position. I have one of the first 4k drives yet it's not yet available for testing. I'm planning to rerun tests on it when it will be available. -- Sphinx of black quartz judge my vow. From owner-freebsd-fs@FreeBSD.ORG Tue Sep 18 13:34:56 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EDE3D106566B for ; Tue, 18 Sep 2012 13:34:55 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id AB9BA8FC0C for ; Tue, 18 Sep 2012 13:34:55 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap8EAEp3WFCDaFvO/2dsb2JhbAA+BxaFcbc0giABAQUjBFIbDgoCAg0ZAlkGiBMLpxuTFIEhigAhhTWBEgOVYoEUjw2DAoE+Ihs X-IronPort-AV: E=Sophos;i="4.80,443,1344225600"; d="scan'208";a="182105131" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 18 Sep 2012 09:34:54 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 5C44DB4017; Tue, 18 Sep 2012 09:34:54 -0400 (EDT) Date: Tue, 18 Sep 2012 09:34:54 -0400 (EDT) From: Rick Macklem To: Konstantin Belousov Message-ID: <21418398.765673.1347975294365.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20120918085941.GZ37286@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: FS List Subject: Re: testing/review of atomic export update patch X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 
Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Sep 2012 13:34:56 -0000 Konstantin Belousov wrote: > On Mon, Sep 17, 2012 at 05:32:44PM -0400, Rick Macklem wrote: > > Konstantin Belousov wrote: > > > On Sun, Sep 16, 2012 at 05:41:25PM -0400, Rick Macklem wrote: > > > > Hi, > > > > > > > > There is a simple patch at: > > > > http://people.freebsd.org/~rmacklem/atomic-export.patch > > > > that can be applied to a kernel + mountd, so that the new > > > > nfsd can be suspended by mountd while the exports are being > > > > reloaded. It adds a new "-S" flag to mountd to enable this. > > > > (This avoids the long standing bug where clients receive ESTALE > > > > replies to RPCs while mountd is reloading exports.) > > > > > > This looks simple, but also somewhat worrisome. What would happen > > > if the mountd crashes after nfsd suspension is requested, but > > > before > > > resume was performed ? > > > > > > Might be, mountd should check for suspended nfsd on start and > > > unsuspend > > > it, if some flag is specified ? > > Well, I think that happens with the patch as it stands. > > > > suspend is done if the "-S" option is specified, but that is a no op > > if it is already suspended. The resume is done no matter what flags > > are provided, so mountd will always try and do a "resume". > > --> get_exportlist() is always called when mountd is started up and > > it does the resume unconditionally when it completes. > > If mountd repeatedly crashes before completing get_exportlist() > > when it is started up, the exports will be all messed up, so > > having the nfsd threads suspended doesn't seem so bad for this > > case (which hopefully never happens;-). > > > > Both suspend and resume are just no ops for unpatched kernels. > > > > Maybe the comment in front of "resume" should explicitly explain > > this, instead of saying resume is harmless to do under all > > conditions? 
> > > > Thanks for looking at it, rick > I see. > > My another note is that there is no any protection against parallel > instances of suspend/resume happen. For instance, one thread could set > suspend_nfsd = 1 and be descheduled, while another executes resume > code sequence meantime. Then it would see suspend_nfsd != 0, while > nfsv4rootfs_lock not held, and tries to unlock it. It seems that > nfsv4_unlock would silently exit. The suspending thread resumes, > and obtains the lock. You end up with suspend_nfsd == 0 but lock held. Yes. I had assumed that mountd would be the only thing using these syscalls and it is single threaded. (The syscalls can only be done by root for the obvious reasons.;-) Maybe the following untested version of the syscalls would be better, since they would allow multiple concurrent calls to either suspend or resume. (There would still be an indeterminate case if one thread called resume concurrently with another few calling suspend, but that is unavoidable, I think?) 
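
As a user-space illustration of the locking discipline discussed above (not the kernel code, and not taken from the patch), the following minimal sketch assumes a pthread mutex in place of the NFSv4 root mutex and a plain counter in place of nfsv4rootfs_lock. The names svc_gate, svc_suspend and svc_resume are hypothetical. The real nfsv4_lock() call can block, which this sketch does not model.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the nfsd suspend state. */
struct svc_gate {
	pthread_mutex_t	mtx;		/* protects the two fields below */
	bool		suspended;	/* plays the role of suspend_nfsd */
	int		lock_holds;	/* 1 while the "exclusive lock" is held */
};

#define	SVC_GATE_INIT	{ PTHREAD_MUTEX_INITIALIZER, false, 0 }

/* Suspend: a no-op if already suspended, so repeated calls are safe. */
static void
svc_suspend(struct svc_gate *g)
{
	pthread_mutex_lock(&g->mtx);
	if (!g->suspended) {
		g->lock_holds++;	/* take the exclusive lock once */
		g->suspended = true;
	}
	pthread_mutex_unlock(&g->mtx);
}

/* Resume: releases the lock only if a suspend is actually in effect. */
static void
svc_resume(struct svc_gate *g)
{
	pthread_mutex_lock(&g->mtx);
	if (g->suspended) {
		g->lock_holds--;
		g->suspended = false;
	}
	pthread_mutex_unlock(&g->mtx);
}
```

Because the flag is tested and changed while the same mutex is held, the flag and the lock count cannot drift apart, whichever order concurrent callers run in. The remaining indeterminacy is only in which state wins when suspend and resume race.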
Again, thanks for the comments, rick

--- untested version of syscalls ---
	} else if ((uap->flag & NFSSVC_SUSPENDNFSD) != 0) {
		NFSLOCKV4ROOTMUTEX();
		if (suspend_nfsd == 0) {
			/* Lock out all nfsd threads */
			igotlock = 0;
			while (igotlock == 0 && suspend_nfsd == 0) {
				igotlock = nfsv4_lock(&nfsv4rootfs_lock, 1, NULL,
				    NFSV4ROOTLOCKMUTEXPTR, NULL);
			}
			suspend_nfsd = 1;
		}
		NFSUNLOCKV4ROOTMUTEX();
		error = 0;
	} else if ((uap->flag & NFSSVC_RESUMENFSD) != 0) {
		NFSLOCKV4ROOTMUTEX();
		if (suspend_nfsd != 0) {
			nfsv4_unlock(&nfsv4rootfs_lock, 0);
			suspend_nfsd = 0;
		}
		NFSUNLOCKV4ROOTMUTEX();
		error = 0;
	}

From owner-freebsd-fs@FreeBSD.ORG Tue Sep 18 13:40:22 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 6B638106564A for ; Tue, 18 Sep 2012 13:40:22 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230]) by mx1.freebsd.org (Postfix) with ESMTP id DCAF68FC08 for ; Tue, 18 Sep 2012 13:40:21 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.5/8.14.5) with ESMTP id q8IDeBn6022485 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Tue, 18 Sep 2012 16:40:12 +0300 (EEST) (envelope-from daniel@digsys.bg) Message-ID: <505879BB.3000806@digsys.bg> Date: Tue, 18 Sep 2012 16:40:11 +0300 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:10.0.7) Gecko/20120918 Thunderbird/10.0.7 MIME-Version: 1.0 To: Volodymyr Kostyrko References: <001a01cd900d$bcfcc870$36f65950$@goelli.de> <504F282D.8030808@gmail.com> <000a01cd90aa$0a277310$1e765930$@goelli.de> <5050461A.9050608@gmail.com> <000001cd9239$ed734c80$c859e580$@goelli.de> <5052EC5D.4060403@gmail.com> <000a01cd9274$0aa0bba0$1fe232e0$@goelli.de> <505322C9.70200@gmail.com> <000001cd9377$e9e9b010$bdbd1030$@goelli.de> <50559CD8.1070700@gmail.com>
<000001cd94f1$a4157030$ec405090$@goelli.de> <50581033.4040102@gmail.com> <50584CC1.3030300@digsys.bg> <505874E6.2050109@gmail.com> In-Reply-To: <505874E6.2050109@gmail.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: AW: AW: AW: AW: AW: ZFS: Corrupted pool metadata after adding vdev to a pool - no opportunity to rescue data from healthy vdevs? Remove a vdev? Rewrite metadata? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Sep 2012 13:40:22 -0000

On 18.09.12 16:19, Volodymyr Kostyrko wrote:
> 18.09.2012 13:28, Daniel Kalchev wrote:
>>
>> The problem is that ZFS writes these records (even 128K) aligned to the
>> sector size. So, once you write some data that is under 4k, your pool
>> will become misaligned.
>
> Not exactly. https://blogs.oracle.com/bonwick/entry/space_maps

There is no statement in this post that contradicts what I have
already said. I may not have been precise enough -- the misalignment
might happen within the metaslab, not the whole zpool.

ZFS clearly does not write larger blocks than necessary, the smallest
being the sector size. The sector size is represented by the ashift
value: the sector size is 2^ashift. The ashift value is set on a
per-vdev basis and is calculated from the largest sector size of the
vdev members.

So if you create a mirror vdev of two drives that report 512-byte
sectors to the OS, the resulting vdev will have ashift=9. If you create
a mirror vdev from one drive that reports 512-byte sectors and another
that reports 4096-byte sectors, then you will have ashift=12.

The vdevs in a zpool do not all need to have the same ashift value (and
thus the same sector size).

>
>>> 2. For older drives each drive should be partitioned with respect to
>>> 4k sectors.
This is what -a option of gpart does: it aligns created >>> partitions to 4k sector bounds. But half a year ago I already found >>> some drives that can auto-shift all disk transactions to optimize read >>> and write performance. Courtesy of Microsoft Windows, OS that does not >>> care about anything not written in license terms, same as the users >>> do, so using this drives would be more straightforward and would not >>> cause decent pain to IT stuff about realigning partitions the way it >>> would just work. >>> >> >> This is only hype. There is no way any disk firmware can shift any >> transactions. > > How about Seagate Smart Align? It's documented to do so. I haven't > touched any Seagate drives as I don't like them anyway... > I have a lot of Seagate drives with 4k sectors in use with ZFS. Despite these claims, performance is far worse if writes are not aligned to 4k. It is also awful with UFS if you don't care to align partitions. This is just marketing. Their rewrite implementation might be better than others, but still is better avoided. Daniel From owner-freebsd-fs@FreeBSD.ORG Tue Sep 18 15:05:59 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9CFEE106564A for ; Tue, 18 Sep 2012 15:05:59 +0000 (UTC) (envelope-from gibbs@FreeBSD.org) Received: from aslan.scsiguy.com (mail.scsiguy.com [70.89.174.89]) by mx1.freebsd.org (Postfix) with ESMTP id 6B24B8FC0C for ; Tue, 18 Sep 2012 15:05:58 +0000 (UTC) Received: from [192.168.6.100] (207-225-98-3.dia.static.qwest.net [207.225.98.3]) (authenticated bits=0) by aslan.scsiguy.com (8.14.5/8.14.5) with ESMTP id q8IF5vUf036899 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO); Tue, 18 Sep 2012 09:05:58 -0600 (MDT) (envelope-from gibbs@FreeBSD.org) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 6.0 \(1486\)) From: "Justin T. 
Gibbs" In-Reply-To: <1531430179.669311.1347831685957.JavaMail.root@erie.cs.uoguelph.ca> Date: Tue, 18 Sep 2012 09:06:03 -0600 Content-Transfer-Encoding: 7bit Message-Id: References: <1531430179.669311.1347831685957.JavaMail.root@erie.cs.uoguelph.ca> To: Rick Macklem X-Mailer: Apple Mail (2.1486) X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (aslan.scsiguy.com [70.89.174.89]); Tue, 18 Sep 2012 09:05:58 -0600 (MDT) Cc: FS List , Will Andrews Subject: Re: testing/review of atomic export update patch X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Sep 2012 15:05:59 -0000 On Sep 16, 2012, at 3:41 PM, Rick Macklem wrote: > Hi, > > There is a simple patch at: > http://people.freebsd.org/~rmacklem/atomic-export.patch > that can be applied to a kernel + mountd, so that the new > nfsd can be suspended by mountd while the exports are being > reloaded. It adds a new "-S" flag to mountd to enable this. > (This avoids the long standing bug where clients receive ESTALE > replies to RPCs while mountd is reloading exports.) At Spectra, we are successfully using the NFSE patch set from nfse.sourceforge.net (FreeBSD PR 136865). It addresses the ESTALE problem in addition to cleaning up several aspects of exports processing. Have you reviewed the NFSE work? Do you have any issues or concerns with it? What is the right path for getting NFSE integrated into FreeBSD? 
--
Justin

From owner-freebsd-fs@FreeBSD.ORG Tue Sep 18 15:14:20 2012 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6D936106564A for ; Tue, 18 Sep 2012 15:14:20 +0000 (UTC) (envelope-from gibbs@scsiguy.com) Received: from aslan.scsiguy.com (www.scsiguy.com [70.89.174.89]) by mx1.freebsd.org (Postfix) with ESMTP id 427018FC12 for ; Tue, 18 Sep 2012 15:14:16 +0000 (UTC) Received: from [192.168.6.100] (207-225-98-3.dia.static.qwest.net [207.225.98.3]) (authenticated bits=0) by aslan.scsiguy.com (8.14.5/8.14.5) with ESMTP id q8IFEGXn036944 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO) for ; Tue, 18 Sep 2012 09:14:16 -0600 (MDT) (envelope-from gibbs@scsiguy.com) From: "Justin T. Gibbs" Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Message-Id: <76CBA055-021F-458D-8978-E9A973D9B783@scsiguy.com> Date: Tue, 18 Sep 2012 09:14:22 -0600 To: fs@FreeBSD.org Mime-Version: 1.0 (Mac OS X Mail 6.0 \(1486\)) X-Mailer: Apple Mail (2.1486) X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (aslan.scsiguy.com [70.89.174.89]); Tue, 18 Sep 2012 09:14:16 -0600 (MDT) Cc: Subject: ZFS: Deadlock during vnode recycling X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Sep 2012 15:14:20 -0000

One of our systems became unresponsive due to an inability to recycle
vnodes. We tracked this down to a deadlock in zfs_zget(). I've attached
the stack trace from the vnlru process to the end of this email.

We are currently testing the following patch. Since this issue is hard
to replicate I would appreciate review and feedback before I commit it
to FreeBSD.
Thanks,
Justin

Patch
===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<
Change 635310 by justing@justing_ns1_spectrabsd on 2012/09/17 15:30:14

        For most vnode consumers of ZFS, the appropriate behavior
        when encountering a vnode that is in the process of being
        reclaimed is to wait for that process to complete and then
        allocate a new vnode. This behavior is enforced in zfs_zget()
        by checking for the VI_DOOMED vnode flag. In the case of
        the thread actually reclaiming the vnode, zfs_zget() must
        return the current vnode, otherwise a deadlock will occur.

        sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_znode.h:
                Create a virtual znode field, z_reclaim_td, which is
                implemented as a macro that redirects to z_task.ta_context.

                z_task is only used by the reclaim code to perform the
                final cleanup of a znode in a secondary thread. Since
                this can only occur after any calls to zfs_zget(), it
                is safe to reuse the ta_context field.

        sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:
                In zfs_freebsd_reclaim(), record curthread in the
                znode being reclaimed.

        sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:
                o Null out z_reclaim_td when znode_ts are constructed.

                o In zfs_zget(), return a "doomed vnode" if the current
                  thread is actively reclaiming this object.

Affected files ...

... //SpectraBSD/stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_znode.h#2 edit
... //SpectraBSD/stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c#3 edit
... //SpectraBSD/stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c#2 edit

Differences ...

==== //SpectraBSD/stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_znode.h#2 (text) ====

@@ -241,6 +241,7 @@
         struct task     z_task;
 } znode_t;
 
+#define z_reclaim_td    z_task.ta_context
 
 /*
  * Convert between znode pointers and vnode pointers

==== //SpectraBSD/stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c#3 (text) ====

@@ -6083,6 +6083,13 @@
 
         ASSERT(zp != NULL);
 
+        /*
+         * Mark the znode so that operations that typically block
+         * waiting for reclamation to complete will return the current,
+         * "doomed vnode", for this thread.
+         */
+        zp->z_reclaim_td = curthread;
+
         /*
          * Destroy the vm object and flush associated pages.
          */

==== //SpectraBSD/stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c#2 (text) ====

@@ -158,6 +158,7 @@
         zp->z_dirlocks = NULL;
         zp->z_acl_cached = NULL;
         zp->z_moved = 0;
+        zp->z_reclaim_td = NULL;
         return (0);
 }
 
@@ -1192,7 +1193,8 @@
                                 dying = 1;
                         else {
                                 VN_HOLD(vp);
-                                if ((vp->v_iflag & VI_DOOMED) != 0) {
+                                if ((vp->v_iflag & VI_DOOMED) != 0 &&
+                                    zp->z_reclaim_td != curthread) {
                                         dying = 1;
                                         /*
                                          * Don't VN_RELE() vnode here, because

vnlru_proc debug session
===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<===8<
#0  sched_switch (td=0xfffffe000f87b470, newtd=0xfffffe000d36c8e0, flags=Variable "flags" is not available.
) at /usr/src/sys/kern/sched_ule.c:1927
#1  0xffffffff8057f2b6 in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:485
#2  0xffffffff805b8982 in sleepq_timedwait (wchan=0xfffffe05c7515640, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:658
#3  0xffffffff8057f89f in _sleep (ident=0xfffffe05c7515640, lock=0x0, priority=Variable "priority" is not available.
) at /usr/src/sys/kern/kern_synch.c:246
#4  0xffffffff81093035 in zfs_zget (zfsvfs=0xfffffe001de4c000, obj_num=81963, zpp=0xffffff8c60dc51b0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1224
#5  0xffffffff810bec9a in zfs_get_data (arg=0xfffffe001de4c000, lr=0xffffff820f5330b8, buf=0x0, zio=0xfffffe0584625000) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1142
#6  0xffffffff81096891 in zil_commit (zilog=0xfffffe001c382800, foid=Variable "foid" is not available.
) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:1048
#7  0xffffffff810bceb0 in zfs_freebsd_write (ap=Variable "ap" is not available.
) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1083
#8  0xffffffff8081f112 in VOP_WRITE_APV (vop=0xffffffff8112cf40, a=0xffffff8c60dc5680) at vnode_if.c:951
#9  0xffffffff807b1a6b in vnode_pager_generic_putpages (vp=0xfffffe05c76171e0, ma=0xffffff8c60dc5890, bytecount=Variable "bytecount" is not available.
) at vnode_if.h:413
#10 0xffffffff807b1749 in vnode_pager_putpages (object=0xfffffe05e9ee9bc8, m=0xffffff8c60dc5890, count=61440, sync=1, rtvals=0xffffff8c60dc57a0) at vnode_if.h:1189
#11 0xffffffff807aaee0 in vm_pageout_flush (mc=0xffffff8c60dc5890, count=15, flags=1, mreq=0, prunlen=0xffffff8c60dc594c, eio=0xffffff8c60dc59c0) at vm_pager.h:145
#12 0xffffffff807a3da3 in vm_object_page_collect_flush (object=Variable "object" is not available.
) at /usr/src/sys/vm/vm_object.c:936
#13 0xffffffff807a3f23 in vm_object_page_clean (object=0xfffffe05e9ee9bc8, start=Variable "start" is not available.
) at /usr/src/sys/vm/vm_object.c:861
#14 0xffffffff807a42d4 in vm_object_terminate (object=0xfffffe05e9ee9bc8) at /usr/src/sys/vm/vm_object.c:706
#15 0xffffffff807b241e in vnode_destroy_vobject (vp=0xfffffe05c76171e0) at /usr/src/sys/vm/vnode_pager.c:167
#16 0xffffffff810beec7 in zfs_freebsd_reclaim (ap=Variable "ap" is not available.
) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:6146
#17 0xffffffff806101e1 in vgonel (vp=0xfffffe05c76171e0) at vnode_if.h:830
#18 0xffffffff80616379 in vnlru_proc () at /usr/src/sys/kern/vfs_subr.c:734
(kgdb) frame 4
#4  0xffffffff81093035 in zfs_zget (zfsvfs=0xfffffe001de4c000, obj_num=81963, zpp=0xffffff8c60dc51b0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1224
1224                            tsleep(zp, 0, "zcollide", 1);
(kgdb) l
1219                            sa_buf_rele(db, NULL);
1220                            mutex_exit(&zp->z_lock);
1221                            ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num);
1222                            if (vp != NULL)
1223                                    VN_RELE(vp);
1224                            tsleep(zp, 0, "zcollide", 1);
1225                            goto again;
1226                    }
1227                    *zpp = zp;
1228                    err = 0;
(kgdb)

From owner-freebsd-fs@FreeBSD.ORG Tue Sep 18 16:30:42 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 7A532106566C for ; Tue, 18 Sep 2012 16:30:42 +0000 (UTC) (envelope-from bf1783@googlemail.com) Received: from mail-vc0-f182.google.com (mail-vc0-f182.google.com [209.85.220.182]) by mx1.freebsd.org (Postfix) with ESMTP id 2C39B8FC08 for ; Tue, 18 Sep 2012 16:30:41 +0000 (UTC) Received: by vcbfw7 with SMTP id fw7so61033vcb.13 for ; Tue, 18 Sep 2012 09:30:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=20120113; h=mime-version:reply-to:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=++XF6Fh2ZJnlFRBdZFpe6N9NzXUBMM+TSOzpafg2JRk=; b=qN5WcYYHUWktZRXAwx50/LRbYvp8RmZgyBA8Vd5r+JpK8mfEdrZWdD4uT/qcQ80f/y
P9aWxT/E7iIT22ML+MbMp828JCvLfL9l5V3j+/KGO6yBM2+fXTtorcNLyUZ1iJwwQJD+ KaCz4x0Ju9Wk1863xy1BIjCwiYIemPNdRy4e5zlE0RwhsKjODec4Wb6EnHtUkQGeBkr/ t28FY2bjbfmFxtvsuxPbfViXi56V+GEas3S/SIg7ELrhC+FqtVSl0tFqsd4D0dZwT5ow dJZ7zh9EJw3tWbJ6GMnqwNMMX1T/M2hMwvj6/3Kb86JOhlKonfVlN0eCNAlA4kv0qbwv f4iA== MIME-Version: 1.0 Received: by 10.220.119.204 with SMTP id a12mr218109vcr.66.1347985840878; Tue, 18 Sep 2012 09:30:40 -0700 (PDT) Received: by 10.58.4.166 with HTTP; Tue, 18 Sep 2012 09:30:40 -0700 (PDT) In-Reply-To: <20120918084924.GY37286@deviant.kiev.zoral.com.ua> References: <20120917121925.GQ37286@deviant.kiev.zoral.com.ua> <20120917183654.GA13273@x2.osted.lan> <20120918084924.GY37286@deviant.kiev.zoral.com.ua> Date: Tue, 18 Sep 2012 12:30:40 -0400 Message-ID: From: "b. f." To: freebsd-fs@FreeBSD.org Content-Type: text/plain; charset=ISO-8859-1 Cc: Subject: Re: Problems after recent nullfs,vfs changes in 10.0-CURRENT X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: bf1783@gmail.com List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Sep 2012 16:30:42 -0000 The following deals with some problems exposed by r240283-5, particularly (but not only) when used with changes to tmpfs that were first proposed by kib@ on 21 June 2010 on this list, in a thread entitled "Tmpfs elimination of double copy": http://docs.freebsd.org/cgi/getmsg.cgi?fetch=20463+0+archive/2010/freebsd-fs/20100627.freebsd-fs On 9/18/12, Konstantin Belousov wrote: > On Mon, Sep 17, 2012 at 08:36:54PM +0200, Peter Holm wrote: >> On Mon, Sep 17, 2012 at 03:19:25PM +0300, Konstantin Belousov wrote: >> > Please mail fs@, possibly Cc:-ing me. >> > >> > On Mon, Sep 17, 2012 at 03:04:46AM -0400, b. f. wrote: >> > > The recent nullfs or vfs changes (r240283-5) have exposed some >> > > problems with my tinderbox. 
>> > > In this tinderbox, I've been using recent
>> > > versions of -CURRENT with Gleb's tmpfs rbtree patch:
>> > >
>> > > http://people.freebsd.org/~gleb/tmpfs-nrbtree.1.patch
>> > >
>> > > and a merged version of your tmpfs single-buffer patch:
>> > >
>> > > http://people.freebsd.org/~kib/misc/tmpfs.12.patch
>> > >
>> > > The tinderbox performs builds in a tmpfs filesystem that is nullfs
>> > > grafted to a ufs filesystem. After r240283-5, builds of
>> > > ports/lang/ocaml failed when a cp(1) of an executable failed with
>> > > ETXTBSY. After reverting r240285, the builds of ocaml succeeded.
>> > >
>> > > I've attached logs of the failed and successful builds. Can you guess
>> > > whether the problem is solely due to the recent nullfs and vfs
>> > > changes, or to some defect in Gleb's proposed changes, or to a problem
>> > > with your proposed tmpfs change, or my merging of it? What further
>> > > changes or tests would you suggest to help find the source of the
>> > > problem?
>> > >
>> > > I've attached a diff of the relevant changes to the system sources
>> > > used in the tinderbox, and logs of the successful (*.log) and
>> > > unsuccessful (*.log.error) ocaml builds.
>> >
>> > Please show me the mount -v output, and specify which filesystems
>> > are used where.
The following is a typical layout for one run of the tinderbox (which is in /home/shared/freebsd/tinderbox):

/dev/ufs/d1root on / (ufs, local, noatime, writes: sync 13 async 25, reads: sync 553 async 42, fsid 8aabfa4d68614a9f)
devfs on /dev (devfs, local, fsid 00ff007171000000)
tmpfs on /tmp (tmpfs, local, nosuid, fsid 01ff008787000000)
/dev/ufs/d1var on /var (ufs, local, noatime, journaled soft-updates, writes: sync 15 async 269, reads: sync 664 async 12, fsid a5abfa4d331091c9)
/dev/ufs/d1usr on /usr (ufs, local, noatime, journaled soft-updates, writes: sync 2 async 0, reads: sync 765 async 12, fsid b4abfa4d94c0f782)
/dev/ufs/d1usrlocal on /usr/local (ufs, local, noatime, journaled soft-updates, writes: sync 32 async 298, reads: sync 2867 async 106, fsid c4abfa4d96ab4351)
/dev/ufs/d1home on /home (ufs, local, noatime, journaled soft-updates, writes: sync 16 async 123, reads: sync 2065 async 268, fsid ceabfa4d9bb85870)

The filesystems used for the port builds:

/tmp/tinderbox/7.4-amd64-u1 on /home/shared/freebsd/tinderbox/7.4-amd64-u1 (nullfs, local, fsid 03ff002929000000)
/home/shared/freebsd/ports/head on /home/shared/freebsd/tinderbox/7.4-amd64-u1/a/ports (nullfs, local, read-only, fsid 04ff002929000000)
/home/shared/freebsd/tinderbox/jails/7.4-amd64/src on /home/shared/freebsd/tinderbox/7.4-amd64-u1/usr/src (nullfs, local, read-only, fsid 05ff002929000000)
devfs on /home/shared/freebsd/tinderbox/7.4-amd64-u1/dev (devfs, local, fsid 06ff007171000000)
/home/shared/freebsd/distfiles on /home/shared/freebsd/tinderbox/7.4-amd64-u1/distcache (nullfs, local, fsid 07ff002929000000)
linprocfs on /home/shared/freebsd/tinderbox/7.4-amd64-u1/compat/linux/proc (linprocfs, local, fsid 08ff00b5b5000000)
procfs on /home/shared/freebsd/tinderbox/7.4-amd64-u1/proc (procfs, local, fsid 09ff000202000000)

>> >
>> > The issue almost definitely is the held reference on the vm object.
>> > Let's remove Gleb's patches from the picture entirely.
>> >
>> > After rethinking VV_TEXT handling both for nullfs and tmpfs (patched),
>> > I see two issues ATM:
>> >
>> > 1. VV_TEXT may be set either on the lower vnode, or on the nullfs
>> > vnode.
>> > So if you executed a file from a nullfs alias, the lower vnode does not
>> > get VV_TEXT set, and the executable can still be opened for write.
>> >
>> > 2. For tmpfs, the hack I added to clear VV_TEXT if the swap vm object
>> > reference count == 1 is not called often enough. This allows VV_TEXT
>> > to leak, esp. because nullfs after r240283 is not eager to reclaim
>> > its vnodes.
>> >
>> > I updated my branch with tmpfs patches with the following changes:
>> >
>> > 1. nullfs now bypasses the VV_TEXT set and clear operations to the
>> > lower vnode.
>> >
>> > 2. the tmpfs_clear_text() hack is removed, instead
>> > vm_object_deallocate() clears VV_TEXT on the tmpfs vnode if the
>> > reference count goes to 1.
>> >
>> > Updated patch is at
>> > http://people.freebsd.org/~kib/misc/tmpfs.13.patch
>> > I tested it very lightly, so to say.
>>
>> I see the problem on a pristine r240611. Test scenario included.
>>
>> + mdconfig -a -t swap -s 1g -u 5
>> + bsdlabel -w md5 auto
>> + newfs -U md5a
>> + mount /dev/md5a /mnt2
>> + chmod 777 /mnt2
>> + mount
>> + grep /mnt
>> + grep -q tmpfs
>> + mount -t tmpfs tmpfs /mnt
>> + chmod 777 /mnt
>> + mkdir /mnt2/mp
>> + mount -t nullfs /mnt /mnt2/mp
>> + cp /usr/bin/true /mnt2/mp/true
>> + /mnt/true
>> +
>> + rm -f /mnt/true
>> + cp /usr/bin/true /mnt2/mp/true
>> + /mnt2/mp/true
>> +
>> ./nullfs12.sh: cannot create /mnt2/mp/true: Text file busy
>> + echo FAIL 2
>> FAIL 2
>> + mount
>> + egrep 'tmpfs|nullfs|/mnt |/mnt2 '
>> /dev/md5a on /mnt2 (ufs, local, soft-updates)
>> tmpfs on /mnt (tmpfs, NFS exported, local)
>> /mnt on /mnt2/mp (nullfs, local)
>> + rm -f /mnt2/mp/true
>
> Yes, this is very close if not identical to the only test which I performed
> with the tmpfs.13.patch.
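
For anyone wanting to retrace the quoted xtrace, it can be turned back into a script roughly as follows. This is only a minimal sketch and not Peter Holm's actual nullfs12.sh: the commands are copied from the trace, but the two bare `+` lines are assumed to be shell redirections that open the file for writing, the path used by the first of them is a guess, and the names `RUN`, `truncate_file` and `repro` are made up for illustration. By default the script only prints the commands; it should be run for real (with `RUN=` set to empty) only as root on a scratch FreeBSD machine.

```shell
#!/bin/sh
# Sketch of a reproduction script reconstructed from the quoted xtrace.
# Assumption: RUN, truncate_file and repro are illustrative names only.
# RUN defaults to "echo" (dry run); set RUN= (empty) to really execute.
RUN=${RUN-echo}

# Assumption: the bare "+" trace lines were redirections that open the
# file for writing. "true" is used instead of ":" so that a failed
# redirection gives a nonzero status rather than terminating the shell.
truncate_file() {
    true > "$1"
}

repro() {
    # Scratch UFS filesystem on a swap-backed md device, mounted on /mnt2.
    $RUN mdconfig -a -t swap -s 1g -u 5
    $RUN bsdlabel -w md5 auto
    $RUN newfs -U md5a
    $RUN mount /dev/md5a /mnt2
    $RUN chmod 777 /mnt2

    # tmpfs on /mnt, aliased through nullfs at /mnt2/mp.
    $RUN mount -t tmpfs tmpfs /mnt
    $RUN chmod 777 /mnt
    $RUN mkdir /mnt2/mp
    $RUN mount -t nullfs /mnt /mnt2/mp

    # Pass 1: copy through the alias, execute through the tmpfs path.
    $RUN cp /usr/bin/true /mnt2/mp/true
    $RUN /mnt/true
    $RUN truncate_file /mnt/true    # path is an assumption
    $RUN rm -f /mnt/true

    # Pass 2: copy and execute through the alias, then reopen for write.
    $RUN cp /usr/bin/true /mnt2/mp/true
    $RUN /mnt2/mp/true
    $RUN truncate_file /mnt2/mp/true || echo "FAIL 2"
    $RUN rm -f /mnt2/mp/true
}

repro
```

In the dry run nothing is mounted or written. In a real run, "FAIL 2" would be printed only if reopening the executable for writing fails after it has finished running, which is what the trace shows with "Text file busy".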
I can no longer reproduce the port build failures on r240651 amd64 after applying your tmpfs.13.patch, and I haven't encountered any other obvious problems in the short time that I've been using it. I did not rerun Peter Holm's nullfs12.sh test, since you had already subjected your patch to a similar test.

Regards,
b.
From owner-freebsd-fs@FreeBSD.ORG Tue Sep 18 23:14:05 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 485D9106564A; Tue, 18 Sep 2012 23:14:05 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id B52898FC0C; Tue, 18 Sep 2012 23:14:04 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap8EACz/WFCDaFvO/2dsb2JhbAA+BxaFc7dCgiABAQUjBFIbGAICDRkCWQYTiAALp0WTDIEhiXohhT2BEgOVY4EUjw2DAoE+Ihs X-IronPort-AV: E=Sophos;i="4.80,445,1344225600"; d="scan'208";a="179618068" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 18 Sep 2012 19:12:54 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id EAB29B4012; Tue, 18 Sep 2012 19:12:54 -0400 (EDT) Date: Tue, 18 Sep 2012 19:12:54 -0400 (EDT) From: Rick Macklem To: "Justin T.
Gibbs" Message-ID: <2050472507.821722.1348009974939.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: FS List , Will Andrews Subject: Re: testing/review of atomic export update patch X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Sep 2012 23:14:05 -0000 Justin T. Gibbs wrote: > On Sep 16, 2012, at 3:41 PM, Rick Macklem > wrote: > > > Hi, > > > > There is a simple patch at: > > http://people.freebsd.org/~rmacklem/atomic-export.patch > > that can be applied to a kernel + mountd, so that the new > > nfsd can be suspended by mountd while the exports are being > > reloaded. It adds a new "-S" flag to mountd to enable this. > > (This avoids the long standing bug where clients receive ESTALE > > replies to RPCs while mountd is reloading exports.) > > At Spectra, we are successfully using the NFSE patch set from > nfse.sourceforge.net (FreeBSD PR 136865). It addresses > the ESTALE problem in addition to cleaning up several aspects > of exports processing. > > Have you reviewed the NFSE work? Do you have any issues > or concerns with it? What is the right path for getting NFSE > integrated into FreeBSD? > I, personally, have not found the time to review it. As such, I can't state specifics, however there have been concerns w.r.t. a switch from mountd->nfse resulting in different behaviour when used with the same /etc/exports file used for mountd. Some questions that need to be answered w.r.t. nfse, which I haven't had the time to do: - Are the differences listed here significant enough for a change to be considered a POLA violation? 
  http://nfse.sourceforge.net/COMPATIBILITY
- If the server mount point is /sub1 and the only line referring to this server volume in /etc/exports looks like:
    /sub1/sub2 client.net
  Does the following mount command work on client.net
    # mount -t nfs -o nfsv3 server.net:/sub1 /mnt
  when nfse is run with -C using the /etc/exports file? (If this mount works, many would consider this a POLA violation.) This is typically referred to as an "administrative control", since it is only enforced by mountd for the Mount protocol, but is considered an important feature by some (rwatson@ expressed a desire/need for it).
- Does the nfse patch handle exporting of all file system types and, in particular, the `zfs share` case?

Beyond that, it needs someone with the time to shepherd it into head as a mountd replacement.

(I'll admit I'm mainly interested in NFSv4.1 these days and proposed the simple patch because I do not have the time to look at nfse seriously and figured it might be sufficient to keep people happy.)

rick

> --
> Justin
From owner-freebsd-fs@FreeBSD.ORG Wed Sep 19 00:37:57 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E4671106566B; Wed, 19 Sep 2012 00:37:57 +0000 (UTC) (envelope-from thomas@gibfest.dk) Received: from mail.tyknet.dk (mail.tyknet.dk [IPv6:2a01:4f8:141:52a3:186::]) by mx1.freebsd.org (Postfix) with ESMTP id 72B7B8FC0C; Wed, 19 Sep 2012 00:37:57 +0000 (UTC) Received: from [10.10.1.100] (unknown [217.71.4.82]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mail.tyknet.dk (Postfix) with ESMTPSA id 8A63410287D; Wed, 19 Sep 2012 02:37:48 +0200 (CEST) X-DKIM: OpenDKIM Filter v2.5.2 mail.tyknet.dk 8A63410287D DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=gibfest.dk; s=default; t=1348015068; bh=eeTVbCgeGbGPrHP5HH2Fvyd496gfMCEq+iyJGlaX0FM=; h=Date:From:To:CC:Subject:References:In-Reply-To;
b=bbxPfpKTvBFbIqknmBOi5Nyjz+eJsjY49g8mIc6OjnKGPHyGd/lkR5VTx4Rec9PCB 5GOyPbndoGh12bN/o3bx44LS4WEveYGKnYmuXK9ZmhIfby0REqOPfc677X9cuwXR6m KzY6z/KeLwHtnNPkse+38amM3nLUcVFKpbHyBG/A= Message-ID: <505913DB.1060200@gibfest.dk> Date: Wed, 19 Sep 2012 02:37:47 +0200 From: Thomas Steen Rasmussen User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:12.0) Gecko/20120604 Thunderbird/12.0.1 MIME-Version: 1.0 To: Glen Barber References: <50438BF5.8030004@gibfest.dk> <5043B0CB.8040907@gibfest.dk> <20120902193100.GG1266@glenbarber.us> <5043C9A5.5070409@FreeBSD.org> <20120902213425.GA1507@glenbarber.us> <5043D6E7.8090308@FreeBSD.org> <5043DA77.6000304@gibfest.dk> <20120903031502.GH1507@glenbarber.us> In-Reply-To: <20120903031502.GH1507@glenbarber.us> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: zfs send -r missing - but documented in zfs(8) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Sep 2012 00:37:58 -0000 On 03.09.2012 05:15, Glen Barber wrote: > > Right now, I'd say a FreeBSD PR is not necessary since this is an > upstream bug as well. Hello Glen and list, A couple of weeks has passed with no news from upstream, and since these things tend to take a bit of time, I believe it would be best to remove references to -r until we actually have code to support it. I've opened up http://www.freebsd.org/cgi/query-pr.cgi?pr=171761: misc/171761: Small patch to (temporarily) remove -r from zfs send usage and zfs(8) with a patch to do just that. 
Just an FYI :) Best regards, Thomas Steen Rasmussen From owner-freebsd-fs@FreeBSD.ORG Wed Sep 19 02:48:06 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CD2BC106564A; Wed, 19 Sep 2012 02:48:06 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-lb0-f182.google.com (mail-lb0-f182.google.com [209.85.217.182]) by mx1.freebsd.org (Postfix) with ESMTP id 3B5ED8FC0C; Wed, 19 Sep 2012 02:48:05 +0000 (UTC) Received: by lbbgg13 with SMTP id gg13so584307lbb.13 for ; Tue, 18 Sep 2012 19:48:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:content-type; bh=FomcIhMz1egk12lxRcHB3pA+3pDeqlG3uJoGLlPwzkc=; b=XPOPKlTtbjzDdE/LMGiLiPVZW2Bx7+SlWwm2T/VyAa5Ph9iGiYLVBVtu/Iad4BHSkf bpxQeFx0WwvyRh4MkTA6IwPX+TIGCZrkphZXdLuWmRGYDRIypxlobzSrCIJGHLbM5fOb iM0CG1Et1RhDU/YmP8SQcM1cq58wpQk/Myb1ip8c9BD5sirZWuSBj6fgH/n17QenZU4q 2PWvi/UXX4Yhco98LfEf8HTdvWgokph5rMyBwfOAvUAkB+VG7n09woiD8q8D6fPsvx+2 rN/jPbCIZMzSX4kmQCJ9yJ4hpD6+vFQgBOYJeSz04cR0d9I+bY/eFU4W3S6nX9Js3bfj KFAA== MIME-Version: 1.0 Received: by 10.152.48.70 with SMTP id j6mr1342704lan.57.1348022883954; Tue, 18 Sep 2012 19:48:03 -0700 (PDT) Sender: asmrookie@gmail.com Received: by 10.112.102.39 with HTTP; Tue, 18 Sep 2012 19:48:03 -0700 (PDT) In-Reply-To: <20120917140055.GA9037@x2.osted.lan> References: <20120829060158.GA38721@x2.osted.lan> <20120831052003.GA91340@x2.osted.lan> <20120905201531.GA54452@x2.osted.lan> <20120917140055.GA9037@x2.osted.lan> Date: Wed, 19 Sep 2012 03:48:03 +0100 X-Google-Sender-Auth: 9xVWoK3IL0t4skUVBDmQF81AApc Message-ID: From: Attilio Rao To: FreeBSD FS , freebsd-current@freebsd.org, Peter Holm , =?UTF-8?Q?Gustau_P=C3=A9rez?= , George Neville-Neil , Florian Smeets , bdrewery@freebsd.org Content-Type: text/plain; charset=UTF-8 Cc: 
Subject: Re: MPSAFE VFS -- List of upcoming actions X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: attilio@FreeBSD.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Sep 2012 02:48:06 -0000
On Fri, Jul 13, 2012 at 12:18 AM, Attilio Rao wrote:
> 2012/7/4 Attilio Rao :
>> 2012/6/29 Attilio Rao :
>>> As already published several times, according to the following plan:
>>> http://wiki.freebsd.org/NONMPSAFE_DEORBIT_VFS
>>>
>>
>> I still haven't heard from Vivien or Edward, anyway as NTFS is
>> basically only used RO these days (also the mount_ntfs code just
>> permits RO mounting) I stripped all the incomplete/bogus write support
>> with the following patch:
>> http://www.freebsd.org/~attilio/ntfs_remove_write.patch
>>
>> This is an attempt to make the code smaller and possibly just focus on
>> the locking that really matters (as a read-only filesystem).
>> On some points of the patch I'm a bit less sure as we could easily
>> take into account also write for things like vaccess() arguments, and
>> make it easier to re-add correct write support at some point in the
>> future, but still force RO, even if the approach used in the patch is
>> more correct IMHO.
>> As an added bonus this patch cleans some dirty code in the mount
>> operation and fixes a bug as vfs_mountedfrom() is called before real
>> mounting is completed and can still fail.
>
> A quick update on this.
> It looks like NTFS won't be completed for this GSoC thus I seriously
> need to find an alternative to not lose the NTFS support entirely.
>
> I tried to look into the NTFS implementation right now and it is
> really poor support. As Peter has also verified, it can deadlock in
> no time, it completely violates VFS rules, etc. IMHO it deserves a
> complete rewrite if we would still support in-kernel NTFS. I also
> tried to look at the NetBSD implementation.
> Their code is somewhat
> similar to ours, but they used very complicated (and very dirty) code
> to do the locking. Even if I don't know the NetBSD VFS well enough, I have
> the impression not all the races are correctly handled. Definitely
> not something I would like to port.
>
> Considering all that, the only viable option would be a
> userland filesystem implementation. My preferred choice would be to
> import PUFFS and librefuse on top of it but honestly it requires a lot
> of time to be completed, time which I don't currently have as in 2
> months Giant must be gone from the VFS.
>
> I then decided to switch to gnn's revamp of the FUSE patches. You can find
> his initial e-mail here:
> http://lists.freebsd.org/pipermail/freebsd-fs/2012-March/013876.html
>
> I've precisely got the second version of George's patch and created
> this dolphin branch:
> svn://svn.freebsd.org/base/projects/fuse
>
> I'm fixing low hanging fruit for the moment (see r238411 for example)
> and I still have to make a thorough review.
> However my idea is to commit the support once:
> - ntfs-3g is well stress-tested and proves to be bug-free
> - there is no major/big technical issue pending after the reviews

In the last weeks Peter, Florian, Gustau and I have been working on stabilizing fuse support. Specifically, Peter has worked hard on producing several utilities to stress-test fuse and in particular ntfs, Florian has improved fuse related ports (as explained later) and Gustau has done sparse testing. I now feel moderately satisfied with the level of stability of fuse, enough to propose it for wider usage, in particular given the huge amount of complaints I'm hearing from occasional fuse users.

The final target of the project is to completely import into base the content of fusefs-kmod starting from earlier posted patches by George.
So far, we took care only of importing the kernel part into the fuse branch, so the fusefs-kmod userland part still needs to be installed from ports, but I was studying the mount_fusefs licensing before proceeding with the import of the userland bits.

The fixing has been happening here:
svn://svn.freebsd.org/base/projects/fuse/

which is essentially a HEAD branch + fuse kernel components. In order to get fuse, please compile a kernel from this branch with the FUSE option or simply build and load the fuse module.
Alternatively, a kernel patch that should work with HEAD@240684 is here:
http://www.freebsd.org/~attilio/fuse_import/fuse_240684.patch

I guess the patch can easily apply to all FreeBSD branches, really, but it is not tested on anything other than -CURRENT.

As said, you currently still need to build the fusefs-kmod port. However you need these further patches, to be put in the fusefs-kmod/files/ directory:
http://www.freebsd.org/~attilio/fuse_import/patch-Makefile
http://www.freebsd.org/~attilio/fuse_import/patch-mount_fusefs__mount_fusefs2.c

They both disable the old kernel building/linking and import new functionality to let the new kernel support work well in the presence of many consumers.

In addition to fusefs-kmod, Bryan and Florian have also updated the fusefs-lib and fusefs-ntfs ports. For instance, please refer to this e-mail:
http://lists.freebsd.org/pipermail/freebsd-ports/2012-August/077950.html

Even if this work is somewhat independent of the fusefs-kmod import, I warmly suggest that all of you use their patches (this is what we have been testing so far, too).

At this point what I'm looking for are reviews and further testing.
I would like to spend some words on what you should expect from this work:
*Fuse is far from being perfect*.
I cannot stress this enough. Peter's stress tests could generally break Fuse on Linux as well, and by the Fuse authors' own admission the modules can never guarantee to be completely starvation-free.
However, they tend to be designed in a way that sleeps can at least be interrupted easily, making it at least easy to recover from deadlocks. This is mostly retained also in FreeBSD, from what I can tell. Also, sometimes fuse seems to leave a small amount of hidden files, when it finds references on files it wants to delete. This happens also under Linux and it is part of the FUSE design, so there is not much we can do.
However, if deadlocks can be somewhat tolerated, the things you should really pay attention to are dumps of fuse modules (like the ntfs-3g binary) and kernel panics. They must not happen and if they do they need to be fixed promptly.
However, the good news is that ntfs seems to be doing exceptionally well. Florian could use ntfs as a backend for a postgresql test. I think this is by far a big improvement compared to the current in-kernel ntfs, which is completely broken.

So far we have almost entirely tested only ntfs-3g. I know Gustau also used other modules like sshfs and George used GlusterFS with his older patches, but I encourage you to test as many modules as you want, as they may expose different bugs. Of course, I don't plan to spend much more time on FUSE, but I can occasionally look at bugs as they fall in the filesystems category and I'm always interested in keeping a good open eye on such issues.

A few operational notes:
- In the next days I will import the userland bits of fusefs-kmod to the fuse project branch, making the port obsolete. When this happens I will make this clear to the readers of this thread.
- If no major bug remains by early October, I will commit this to -CURRENT.
- I expect Bryan and Florian to commit the libfuse and ntfs updates soon. They can do so independently of the fusefs-kmod retirement, but I would prefer their patches to go in first.
- After that I will hand over fusefs maintainership to gnn as previously agreed, but I will be around helping with analysis and fixing, depending on time availability.

In the end I really have 2 minor questions:
- One is about importing the mount_fusefs userland bits. I don't think we need a vendor import at all because they were developed by a FreeBSD GSoC student and kept in his git repo (or someone else's). Anyway, I'd just commit them as new files once I do a good sweep. I hope nobody objects to that.
- Another one is: fusefs-kmod right now is only amd64/i386 specific. I have no idea why, as it does not have any MD-specific code. However I'm sure it has not been tested on other arches so far. Anyway I left it usable by all the arches. I think this is the correct choice. If someone objects with a valid argument I can bring it back to be usable only on i386 and amd64.

That's all, for any question please don't hesitate to contact me and the other people involved in this work.

Thanks,
Attilio

--
Peace can only be achieved by understanding - A.
Einstein From owner-freebsd-fs@FreeBSD.ORG Wed Sep 19 03:47:38 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B5770106566C; Wed, 19 Sep 2012 03:47:38 +0000 (UTC) (envelope-from kob6558@gmail.com) Received: from mail-wi0-f172.google.com (mail-wi0-f172.google.com [209.85.212.172]) by mx1.freebsd.org (Postfix) with ESMTP id 24C498FC12; Wed, 19 Sep 2012 03:47:36 +0000 (UTC) Received: by wibhi8 with SMTP id hi8so3896450wib.13 for ; Tue, 18 Sep 2012 20:47:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=AIwLTTx0DDhUOJDKhVusSQJ4ffJAKBEZjAkeOSnIBJU=; b=j/9jxSo/I5kO8V5zsWPs2IyQuzOdklv+QXl4xEGqRHB/63lfuL3MW8KC31hxamVFHu yrCtyB8divYT3Oxo2qcHHDnofFnWUxbnl7qWzqqRbOpXWwN7auM4quP492WoWYtbb3LD /hH4CjmYW6FbqR9K1RqI1/FphJLjc75uytVSnJJAfabUuvjr7UWJr0khXKcyZb5wI162 u9CkO0llpDwhN6medh8a9+GQxuxMjNRbF1BFq6vSyWK5beJIIdRs7oLK1uvbMO2S4UT+ aBVqOqz8pGucNtLKs1/QC3L61aMhI94M+/CDjclQXtcuHcK/119w5pHDaKF0Qjndkg5Z Sefw== MIME-Version: 1.0 Received: by 10.180.83.66 with SMTP id o2mr3680771wiy.14.1348026455765; Tue, 18 Sep 2012 20:47:35 -0700 (PDT) Received: by 10.223.151.130 with HTTP; Tue, 18 Sep 2012 20:47:35 -0700 (PDT) In-Reply-To: References: <20120829060158.GA38721@x2.osted.lan> <20120831052003.GA91340@x2.osted.lan> <20120905201531.GA54452@x2.osted.lan> <20120917140055.GA9037@x2.osted.lan> Date: Tue, 18 Sep 2012 20:47:35 -0700 Message-ID: From: Kevin Oberman To: attilio@freebsd.org Content-Type: text/plain; charset=UTF-8 Cc: Peter Holm , bdrewery@freebsd.org, FreeBSD FS , freebsd-current@freebsd.org Subject: Re: MPSAFE VFS -- List of upcoming actions X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Wed, 19 Sep 2012 03:47:38 -0000
On Tue, Sep 18, 2012 at 7:48 PM, Attilio Rao wrote:
> [...]
> In addition to fusefs-kmod, Bryan and Florian have also updated
> fusefs-lib and fusefs-ntfs ports. For instance, please refer to this
> e-mail:
> http://lists.freebsd.org/pipermail/freebsd-ports/2012-August/077950.html
> [...]

Attilio (and the crew),

Thanks for working on fusefs-ntfs. It's been increasingly worrying to me that we might lose it and I really depend on it. I really hope to be able to use rsync to update files without killing my system some day.

I tried the new fusefs-libs and fusefs-ntfs ports from Florian and Bryan, but ran into trouble as I could no longer build the kmod after installing the updated fusefs-libs. It had an unresolved symbol:

cc -O2 -pipe -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE -nostdinc -I../include -I.
-I@ -I@/contrib/altq -finline-limit=8000
--param inline-unit-growth=100 --param large-function-growth=1000
-fno-common -fno-omit-frame-pointer -mcmodel=kernel -mno-red-zone
-mno-mmx -mno-sse -msoft-float -fno-asynchronous-unwind-tables
-ffreestanding -fstack-protector -std=iso9899:1999 -fstack-protector
-Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes
-Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef
-Wno-pointer-sign -fformat-extensions -Wmissing-include-dirs
-fdiagnostics-show-option -c fuse_vnops.c
fuse_vnops.c: In function 'create_filehandle':
fuse_vnops.c:1586: error: 'struct fuse_open_in' has no member named 'mode'
*** [fuse_vnops.o] Error code 1

This was on amd64 9-Stable r239879. Until/unless this issue is
resolved, please keep the existing port available and/or mark the new
one to not install on pre-10 systems.
-- 
R. Kevin Oberman, Network Engineer
E-mail: kob6558@gmail.com

From owner-freebsd-fs@FreeBSD.ORG Wed Sep 19 06:17:05 2012
Date: Wed, 19 Sep 2012 09:16:59 +0300
From: Konstantin Belousov
To: Rick Macklem
Message-ID: <20120919061659.GS37286@deviant.kiev.zoral.com.ua>
References: <20120918085941.GZ37286@deviant.kiev.zoral.com.ua> <21418398.765673.1347975294365.JavaMail.root@erie.cs.uoguelph.ca>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="2CmXnuBWhlSJqbcw"
Content-Disposition: inline
In-Reply-To: <21418398.765673.1347975294365.JavaMail.root@erie.cs.uoguelph.ca>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: FS List
Subject: Re: testing/review of atomic export update patch

--2CmXnuBWhlSJqbcw
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Tue, Sep 18, 2012 at 09:34:54AM -0400, Rick Macklem wrote:
> Konstantin Belousov wrote:
> > On Mon, Sep 17, 2012 at 05:32:44PM -0400, Rick Macklem wrote:
> > > Konstantin Belousov wrote:
> > > > On Sun, Sep 16, 2012 at 05:41:25PM -0400, Rick Macklem wrote:
> > > > > Hi,
> > > > >
> > > > > There is a simple patch at:
> > > > > http://people.freebsd.org/~rmacklem/atomic-export.patch
> > > > > that can be applied to a kernel + mountd, so that the new
> > > > > nfsd can be suspended by mountd while the exports are being
> > > > > reloaded. It adds a new "-S" flag to mountd to enable this.
> > > > > (This avoids the long standing bug where clients receive ESTALE
> > > > > replies to RPCs while mountd is reloading exports.)
> > > >
> > > > This looks simple, but also somewhat worrisome. What would happen
> > > > if mountd crashes after nfsd suspension is requested, but before
> > > > resume was performed?
> > > >
> > > > Maybe mountd should check for a suspended nfsd on start and
> > > > unsuspend it, if some flag is specified?
> > > Well, I think that happens with the patch as it stands.
> > >
> > > suspend is done if the "-S" option is specified, but that is a no op
> > > if it is already suspended. The resume is done no matter what flags
> > > are provided, so mountd will always try and do a "resume".
> > > --> get_exportlist() is always called when mountd is started up and
> > > it does the resume unconditionally when it completes.
> > > If mountd repeatedly crashes before completing get_exportlist()
> > > when it is started up, the exports will be all messed up, so
> > > having the nfsd threads suspended doesn't seem so bad for this
> > > case (which hopefully never happens;-).
> > >
> > > Both suspend and resume are just no ops for unpatched kernels.
> > >
> > > Maybe the comment in front of "resume" should explicitly explain
> > > this, instead of saying resume is harmless to do under all
> > > conditions?
> > >
> > > Thanks for looking at it, rick
> > I see.
> >
> > Another note: there is no protection against parallel instances of
> > suspend/resume. For instance, one thread could set
> > suspend_nfsd = 1 and be descheduled, while another executes the resume
> > code sequence in the meantime. Then it would see suspend_nfsd != 0 while
> > nfsv4rootfs_lock is not held, and try to unlock it. It seems that
> > nfsv4_unlock would silently exit. The suspending thread resumes,
> > and obtains the lock. You end up with suspend_nfsd == 0 but the lock held.
> Yes. I had assumed that mountd would be the only thing using these syscalls
> and it is single threaded. (The syscalls can only be done by root for the
> obvious reasons.;-)
>
> Maybe the following untested version of the syscalls would be better, since
> they would allow multiple concurrent calls to either suspend or resume.
> (There would still be an indeterminate case if one thread called resume
> concurrently with another few calling suspend, but that is unavoidable,
> I think?)
>
> Again, thanks for the comments, rick
> --- untested version of syscalls ---
>     } else if ((uap->flag & NFSSVC_SUSPENDNFSD) != 0) {
>         NFSLOCKV4ROOTMUTEX();
>         if (suspend_nfsd == 0) {
>             /* Lock out all nfsd threads */
>             igotlock = 0;
>             while (igotlock == 0 && suspend_nfsd == 0) {
>                 igotlock = nfsv4_lock(&nfsv4rootfs_lock, 1,
>                     NULL, NFSV4ROOTLOCKMUTEXPTR, NULL);
>             }
>             suspend_nfsd = 1;
>         }
>         NFSUNLOCKV4ROOTMUTEX();
>         error = 0;
>     } else if ((uap->flag & NFSSVC_RESUMENFSD) != 0) {
>         NFSLOCKV4ROOTMUTEX();
>         if (suspend_nfsd != 0) {
>             nfsv4_unlock(&nfsv4rootfs_lock, 0);
>             suspend_nfsd = 0;
>         }
>         NFSUNLOCKV4ROOTMUTEX();
>         error = 0;
>     }

From a cursory look, this variant is an improvement, mostly by taking
the interlock before testing suspend_nfsd, and using the while loop.

Is it possible to also make the sleep for the lock interruptible,
so that a blocked mountd could be killed by a signal?
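
To illustrate the point about taking the interlock before testing the
flag, here is a minimal user-space sketch. It is not the kernel code:
struct suspend_gate, gate_init(), gate_suspend() and gate_resume() are
hypothetical names, a pthread mutex stands in for the
NFSLOCKV4ROOTMUTEX()/NFSUNLOCKV4ROOTMUTEX() pair, and a plain boolean
stands in for nfsv4rootfs_lock, so the sleeping done by nfsv4_lock() is
left out. The only thing it tries to show is that, once the test and the
update happen under the same interlock, suspend and resume become
idempotent and the flag can no longer disagree with the lock state.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/*
 * Hypothetical user-space stand-ins (not the kernel code):
 *   interlock ~ NFSLOCKV4ROOTMUTEX() / NFSUNLOCKV4ROOTMUTEX()
 *   lock_held ~ nfsv4rootfs_lock held exclusively
 *   suspended ~ suspend_nfsd
 */
struct suspend_gate {
	pthread_mutex_t interlock;
	bool lock_held;
	bool suspended;
};

static void
gate_init(struct suspend_gate *g)
{
	pthread_mutex_init(&g->interlock, NULL);
	g->lock_held = false;
	g->suspended = false;
}

/* Returns true if this call performed the suspend, false if it was a no-op. */
static bool
gate_suspend(struct suspend_gate *g)
{
	bool changed = false;

	pthread_mutex_lock(&g->interlock);
	if (!g->suspended) {
		/* The real code would sleep in nfsv4_lock() here. */
		g->lock_held = true;
		g->suspended = true;
		changed = true;
	}
	/* Checked while still holding the interlock: flag and lock agree. */
	assert(g->suspended == g->lock_held);
	pthread_mutex_unlock(&g->interlock);
	return changed;
}

/* Returns true if this call performed the resume, false if it was a no-op. */
static bool
gate_resume(struct suspend_gate *g)
{
	bool changed = false;

	pthread_mutex_lock(&g->interlock);
	if (g->suspended) {
		g->lock_held = false;	/* stands in for nfsv4_unlock() */
		g->suspended = false;
		changed = true;
	}
	assert(g->suspended == g->lock_held);
	pthread_mutex_unlock(&g->interlock);
	return changed;
}
```

With this shape a second gate_suspend() or an extra gate_resume() simply
reports that it did nothing, which matches the "no op if it is already
suspended" behaviour described earlier in the thread.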

--2CmXnuBWhlSJqbcw
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iEYEARECAAYFAlBZY1sACgkQC3+MBN1Mb4jzjgCfVE5TsFuaN7NItix9xLNCMjam
eKkAn2IsumdW+ckxb4xAGXZorptD5njG
=J1Hs
-----END PGP SIGNATURE-----

--2CmXnuBWhlSJqbcw--

From owner-freebsd-fs@FreeBSD.ORG Wed Sep 19 07:30:14 2012
MIME-Version: 1.0
Reply-To: attilio@FreeBSD.org
Sender: asmrookie@gmail.com
References: <20120829060158.GA38721@x2.osted.lan> <20120831052003.GA91340@x2.osted.lan> <20120905201531.GA54452@x2.osted.lan> <20120917140055.GA9037@x2.osted.lan>
Date: Wed, 19 Sep 2012 08:30:11 +0100
From: Attilio Rao
To: Kevin Oberman
Content-Type: text/plain; charset=UTF-8
Cc: Peter Holm, bdrewery@freebsd.org, FreeBSD FS, freebsd-current@freebsd.org
Subject: Re: MPSAFE VFS -- List of upcoming actions

On Wed, Sep 19, 2012 at 4:47 AM, Kevin Oberman wrote:
> On Tue, Sep 18, 2012 at 7:48 PM, Attilio Rao wrote:
>> On Fri, Jul 13, 2012 at 12:18 AM, Attilio Rao wrote:
>>> 2012/7/4 Attilio Rao:
>>>> 2012/6/29 Attilio Rao:
>>>>> As already published several times, according to the following plan:
>>>>> http://wiki.freebsd.org/NONMPSAFE_DEORBIT_VFS
>>>>>
>>>>
>>>> I still haven't heard from Vivien or Edward; anyway, as NTFS is
>>>> basically only used RO these days (also the mount_ntfs code just
>>>> permits RO mounting) I stripped all the incomplete/bogus write support
>>>> with the following patch:
>>>> http://www.freebsd.org/~attilio/ntfs_remove_write.patch
>>>>
>>>> This is an attempt to make the code smaller and possibly just focus on
>>>> the locking that really matters (as a read-only filesystem).
>>>> On some points of the patch I'm a bit less sure, as we could easily
>>>> also take write into account for things like vaccess() arguments, and
>>>> make it easier to re-add correct write support at some point in the
>>>> future, but still force RO, even if the approach used in the patch is
>>>> more correct IMHO.
>>>> As an added bonus this patch cleans some dirty code in the mount
>>>> operation and fixes a bug, as vfs_mountedfrom() is called before real
>>>> mounting is completed and can still fail.
>>>
>>> A quick update on this.
>>> It looks like NTFS won't be completed for this GSoC, thus I seriously
>>> need to find an alternative to not lose the NTFS support entirely.
>>>
>>> I tried to look into the NTFS implementation right now and it is
>>> really poor support. As Peter has also verified, it can deadlock in
>>> no time, it completely violates VFS rules, etc. IMHO it deserves a
>>> complete rewrite if we would still support in-kernel NTFS. I also
>>> tried to look at the NetBSD implementation. Their code is somewhat
>>> similar to ours, but they used very complicated (and very dirty) code
>>> to do the locking. Even if I don't know the NetBSD VFS well enough, I
>>> have the impression not all the races are correctly handled.
>>> Definitely not something I would like to port.
>>>
>>> Considering all that, the only viable option would be a
>>> userland filesystem implementation. My preferred choice would be to
>>> import PUFFS and librefuse on top of it, but honestly it requires a lot
>>> of time to be completed, time which I don't currently have as in 2
>>> months Giant must be gone from the VFS.
>>>
>>> I then decided to switch to gnn's revamp of FUSE patches. You can find
>>> his initial e-mail here:
>>> http://lists.freebsd.org/pipermail/freebsd-fs/2012-March/013876.html
>>>
>>> I've precisely got the second version of George's patch and created
>>> this dolphin branch:
>>> svn://svn.freebsd.org/base/projects/fuse
>>>
>>> I'm fixing low hanging fruit for the moment (see r238411 for example)
>>> and I still have to make a thorough review.
>>> However my idea is to commit the support once:
>>> - ntfs-3g is well stress-tested and proves to be bug-free
>>> - there is no major/big technical issue pending after the reviews
>>
>> In the last weeks Peter, Florian, Gustau and I have been working on
>> stabilizing fuse support. Specifically, Peter has worked hard on
>> producing several utilities to stress-test fuse and in particular
>> ntfs, Florian has improved fuse related ports (as explained later) and
>> Gustau has done sparse testing. I feel moderately satisfied by the
>> level of stability of fuse now to propose it for wider usage, in
>> particular given the huge amount of complaints I'm hearing around
>> from occasional fuse users.
>>
>> The final target of the project is to completely import into base the
>> content of fusefs-kmod starting from earlier posted patches by George.
>> [...]
>
> I tried the new fusefs-libs and fusefs-ntfs ports from Florian and
> Bryan, but ran into trouble as I could no longer build the kmod after
> installing the updated fusefs-libs. It had an unresolved symbol:
> [...]
> fuse_vnops.c: In function 'create_filehandle':
> fuse_vnops.c:1586: error: 'struct fuse_open_in' has no member named 'mode'
> *** [fuse_vnops.o] Error code 1
>
> This was on amd64 9-Stable r239879. Until/unless this issue is
> resolved, please keep the existing port available and/or mark the new
> one to not install on pre-10 systems.

If you follow the instructions I described in this e-mail, the
fusefs-kmod kernel part won't be built anymore, so you won't run into
this. If it is still being built, please let me know, because that
would mean there is a bug in the 2 patches I posted for the
fusefs-kmod port.

Attilio

-- 
Peace can only be achieved by understanding - A. Einstein

From owner-freebsd-fs@FreeBSD.ORG Wed Sep 19 18:35:23 2012
Date: Wed, 19 Sep 2012 11:35:15 -0700 (PDT)
From: Dennis Glatting
To: fs@freebsd.org
User-Agent: Alpine 2.00 (BSF 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII
Subject: How to recover from theis ZFS error?

One of my pools (disk-1) with 12T of data is reporting this error after
a scrub. Is there a way to fix this error without backing up and
restoring 12T of data?

errors: Permanent errors have been detected in the following files:

        :<0x0>
        disk-1:<0x0>

From owner-freebsd-fs@FreeBSD.ORG Wed Sep 19 19:03:55 2012
MIME-Version: 1.0
References: <20120829060158.GA38721@x2.osted.lan> <20120831052003.GA91340@x2.osted.lan> <20120905201531.GA54452@x2.osted.lan> <20120917140055.GA9037@x2.osted.lan>
Date: Wed, 19 Sep 2012 12:03:52 -0700
From: Kevin Oberman
To: attilio@freebsd.org
Content-Type: text/plain; charset=UTF-8
Cc: Peter Holm, bdrewery@freebsd.org, FreeBSD FS, freebsd-current@freebsd.org
Subject: Re: MPSAFE VFS -- List of upcoming actions
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Sep 2012 19:03:55 -0000 On Wed, Sep 19, 2012 at 12:30 AM, Attilio Rao wrote: > On Wed, Sep 19, 2012 at 4:47 AM, Kevin Oberman wrote: >> On Tue, Sep 18, 2012 at 7:48 PM, Attilio Rao wrote: >>> On Fri, Jul 13, 2012 at 12:18 AM, Attilio Rao wrote: >>>> 2012/7/4 Attilio Rao : >>>>> 2012/6/29 Attilio Rao : >>>>>> As already published several times, according to the following plan: >>>>>> http://wiki.freebsd.org/NONMPSAFE_DEORBIT_VFS >>>>>> >>>>> >>>>> I still haven't heard from Vivien or Edward, anyway as NTFS is >>>>> basically only used RO these days (also the mount_ntfs code just >>>>> permits RO mounting) I stripped all the uncomplete/bogus write support >>>>> with the following patch: >>>>> http://www.freebsd.org/~attilio/ntfs_remove_write.patch >>>>> >>>>> This is an attempt to make the code smaller and possibly just focus on >>>>> the locking that really matter (as read-only filesystem). >>>>> On some points of the patch I'm a bit less sure as we could easily >>>>> take into account also write for things like vaccess() arguments, and >>>>> make easier to re-add correct write support at some point in the >>>>> future, but still force RO, even if the approach used in the patch is >>>>> more correct IMHO. >>>>> As an added bonus this patch cleans some dirty code in the mount >>>>> operation and fixes a bug as vfs_mountedfrom() is called before real >>>>> mounting is completed and can still fail. >>>> >>>> A quick update on this. >>>> It looks like NTFS won't be completed for this GSoC thus I seriously >>>> need to find an alternative to not loose the NTFS support entirely. >>>> >>>> I tried to look into the NTFS implementation right now and it is >>>> really a poor support. As Peter has also verified, it can deadlock in >>>> no-time, it compeltely violates VFS rules, etc. IMHO it deserves a >>>> complete rewrite if we would still support in-kernel NTFS. 
I also >>>> tried to look at the NetBSD implementation. Their code is someway >>>> similar to our, but they used very complicated (and very dirty) code >>>> to do the locking. Even if I don't know well enough NetBSD VFS, I have >>>> the impression not all the races are correctly handled. Definitively, >>>> not something I would like to port. >>>> >>>> Considering all that the only viable option would be meaning an >>>> userland filesystem implementation. My preferred choice would be to >>>> import PUFFS and librefuse on top of it but honestly it requires a lot >>>> of time to be completed, time which I don't currently have as in 2 >>>> months Giant must be gone by the VFS. >>>> >>>> I then decided to switch to gnn's rewamp of FUSE patches. You can find >>>> his initial e-mail here: >>>> http://lists.freebsd.org/pipermail/freebsd-fs/2012-March/013876.html >>>> >>>> I've precisely got the second version of George's patch and created >>>> this dolphin branch: >>>> svn://svn.freebsd.org/base/projects/fuse >>>> >>>> I'm fixing low hanging fruit for the moment (see r238411 for example) >>>> and I still have to make a throughful review. >>>> However my idea is to commit the support once: >>>> - ntfs-3g is well stress-tested and proves to be bug-free >>>> - there is no major/big technical issue pending after the reviews >>> >>> In the last weeks Peter, Florian, Gustau and I have been working in >>> stabilizing fuse support. In the specific, Peter has worked hard on >>> producing several utilities to nit stress-test fuse and in particular >>> ntfs, Florian has improved fuse related ports (as explained later) and >>> Gustau has done sparse testing. I feel moderately satisfied by the >>> level of stability of fuse now to propose to wider usage, in >>> particular given the huge amount of complaints I'm hearing around >>> about occasional fuse users. 
>>> >>> The final target of the project is to completely import into base the >>> content of fusefs-kmod starting from earlier posted patches by George. >>> So far, we took care only of importing in the fuse branch the kernel >>> part, so that fusefs-kmod userland part is still needed to be >>> installed from ports, but I was studying the mount_fusefs licensing >>> before to process with the import for the userland bits of it. >>> >>> The fixing has been happening here: >>> svn://svn.freebsd.org/base/projects/fuse/ >>> >>> which is essentially an HEAD branch + fuse kernel components. In order >>> to get fuse, please compile a kernel from this branch with FUSE option >>> or simply build and load fuse module. >>> Alternatively, a kernel patch that should work with HEAD@240684 is here: >>> http://www.freebsd.org/~attilio/fuse_import/fuse_240684.patch >>> >>> I guess the patch can easilly apply to all FreeBSD branches, really, >>> but it is not tested to anything else different then -CURRENT. >>> >>> As said you still need currently to build fusefs-kmod port. However >>> you need these further patches, to be put in the fusefs-kmod/files/ >>> directory:: >>> http://www.freebsd.org/~attilio/fuse_import/patch-Makefile >>> http://www.freebsd.org/~attilio/fuse_import/patch-mount_fusefs__mount_fusefs2.c >>> >>> They both disable the old kernel building/linking and import new >>> functionality to let the new kernel support work well in presence of >>> many consumers. >>> >>> In addition to fusefs-kmod, Bryan and Florian have also updated >>> fusefs-lib and fusefs-ntfs ports. For instance, please refer to this >>> e-mail: >>> http://lists.freebsd.org/pipermail/freebsd-ports/2012-August/077950.html >>> >>> Even if this work is someway independent by the fusefs-kmod import, I >>> warmly suggest to all of you to use their patches (and this what we >>> have been testing so far too). >>> >>> At this point what I'm looking for are reviews and further testing. 
>>> I would like to spend some words on what you should expect from this work: >>> *Fuse is far from being perfect*. >>> I cannot stress this enough. Peter stress-tests could break also Fuse >>> on Linux generally and by Fuse authors admissions the modules can >>> never guarantee to be completely starvation-free. However, they tend >>> to be designed in a way that sleeps can be at least interrupted >>> easily, making at least easy to recover from deadlocks. This is mostly >>> retained also in FreeBSD, for what I can tell. Also, sometimes fuse >>> seems to leave a small amount of hidden files, when it find references >>> on files it wants to delete. This happens also under Linux and it is >>> part of FUSE design, not much we can do. >>> However, if deadlocks can be someway tollerated, things you should >>> really pay attention are dumps of fuse modules (like ntfs-3g binary) >>> and kernel panics. They must not happen and if they do they need to be >>> fixed promptly. >>> However, the good new is that ntfs seems doing exceptionally good. >>> Florian could use ntfs as a backend for postgresql test. I think this >>> is by far a big improvement if compared to current in-kernel ntfs >>> which is completely torned. >>> >>> So far we have almost entirely tested only ntfs-3g. I know Gustau also >>> used other modules like sshfs and George used GlusterFS with his older >>> patches, but I encourage you to test as many modules as you want, as >>> they may expose different bugs. Of course, I don't plan to spend much >>> more time on FUSE, but I can occasionally look at bugs as they fall in >>> the filesystems category and I'm always interested in keeping a good >>> open eye on such issues. >>> >>> A few operational informations: >>> - In the next days I will import the userland bits of fusefs-kmod to >>> the fuse project branch making the port obsolete. When this happens I >>> will make this clear to the user of this thread. 
>>> - If no major bug is remained by the early October, I will commit this >>> to -CURRENT >>> - I expect Bryan and Florian to commit libfuse and ntfs updates soon. >>> They can do independently from the fusefs-kmod retiral, but I would >>> prefer their patches to go on first. >>> - After that I will handover fusefs maintainership to gnn as agreed in >>> precedence but I will be around helping with analysis and fixing, >>> depending on time availability >>> >>> In the end I have really 2 minor questions: >>> - One is about importing the mount_fusefs userland bits. I don't think >>> we need a vendor import at all because they were developed by a >>> FreeBSD GSoC student and kept in his git repo (or someone else's). >>> Anyway, i'd just commit as new files once I do a good sweep. I hope >>> nobody objects to that. >>> - Another one is: fusefs-kmod right now is only amd64/i386 specific. I >>> have no idea why as it has not any MD specific code. However I'm sure >>> it has not been tested on other arches so far. Anyway I left it usable >>> by all the arches. I think this is the correct choice. If someone >>> objects with valid argument I can bring it back to be usable only on >>> i386 and amd64. >>> >>> That's all, for any question please don't hesitate to contact me and >>> the other people involved in this work. >> >> Attilio (and the crew), >> >> Thanks for working on fusefs-ntfs. It's been increasingly worrying to >> me that we might lose it and I really depend on it. I really hope to >> be able to use rsync to update files without killing my system some >> day. >> >> I tried the new fusefs-libs and fusefs-ntfs ports from Florian and >> Bryan, but ran into trouble as I could no longer build the kmod after >> installing the updated fusefs-libs. It had an unresolved symbol: >> cc -O2 -pipe -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE >> -nostdinc -I../include -I. 
-I@ -I@/contrib/altq -finline-limit=8000 >> --param inline-unit-growth=100 --param large-function-growth=1000 >> -fno-common -fno-omit-frame-pointer -mcmodel=kernel -mno-red-zone >> -mno-mmx -mno-sse -msoft-float -fno-asynchronous-unwind-tables >> -ffreestanding -fstack-protector -std=iso9899:1999 -fstack-protector >> -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes >> -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef >> -Wno-pointer-sign -fformat-extensions -Wmissing-include-dirs >> -fdiagnostics-show-option -c fuse_vnops.c >> fuse_vnops.c: In function 'create_filehandle': >> fuse_vnops.c:1586: error: 'struct fuse_open_in' has no member named 'mode' >> *** [fuse_vnops.o] Error code 1 >> >> This was on amd64 9-Stable r239879 until/unless this issue is >> resolved, please keep the existing port available and/or mark the new >> one to not install on pre-10 systems. > > If you follow the rule I described in this e-mail, the fusefs-kmod > kernel part won't be build anymore, so you won't run into this. > If it is build yet, please let me know because there is a bug in the 2 > patches I posted for fusefs-kmod port. Attilio, I assumed that your new kernel module was only tested/working with current, so I did not try to use it. I was only referring to the use of the updated fusefs-libs and fusefs-ntfs ports that Florian and Bryan provided. I had tested these on 9-stable and found that after installing the updated fusefs-libs, the old fusefs-kmod port would no longer compile. Today Florian sent me a one-line patch to fuse-modue/fuse-vnops.c in the current fusefs-kmod port which appears to have fixed the problem. It compiled fine and it is currently running on the system on which I am typing this. I have done a bit of light testing and it works to this point. I'll do some heavier testing later today. So it looks like there is probably no issue with Florian committing the new fusefs-libs and fusefs-ntfs ports for those of us not running current.
If I get enough time, I'll look into applying the patches to the kernel module on 9-stable and see how that does, but I have my day job and contractors working in the house, so I won't make any promises. Thanks again to you and all of the others who contributed to this. It may not be perfect, but it is a huge win over the kernel NTFS code, especially for those of us who need to actually write a file now and then. -- R. Kevin Oberman, Network Engineer E-mail: kob6558@gmail.com From owner-freebsd-fs@FreeBSD.ORG Wed Sep 19 19:09:00 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A0EFF1065670; Wed, 19 Sep 2012 19:09:00 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-lb0-f182.google.com (mail-lb0-f182.google.com [209.85.217.182]) by mx1.freebsd.org (Postfix) with ESMTP id E95698FC12; Wed, 19 Sep 2012 19:08:58 +0000 (UTC) Received: by lbbgg13 with SMTP id gg13so1714225lbb.13 for ; Wed, 19 Sep 2012 12:08:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=12+s5RgA3unmVGZDyI1LX38jPqNdroOc1I3M/BgBDD8=; b=eJad25XU58ZMWnfiHy2MHybC9htlZLKB3HeELkkHPMOFuoDTIyFtDAG/gEqe+n3owi vPEQJ6dQnXse0SboLvHA1fKe/TiA2KO1TqETHsL0oX63prN4pXNR/QQbGBJ/B9jrbsZL W80Kq62jt1alrKkOCTDQqxl8ka0u8cZ54/GbD4JYCyMsxGA0IZRJmSFMIRQ0RhSMq38u RwXAsiKlyeMs46p7lpO9lGroImFX26qt20ljFoueKEBiWdrQFlqI0+gjMNrMxBLpBhNU FjZAR+BWsglxq7oe9HdvM+BTL4qw4ACqMyVjQ9JiXMDBoQD6fSVc7X0T6Ci3NqFy6hdf BjYg== MIME-Version: 1.0 Received: by 10.152.131.68 with SMTP id ok4mr3414683lab.47.1348081736990; Wed, 19 Sep 2012 12:08:56 -0700 (PDT) Sender: asmrookie@gmail.com Received: by 10.112.102.39 with HTTP; Wed, 19 Sep 2012 12:08:56 -0700 (PDT) In-Reply-To: References: <20120829060158.GA38721@x2.osted.lan>
<20120831052003.GA91340@x2.osted.lan> <20120905201531.GA54452@x2.osted.lan> <20120917140055.GA9037@x2.osted.lan> Date: Wed, 19 Sep 2012 20:08:56 +0100 X-Google-Sender-Auth: rSaECWf7wPkM6KW-iZebcef82T8 Message-ID: From: Attilio Rao To: Kevin Oberman Content-Type: text/plain; charset=UTF-8 Cc: Peter Holm , bdrewery@freebsd.org, FreeBSD FS , freebsd-current@freebsd.org Subject: Re: MPSAFE VFS -- List of upcoming actions X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: attilio@FreeBSD.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Sep 2012 19:09:00 -0000 On 9/19/12, Kevin Oberman wrote: > On Wed, Sep 19, 2012 at 12:30 AM, Attilio Rao wrote: >> On Wed, Sep 19, 2012 at 4:47 AM, Kevin Oberman wrote: >>> On Tue, Sep 18, 2012 at 7:48 PM, Attilio Rao >>> wrote: >>>> On Fri, Jul 13, 2012 at 12:18 AM, Attilio Rao >>>> wrote: >>>>> 2012/7/4 Attilio Rao : >>>>>> 2012/6/29 Attilio Rao : >>>>>>> As already published several times, according to the following plan: >>>>>>> http://wiki.freebsd.org/NONMPSAFE_DEORBIT_VFS >>>>>>> >>>>>> >>>>>> I still haven't heard from Vivien or Edward, anyway as NTFS is >>>>>> basically only used RO these days (also the mount_ntfs code just >>>>>> permits RO mounting) I stripped all the uncomplete/bogus write >>>>>> support >>>>>> with the following patch: >>>>>> http://www.freebsd.org/~attilio/ntfs_remove_write.patch >>>>>> >>>>>> This is an attempt to make the code smaller and possibly just focus >>>>>> on >>>>>> the locking that really matter (as read-only filesystem). >>>>>> On some points of the patch I'm a bit less sure as we could easily >>>>>> take into account also write for things like vaccess() arguments, and >>>>>> make easier to re-add correct write support at some point in the >>>>>> future, but still force RO, even if the approach used in the patch is >>>>>> more correct IMHO. 
>>>>>> As an added bonus this patch cleans some dirty code in the mount >>>>>> operation and fixes a bug as vfs_mountedfrom() is called before real >>>>>> mounting is completed and can still fail. >>>>> >>>>> A quick update on this. >>>>> It looks like NTFS won't be completed for this GSoC thus I seriously >>>>> need to find an alternative to not loose the NTFS support entirely. >>>>> >>>>> I tried to look into the NTFS implementation right now and it is >>>>> really a poor support. As Peter has also verified, it can deadlock in >>>>> no-time, it compeltely violates VFS rules, etc. IMHO it deserves a >>>>> complete rewrite if we would still support in-kernel NTFS. I also >>>>> tried to look at the NetBSD implementation. Their code is someway >>>>> similar to our, but they used very complicated (and very dirty) code >>>>> to do the locking. Even if I don't know well enough NetBSD VFS, I have >>>>> the impression not all the races are correctly handled. Definitively, >>>>> not something I would like to port. >>>>> >>>>> Considering all that the only viable option would be meaning an >>>>> userland filesystem implementation. My preferred choice would be to >>>>> import PUFFS and librefuse on top of it but honestly it requires a lot >>>>> of time to be completed, time which I don't currently have as in 2 >>>>> months Giant must be gone by the VFS. >>>>> >>>>> I then decided to switch to gnn's rewamp of FUSE patches. You can find >>>>> his initial e-mail here: >>>>> http://lists.freebsd.org/pipermail/freebsd-fs/2012-March/013876.html >>>>> >>>>> I've precisely got the second version of George's patch and created >>>>> this dolphin branch: >>>>> svn://svn.freebsd.org/base/projects/fuse >>>>> >>>>> I'm fixing low hanging fruit for the moment (see r238411 for example) >>>>> and I still have to make a throughful review. 
>>>>> However my idea is to commit the support once: >>>>> - ntfs-3g is well stress-tested and proves to be bug-free >>>>> - there is no major/big technical issue pending after the reviews >>>> >>>> In the last weeks Peter, Florian, Gustau and I have been working in >>>> stabilizing fuse support. In the specific, Peter has worked hard on >>>> producing several utilities to nit stress-test fuse and in particular >>>> ntfs, Florian has improved fuse related ports (as explained later) and >>>> Gustau has done sparse testing. I feel moderately satisfied by the >>>> level of stability of fuse now to propose to wider usage, in >>>> particular given the huge amount of complaints I'm hearing around >>>> about occasional fuse users. >>>> >>>> The final target of the project is to completely import into base the >>>> content of fusefs-kmod starting from earlier posted patches by George. >>>> So far, we took care only of importing in the fuse branch the kernel >>>> part, so that fusefs-kmod userland part is still needed to be >>>> installed from ports, but I was studying the mount_fusefs licensing >>>> before to process with the import for the userland bits of it. >>>> >>>> The fixing has been happening here: >>>> svn://svn.freebsd.org/base/projects/fuse/ >>>> >>>> which is essentially an HEAD branch + fuse kernel components. In order >>>> to get fuse, please compile a kernel from this branch with FUSE option >>>> or simply build and load fuse module. >>>> Alternatively, a kernel patch that should work with HEAD@240684 is >>>> here: >>>> http://www.freebsd.org/~attilio/fuse_import/fuse_240684.patch >>>> >>>> I guess the patch can easilly apply to all FreeBSD branches, really, >>>> but it is not tested to anything else different then -CURRENT. >>>> >>>> As said you still need currently to build fusefs-kmod port. 
However >>>> you need these further patches, to be put in the fusefs-kmod/files/ >>>> directory:: >>>> http://www.freebsd.org/~attilio/fuse_import/patch-Makefile >>>> http://www.freebsd.org/~attilio/fuse_import/patch-mount_fusefs__mount_fusefs2.c >>>> >>>> They both disable the old kernel building/linking and import new >>>> functionality to let the new kernel support work well in presence of >>>> many consumers. >>>> >>>> In addition to fusefs-kmod, Bryan and Florian have also updated >>>> fusefs-lib and fusefs-ntfs ports. For instance, please refer to this >>>> e-mail: >>>> http://lists.freebsd.org/pipermail/freebsd-ports/2012-August/077950.html >>>> >>>> Even if this work is someway independent by the fusefs-kmod import, I >>>> warmly suggest to all of you to use their patches (and this what we >>>> have been testing so far too). >>>> >>>> At this point what I'm looking for are reviews and further testing. >>>> I would like to spend some words on what you should expect from this >>>> work: >>>> *Fuse is far from being perfect*. >>>> I cannot stress this enough. Peter stress-tests could break also Fuse >>>> on Linux generally and by Fuse authors admissions the modules can >>>> never guarantee to be completely starvation-free. However, they tend >>>> to be designed in a way that sleeps can be at least interrupted >>>> easily, making at least easy to recover from deadlocks. This is mostly >>>> retained also in FreeBSD, for what I can tell. Also, sometimes fuse >>>> seems to leave a small amount of hidden files, when it find references >>>> on files it wants to delete. This happens also under Linux and it is >>>> part of FUSE design, not much we can do. >>>> However, if deadlocks can be someway tollerated, things you should >>>> really pay attention are dumps of fuse modules (like ntfs-3g binary) >>>> and kernel panics. They must not happen and if they do they need to be >>>> fixed promptly. >>>> However, the good new is that ntfs seems doing exceptionally good. 
>>>> Florian could use ntfs as a backend for postgresql test. I think this >>>> is by far a big improvement if compared to current in-kernel ntfs >>>> which is completely torned. >>>> >>>> So far we have almost entirely tested only ntfs-3g. I know Gustau also >>>> used other modules like sshfs and George used GlusterFS with his older >>>> patches, but I encourage you to test as many modules as you want, as >>>> they may expose different bugs. Of course, I don't plan to spend much >>>> more time on FUSE, but I can occasionally look at bugs as they fall in >>>> the filesystems category and I'm always interested in keeping a good >>>> open eye on such issues. >>>> >>>> A few operational informations: >>>> - In the next days I will import the userland bits of fusefs-kmod to >>>> the fuse project branch making the port obsolete. When this happens I >>>> will make this clear to the user of this thread. >>>> - If no major bug is remained by the early October, I will commit this >>>> to -CURRENT >>>> - I expect Bryan and Florian to commit libfuse and ntfs updates soon. >>>> They can do independently from the fusefs-kmod retiral, but I would >>>> prefer their patches to go on first. >>>> - After that I will handover fusefs maintainership to gnn as agreed in >>>> precedence but I will be around helping with analysis and fixing, >>>> depending on time availability >>>> >>>> In the end I have really 2 minor questions: >>>> - One is about importing the mount_fusefs userland bits. I don't think >>>> we need a vendor import at all because they were developed by a >>>> FreeBSD GSoC student and kept in his git repo (or someone else's). >>>> Anyway, i'd just commit as new files once I do a good sweep. I hope >>>> nobody objects to that. >>>> - Another one is: fusefs-kmod right now is only amd64/i386 specific. I >>>> have no idea why as it has not any MD specific code. However I'm sure >>>> it has not been tested on other arches so far. Anyway I left it usable >>>> by all the arches. 
I think this is the correct choice. If someone >>>> objects with valid argument I can bring it back to be usable only on >>>> i386 and amd64. >>>> >>>> That's all, for any question please don't hesitate to contact me and >>>> the other people involved in this work. >>> >>> Attilio (and the crew), >>> >>> Thanks for working on fusefs-ntfs. It's been increasingly worrying to >>> me that we might lose it and I really depend on it. I really hope to >>> be able to use rsync to update files without killing my system some >>> day. >>> >>> I tried the new fusefs-libs and fusefs-ntfs ports from Florian and >>> Bryan, but ran into trouble as I could no longer build the kmod after >>> installing the updated fusefs-libs. It had an unresolved symbol: >>> cc -O2 -pipe -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE >>> -nostdinc -I../include -I. -I@ -I@/contrib/altq -finline-limit=8000 >>> --param inline-unit-growth=100 --param large-function-growth=1000 >>> -fno-common -fno-omit-frame-pointer -mcmodel=kernel -mno-red-zone >>> -mno-mmx -mno-sse -msoft-float -fno-asynchronous-unwind-tables >>> -ffreestanding -fstack-protector -std=iso9899:1999 -fstack-protector >>> -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes >>> -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef >>> -Wno-pointer-sign -fformat-extensions -Wmissing-include-dirs >>> -fdiagnostics-show-option -c fuse_vnops.c >>> fuse_vnops.c: In function 'create_filehandle': >>> fuse_vnops.c:1586: error: 'struct fuse_open_in' has no member named >>> 'mode' >>> *** [fuse_vnops.o] Error code 1 >>> >>> This was on amd64 9-Stable r239879 until/unless this issue is >>> resolved, please keep the existing port available and/or mark the new >>> one to not install on pre-10 systems. >> >> If you follow the rule I described in this e-mail, the fusefs-kmod >> kernel part won't be build anymore, so you won't run into this. 
>> If it is build yet, please let me know because there is a bug in the 2 >> patches I posted for fusefs-kmod port. > > Attilio, > > I assumed that your new kernel module was only tested/working with > current, so I did not try to use it. I was only referring to the use > of the updated fusefs-libs and fusefs-ntfs ports that Florian and Bryan > provided. I had tested these on 9-stable and found that after > installing the updated fusefs-libs, the old fusefs-kmod port would no > longer compile. > > Today Florian sent me a one-line patch to fuse-modue/fuse-vnops.c in > the current fusefs-kmod port which appears to have fixed the problem. > It compiled fine and it is currently running on the system on which I > am typing this. I have done a bit of light testing and it works to > this point. I'll do some heavier testing later today. So it looks like > there is probably no issue with Florian committing the new > fusefs-libs and fusefs-ntfs ports for those of us not running current. Thanks for letting us know. I think that Bryan and Florian should really update the ports as soon as possible. Also, I hope that someone will sync the fusefs-kmod port (in particular the kernel part) with the kernel code that our branch brings along. I think Florian volunteered for this, so there should not be a problem with that. Attilio -- Peace can only be achieved by understanding - A.
Einstein From owner-freebsd-fs@FreeBSD.ORG Wed Sep 19 22:26:16 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id BD778106579D for ; Wed, 19 Sep 2012 22:26:16 +0000 (UTC) (envelope-from bdrewery@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 941AB8FC19 for ; Wed, 19 Sep 2012 22:26:16 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.5/8.14.5) with ESMTP id q8JMQGEc091054 for ; Wed, 19 Sep 2012 22:26:16 GMT (envelope-from bdrewery@freefall.freebsd.org) Received: (from bdrewery@localhost) by freefall.freebsd.org (8.14.5/8.14.5/Submit) id q8JMQGff091049 for freebsd-fs@freebsd.org; Wed, 19 Sep 2012 22:26:16 GMT (envelope-from bdrewery) Received: (qmail 68708 invoked from network); 19 Sep 2012 17:26:10 -0500 Received: from unknown (HELO ?192.168.0.74?) 
(freebsd@shatow.net@74.94.87.209) by sweb.xzibition.com with ESMTPA; 19 Sep 2012 17:26:10 -0500 Message-ID: <505A468E.2080902@FreeBSD.org> Date: Wed, 19 Sep 2012 17:26:22 -0500 From: Bryan Drewery Organization: FreeBSD User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20120824 Thunderbird/15.0 MIME-Version: 1.0 To: attilio@freebsd.org References: <20120829060158.GA38721@x2.osted.lan> <20120831052003.GA91340@x2.osted.lan> <20120905201531.GA54452@x2.osted.lan> <20120917140055.GA9037@x2.osted.lan> In-Reply-To: X-Enigmail-Version: 1.4.4 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: Peter Holm , FreeBSD FS , freebsd-current@freebsd.org Subject: Re: MPSAFE VFS -- List of upcoming actions X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 19 Sep 2012 22:26:16 -0000 On 9/18/2012 9:48 PM, Attilio Rao wrote: > In addition to fusefs-kmod, Bryan and Florian have also updated > fusefs-lib and fusefs-ntfs ports. For instance, please refer to this > e-mail: > http://lists.freebsd.org/pipermail/freebsd-ports/2012-August/077950.html > > Even if this work is someway independent by the fusefs-kmod import, I > warmly suggest to all of you to use their patches (and this what we > have been testing so far too). I have committed my updates to sysutils/fusefs-ntfs now. 
-- Regards, Bryan Drewery bdrewery@freenode/EFNet From owner-freebsd-fs@FreeBSD.ORG Thu Sep 20 05:11:41 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 9DA07106564A; Thu, 20 Sep 2012 05:11:41 +0000 (UTC) (envelope-from kamikaze@bsdforen.de) Received: from mail.server1.bsdforen.de (bsdforen.de [82.193.243.81]) by mx1.freebsd.org (Postfix) with ESMTP id 3AF288FC08; Thu, 20 Sep 2012 05:11:40 +0000 (UTC) Received: from mobileKamikaze.norad (HSI-KBW-134-3-231-194.hsi14.kabel-badenwuerttemberg.de [134.3.231.194]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mail.server1.bsdforen.de (Postfix) with ESMTPSA id 32711861A0; Thu, 20 Sep 2012 07:11:32 +0200 (CEST) Message-ID: <505AA583.7090401@bsdforen.de> Date: Thu, 20 Sep 2012 07:11:31 +0200 From: Dominic Fandrey User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:15.0) Gecko/20120912 Thunderbird/15.0.1 MIME-Version: 1.0 To: attilio@FreeBSD.org References: <20120829060158.GA38721@x2.osted.lan> <20120831052003.GA91340@x2.osted.lan> <20120905201531.GA54452@x2.osted.lan> <20120917140055.GA9037@x2.osted.lan> In-Reply-To: Content-Type: text/plain; charset=ascii; format=flowed Content-Transfer-Encoding: 7bit Cc: Peter Holm , bdrewery@freebsd.org, FreeBSD FS , freebsd-current@freebsd.org Subject: Re: MPSAFE VFS -- List of upcoming actions X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Sep 2012 05:11:41 -0000 On 19/09/2012 04:48, Attilio Rao wrote: > On Fri, Jul 13, 2012 at 12:18 AM, Attilio Rao wrote: > ... 
> Alternatively, a kernel patch that should work with HEAD@240684 is here: > http://www.freebsd.org/~attilio/fuse_import/fuse_240684.patch > > I guess the patch can easilly apply to all FreeBSD branches, really, > but it is not tested to anything else different then -CURRENT. RELENG_9, fetched yesterday: ===> fuse (all) env CCACHE_PREFIX=/usr/local/bin/distcc /usr/local/bin/ccache cc -O2 -pipe -march=core2 -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE -nostdinc -DHAVE_KERNEL_OPTION_HEADERS -include /usr/obj/HP6510b-9/amd64/usr/src/sys/HP6510b-9/opt_global.h -I. -I@ -I@/contrib/altq -finline-limit=8000 --param inline-unit-growth=100 --param large-function-growth=1000 -fno-common -g -fno-omit-frame-pointer -I/usr/obj/HP6510b-9/amd64/usr/src/sys/HP6510b-9 -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float -fno-asynchronous-unwind-tables -ffreestanding -fstack-protector -std=iso9899:1999 -fstack-protector -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef -Wno-pointer-sign -fformat-extensions -Wmissing-include-dirs -fdiagnostics-show-option -c /usr/src/sys/modules/fuse/../../fs/fuse/fuse_device.c env CCACHE_PREFIX=/usr/local/bin/distcc /usr/local/bin/ccache cc -O2 -pipe -march=core2 -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE -nostdinc -DHAVE_KERNEL_OPTION_HEADERS -include /usr/obj/HP6510b-9/amd64/usr/src/sys/HP6510b-9/opt_global.h -I. 
-I@ -I@/contrib/altq -finline-limit=8000 --param inline-unit-growth=100 --param large-function-growth=1000 -fno-common -g -fno-omit-frame-pointer -I/usr/obj/HP6510b-9/amd64/usr/src/sys/HP6510b-9 -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float -fno-asynchronous-unwind-tables -ffreestanding -fstack-protector -std=iso9899:1999 -fstack-protector -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef -Wno-pointer-sign -fformat-extensions -Wmissing-include-dirs -fdiagnostics-show-option -c /usr/src/sys/modules/fuse/../../fs/fuse/fuse_node.c distcc[20814] ERROR: compile /root/.ccache/tmp/fuse_node.tmp.mobileKamikaze.norad.20806.i on localhost failed cc1: warnings being treated as errors /usr/src/sys/modules/fuse/../../fs/fuse/fuse_node.c: In function 'fuse_vnode_setsize': /usr/src/sys/modules/fuse/../../fs/fuse/fuse_node.c:378: warning: passing argument 3 of 'vtruncbuf' makes pointer from integer without a cast /usr/src/sys/modules/fuse/../../fs/fuse/fuse_node.c:378: error: too few arguments to function 'vtruncbuf' *** [fuse_node.o] Error code 1 1 error *** [all] Error code 2 1 error *** [modules-all] Error code 2 1 error *** [buildkernel] Error code 2 1 error *** [buildkernel] Error code 2 Stop in /usr/src. -- A: Because it fouls the order in which people normally read text. Q: Why is top-posting such a bad thing? A: Top-posting. Q: What is the most annoying thing on usenet and in e-mail? 
From owner-freebsd-fs@FreeBSD.ORG Thu Sep 20 12:59:13 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 28AC7106566B; Thu, 20 Sep 2012 12:59:13 +0000 (UTC) (envelope-from simon@comsys.ntu-kpi.kiev.ua) Received: from comsys.kpi.ua (comsys.kpi.ua [77.47.192.42]) by mx1.freebsd.org (Postfix) with ESMTP id 737B88FC12; Thu, 20 Sep 2012 12:59:12 +0000 (UTC) Received: from pm513-1.comsys.kpi.ua ([10.18.52.101] helo=pm513-1.comsys.ntu-kpi.kiev.ua) by comsys.kpi.ua with esmtpsa (TLSv1:AES256-SHA:256) (Exim 4.63) (envelope-from ) id 1TEgLW-0005iT-6v; Thu, 20 Sep 2012 15:59:10 +0300 Received: by pm513-1.comsys.ntu-kpi.kiev.ua (Postfix, from userid 1001) id 8EC2B1CC23; Thu, 20 Sep 2012 15:59:09 +0300 (EEST) Date: Thu, 20 Sep 2012 15:59:09 +0300 From: Andrey Simonenko To: Rick Macklem Message-ID: <20120920125909.GA9013@pm513-1.comsys.ntu-kpi.kiev.ua> References: <2050472507.821722.1348009974939.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <2050472507.821722.1348009974939.JavaMail.root@erie.cs.uoguelph.ca> User-Agent: Mutt/1.5.21 (2010-09-15) X-Authenticated-User: simon@comsys.ntu-kpi.kiev.ua X-Authenticator: plain X-Sender-Verify: SUCCEEDED (sender exists & accepts mail) X-Exim-Version: 4.63 (build at 28-Apr-2011 07:11:12) X-Date: 2012-09-20 15:59:10 X-Connected-IP: 10.18.52.101:19601 X-Message-Linecount: 192 X-Body-Linecount: 175 X-Message-Size: 6779 X-Body-Size: 5939 Cc: FS List , Will Andrews , "Justin T. Gibbs" Subject: Re: testing/review of atomic export update patch X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Sep 2012 12:59:13 -0000 On Tue, Sep 18, 2012 at 07:12:54PM -0400, Rick Macklem wrote: > Justin T. 
Gibbs wrote: > > On Sep 16, 2012, at 3:41 PM, Rick Macklem > > wrote: > > > > > Hi, > > > > > > There is a simple patch at: > > > http://people.freebsd.org/~rmacklem/atomic-export.patch > > > that can be applied to a kernel + mountd, so that the new > > > nfsd can be suspended by mountd while the exports are being > > > reloaded. It adds a new "-S" flag to mountd to enable this. > > > (This avoids the long standing bug where clients receive ESTALE > > > replies to RPCs while mountd is reloading exports.) > > > > At Spectra, we are successfully using the NFSE patch set from > > nfse.sourceforge.net (FreeBSD PR 136865). It addresses > > the ESTALE problem in addition to cleaning up several aspects > > of exports processing. > > > > Have you reviewed the NFSE work? Do you have any issues > > or concerns with it? What is the right path for getting NFSE > > integrated into FreeBSD? > > > I, personally, have not found the time to review it. As such, > I can't state specifics, however there have been concerns w.r.t. > a switch from mountd->nfse resulting in different behaviour when > used with the same /etc/exports file used for mountd. > > Some questions that need to be answered w.r.t. nfse, which I > haven't had the time to do: > - Are the differences listed here significant enough for a > change to be considered a POLA violation? > http://nfse.sourceforge.net/COMPATIBILITY Just to be clear with what is "POLA violation", can somebody who has interest in this topic answer the following questions: Was the following change to mountd considered POLA violation? (this change ignored the semantics of the -alldirs option) http://www.freebsd.org/cgi/cvsweb.cgi/src/usr.sbin/mountd/mountd.c.diff?r1=1.83;r2=1.84 Are the following changes (corrections) to mountd considered POLA violation? 
http://www.freebsd.org/cgi/query-pr.cgi?pr=bin/170295
http://www.freebsd.org/cgi/query-pr.cgi?pr=bin/170413

> - If the server mount point is /sub1 and the only line
> referring to this server volume in /etc/exports looks like:
>
> /sub1/sub2 client.net
>
> Does the following mount command work on client.net
> # mount -t nfs -o nfsv3 server.net:/sub1 /mnt
> when nfse is run with -C using the /etc/exports file?

This command will fail with "access denied" for NFSv2/3 clients, although the NFS server will export /sub1 to NFSv2/3 clients.

> This is typically referred to as an "administrative control",
> since it is only enforced by mountd for the Mount protocol,
> but is considered an important feature by some (rwatson@
> expressed a desire/need for it).

Exactly like this.

> - Does the nfse patch handle exporting of all file systems types
> and, in particular, the `zfs share` case.

It does not depend on the file system type; the file system just has to support NFS. NFSE is integrated as a part of the NFS server code, so ZFS snapshots will not be exported automatically. Currently, if a ZFS file system is exported, then its snapshots are exported as well (not optional BTW) because VFS_CHECKEXP() is used.

"zfs share/unshare" modifies /etc/zfs/exports and, depending on the presence of the /etc/nfs.exports file, sends SIGHUP to mountd or calls "nfse -c ..." to update settings dynamically. It is assumed that if a ZFS file system is exported via "zfs share", then it is not exported in another file (details in fsshare.c:nfse_update() in src/cddl).

1. Example of how mountd after 1.84 of mountd.c understands -alldirs:

# cat /etc/exports
/cdrom -alldirs
# mountd
# showmount -e
Exports list on localhost:
/cdrom Everyone
# mount | grep /cdrom
# mount -t nfs -o nfsv3 127.0.0.1:/ /mnt
# ls /mnt
...
#

2. Example of how nfse understands /etc/exports with only one line whose pathname is not a mount point:

# cat /etc/exports
/sub1/sub2 127.0.0.1
# mount | grep /sub1
/dev/md0 on /sub1 (ufs, local)
# nfse -C
# mount -t nfs -o nfsv3 127.0.0.1:/sub1 /mnt
[tcp] 127.0.0.1:/sub1: Permission denied
# mount -t nfs -o nfsv3 127.0.0.1:/sub1/sub2 /mnt
# ls /mnt
file.txt
# ls /sub1/sub2
file.txt
# showmount -e
Exports list on localhost:
/sub1 127.0.0.1
# nfse -c show
...
Pathname /sub1 (exported)
Export specifications:
-rw -sec sys -maproot=-2:-2 -host 127.0.0.1
Subdirectories for NFSv2/3:
/sub1/sub2 -host 127.0.0.1
...
#

3. Example of how nfse understands /etc/exports with one line with -alldirs:

# cat /etc/exports
/sub1/sub2 -alldirs 127.0.0.1
# mount | grep /sub1
/dev/md0 on /sub1 (ufs, local)
# nfse -C
# mount -t nfs -o nfsv3 127.0.0.1:/sub1 /mnt
[tcp] 127.0.0.1:/sub1: Permission denied
# mount -t nfs -o nfsv3 127.0.0.1:/sub1/sub2 /mnt
[tcp] 127.0.0.1:/sub1/sub2: Permission denied
# showmount -e
Exports list on localhost:
# nfse -c show
...
Pathname /sub1/sub2
File system options: -alldirs
Export specifications:
-rw -sec sys -maproot=-2:-2 -host 127.0.0.1
Subdirectories for NFSv2/3:
/sub1/sub2 -alldirs -host 127.0.0.1
...
# mount /dev/md1 /sub1/sub2
# mount | grep /sub1
/dev/md0 on /sub1 (ufs, local)
/dev/md1 on /sub1/sub2 (ufs, local)
# mount -t nfs -o nfsv3 127.0.0.1:/sub1 /mnt
[tcp] 127.0.0.1:/sub1: Permission denied
# mount -t nfs -o nfsv3 127.0.0.1:/sub1/sub2 /mnt
# showmount -e
Exports list on localhost:
/sub1/sub2 127.0.0.1
# nfse -c show
...
Pathname /sub1/sub2 (exported)
File system options: -alldirs
Export specifications:
-rw -sec sys -maproot=-2:-2 -host 127.0.0.1
Subdirectories for NFSv2/3:
/sub1/sub2 -alldirs -host 127.0.0.1
...
#

All examples were checked on 10-CURRENT with recent NFSE. Some lines from the "nfse -c show" command were removed. There is no /var/run/mountd.pid -> /var/run/nfse.pid symlink, so mount does not send SIGHUP to nfse.
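
The outcomes of examples 2 and 3 can be condensed into a toy model of the Mount protocol check. This is only a sketch of the behaviour visible in the transcripts, not NFSE code: the function name, its parameters and the prefix rule used for -alldirs are assumptions made for illustration, and hosts, security flavors and nested file systems are ignored.

```python
def mount_allowed(requested, exported, alldirs, mount_points):
    """Toy model of the Mount protocol check seen in the transcripts.

    requested    -- pathname the NFSv2/3 client asks to mount
    exported     -- pathname from the single /etc/exports line
    alldirs      -- True if that line carries -alldirs
    mount_points -- set of local file system mount points
    """
    if alldirs:
        # Example 3: with -alldirs nothing is mountable until the
        # exported pathname is itself a mount point.
        if exported not in mount_points:
            return False
        # Assumption: any directory below that mount point is then
        # accepted (directories on nested file systems are not modelled).
        prefix = exported.rstrip("/") + "/"
        return requested == exported or requested.startswith(prefix)
    # Example 2: without -alldirs only the listed pathname is accepted,
    # even though the whole file system is exported to NFS clients.
    return requested == exported


# Example 2: "/sub1/sub2 127.0.0.1", with /sub1 as the mount point.
print(mount_allowed("/sub1", "/sub1/sub2", False, {"/sub1"}))       # False
print(mount_allowed("/sub1/sub2", "/sub1/sub2", False, {"/sub1"}))  # True

# Example 3: "/sub1/sub2 -alldirs 127.0.0.1", before and after
# /dev/md1 is mounted on /sub1/sub2.
print(mount_allowed("/sub1/sub2", "/sub1/sub2", True, {"/sub1"}))                # False
print(mount_allowed("/sub1/sub2", "/sub1/sub2", True, {"/sub1", "/sub1/sub2"}))  # True
```

The model keeps the "administrative control" separate from what the server actually exports, which is the distinction the showmount and "nfse -c show" outputs illustrate.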
From owner-freebsd-fs@FreeBSD.ORG Thu Sep 20 15:46:32 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 0E355106566B for ; Thu, 20 Sep 2012 15:46:32 +0000 (UTC) (envelope-from jusher71@yahoo.com) Received: from nm4-vm4.bullet.mail.ne1.yahoo.com (nm4-vm4.bullet.mail.ne1.yahoo.com [98.138.91.164]) by mx1.freebsd.org (Postfix) with SMTP id 950FA8FC08 for ; Thu, 20 Sep 2012 15:46:31 +0000 (UTC) Received: from [98.138.90.48] by nm4.bullet.mail.ne1.yahoo.com with NNFMP; 20 Sep 2012 15:46:25 -0000 Received: from [98.138.89.195] by tm1.bullet.mail.ne1.yahoo.com with NNFMP; 20 Sep 2012 15:46:25 -0000 Received: from [127.0.0.1] by omp1053.mail.ne1.yahoo.com with NNFMP; 20 Sep 2012 15:46:25 -0000 X-Yahoo-Newman-Property: ymail-3 X-Yahoo-Newman-Id: 119621.67281.bm@omp1053.mail.ne1.yahoo.com Received: (qmail 57976 invoked by uid 60001); 20 Sep 2012 15:46:25 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1348155984; bh=5kX3eAU4bvIPixEHIyg4smH6GjdnHIl184cuG1m+qa0=; h=X-YMail-OSG:Received:X-Mailer:Message-ID:Date:From:Subject:To:MIME-Version:Content-Type; b=uYEA5j5aGQAaU5s9ep8YUGXmzolowM6g4n9OAfU2Mxy8GxYJ4xeDGytIhkiKIwh7ihafxB2hD+WAeruwVJ/N3vE8liS6jfWT/BGFeHodFMmwu5ZPDPS2o81o07/br2wpoex/GyKgauSr49acbph5hgUpVjGWZftPzqAINTwGVDs= DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=X-YMail-OSG:Received:X-Mailer:Message-ID:Date:From:Subject:To:MIME-Version:Content-Type; b=2SisoEr0efAwa97x0ds0zzp8vtv4NXja2sLoS7ZYP5W14ReuzhFKSZBsva7HGVBm3tBR/WXj++UJPkXH8q6tCTfI/fiCfaRXTTED2Da36cD+OasENpKkj9xqySb0XLdC8tUZuoTksfKQ3X816ey3wA6njIazZ8+Wxx01+Tm74Qk=; X-YMail-OSG: 6APzz3UVM1mLWy2PESfRgMMeb7L9DhL7aDSUtjnc9_drP.7 F9gYQWIqQi3Qoqw.P678FRIdjUxg0.ckdOPGnwOvt_HJuPfCaYJ9.73Q_S40 taEjQVAcTSC.rQUdvipUvZJQoIlD5BRiogBcCgXAD7LeHp0rIYJSpEvCGa_E R.KhrM6HAe2c0Y2qsQKlcF13LBTrwK4AIGAWqYhm2tCao6pRC_GS4mHMo3h4 
TyjFiaIemp6ib_VMvEVfvyFX1SICSGR1J0lEhMuJc0EW.bBzh219EMbgE2SL imeNLEwjBSX3Q2jjvWBUcoNk0BzFX1ujxL9F5uPjwqU9oM_UTWF_pf_AT8mx w0oOoWy29N8EMQvW.7.BMkgYdf8YZ6PPEkYv8oF_tM7l.b7TkWQc32haoGD7 xNwKFhY4XCEelMw-- Received: from [12.202.173.2] by web121204.mail.ne1.yahoo.com via HTTP; Thu, 20 Sep 2012 08:46:24 PDT X-Mailer: YahooMailClassic/15.0.8 YahooMailWebService/0.8.121.416 Message-ID: <1348155984.52722.YahooMailClassic@web121204.mail.ne1.yahoo.com> Date: Thu, 20 Sep 2012 08:46:24 -0700 (PDT) From: Jason Usher To: freebsd-fs@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Subject: ZFS stats output - used, compressed, deduped, etc. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Sep 2012 15:46:32 -0000 Hi, I have a ZFS filesystem with compression turned on. Does the "used" property show me the actual data size, or the compressed data size ? If it shows me the compressed size, where can I see the actual data size ? I also wonder about checking status of dedupe - I created my pool without dedupe, and continue to NOT enable dedupe - from zpool history, we see: zpool create -f -O atime=off -O setuid=off -O exec=off -m /mnt/pool pool raidz3 da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 da10 da11 Later, I enabled dedup for just a single filesystem on this pool: zfs set dedup=on pool/dataset and now, I see in 'zpool list' a value for dedupratio: pool dedupratio 1.65x - Why do I see a value here ? Isn't dedupe still OFF for the pool as a whole ? I do NOT want to enable dedupe for the entire pool. Also, why do I not see any dedupe stats for the individual filesystem ? I see compressratio, and I see dedup=on, but I don't see any dedupratio for the filesystem itself... Did turning on dedupe for a single filesystem turn it on for the entire pool ? 
From owner-freebsd-fs@FreeBSD.ORG Thu Sep 20 21:55:56 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 58C5F106564A for ; Thu, 20 Sep 2012 21:55:56 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 016A88FC16 for ; Thu, 20 Sep 2012 21:55:55 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap8EAHqQW1CDaFvO/2dsb2JhbAA+BxaFdbg/giABAQUjBFIbDgoCAg0ZAlkGiBYLpyaTBoEhiXshhQ+BEgOVZIEUjw2DA4E+Ihs X-IronPort-AV: E=Sophos;i="4.80,456,1344225600"; d="scan'208";a="179950066" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 20 Sep 2012 17:55:26 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 86877B4014; Thu, 20 Sep 2012 17:55:26 -0400 (EDT) Date: Thu, 20 Sep 2012 17:55:26 -0400 (EDT) From: Rick Macklem To: Konstantin Belousov Message-ID: <1237981048.964353.1348178126537.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20120919061659.GS37286@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Cc: FS List Subject: Re: testing/review of atomic export update patch X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 20 Sep 2012 21:55:56 -0000 Konstantin Belousov wrote: > On Tue, Sep 18, 2012 at 09:34:54AM -0400, Rick Macklem wrote: > > Konstantin Belousov wrote: > > > On Mon, Sep 17, 2012 at 05:32:44PM -0400, Rick Macklem wrote: > > > > Konstantin 
Belousov wrote:
> > > > > On Sun, Sep 16, 2012 at 05:41:25PM -0400, Rick Macklem wrote:
> > > > > > Hi,
> > > > > >
> > > > > > There is a simple patch at:
> > > > > > http://people.freebsd.org/~rmacklem/atomic-export.patch
> > > > > > that can be applied to a kernel + mountd, so that the new
> > > > > > nfsd can be suspended by mountd while the exports are being
> > > > > > reloaded. It adds a new "-S" flag to mountd to enable this.
> > > > > > (This avoids the long standing bug where clients receive
> > > > > > ESTALE replies to RPCs while mountd is reloading exports.)
> > > > >
> > > > > This looks simple, but also somewhat worrisome. What would
> > > > > happen if the mountd crashes after nfsd suspension is
> > > > > requested, but before resume was performed ?
> > > > >
> > > > > Might be, mountd should check for suspended nfsd on start and
> > > > > unsuspend it, if some flag is specified ?
> > > > Well, I think that happens with the patch as it stands.
> > > >
> > > > suspend is done if the "-S" option is specified, but that is a
> > > > no op if it is already suspended. The resume is done no matter
> > > > what flags are provided, so mountd will always try and do a
> > > > "resume".
> > > > --> get_exportlist() is always called when mountd is started up
> > > >     and it does the resume unconditionally when it completes.
> > > >     If mountd repeatedly crashes before completing
> > > >     get_exportlist() when it is started up, the exports will be
> > > >     all messed up, so having the nfsd threads suspended doesn't
> > > >     seem so bad for this case (which hopefully never happens;-).
> > > >
> > > > Both suspend and resume are just no ops for unpatched kernels.
> > > >
> > > > Maybe the comment in front of "resume" should explicitly explain
> > > > this, instead of saying resume is harmless to do under all
> > > > conditions?
> > > >
> > > > Thanks for looking at it, rick
> > > I see.
> > >
> > > My another note is that there is no any protection against
> > > parallel instances of suspend/resume happen. For instance, one
> > > thread could set suspend_nfsd = 1 and be descheduled, while
> > > another executes resume code sequence meantime. Then it would see
> > > suspend_nfsd != 0, while nfsv4rootfs_lock not held, and tries to
> > > unlock it. It seems that nfsv4_unlock would silently exit. The
> > > suspending thread resumes, and obtains the lock. You end up with
> > > suspend_nfsd == 0 but lock held.
> > Yes. I had assumed that mountd would be the only thing using these
> > syscalls and it is single threaded. (The syscalls can only be done
> > by root for the obvious reasons.;-)
> >
> > Maybe the following untested version of the syscalls would be
> > better, since they would allow multiple concurrent calls to either
> > suspend or resume.
> > (There would still be an indeterminate case if one thread called
> > resume concurrently with another few calling suspend, but that is
> > unavoidable, I think?)
> >
> > Again, thanks for the comments, rick
> > --- untested version of syscalls ---
> >     } else if ((uap->flag & NFSSVC_SUSPENDNFSD) != 0) {
> >         NFSLOCKV4ROOTMUTEX();
> >         if (suspend_nfsd == 0) {
> >             /* Lock out all nfsd threads */
> >             igotlock = 0;
> >             while (igotlock == 0 && suspend_nfsd == 0) {
> >                 igotlock = nfsv4_lock(&nfsv4rootfs_lock, 1,
> >                     NULL, NFSV4ROOTLOCKMUTEXPTR, NULL);
> >             }
> >             suspend_nfsd = 1;
> >         }
> >         NFSUNLOCKV4ROOTMUTEX();
> >         error = 0;
> >     } else if ((uap->flag & NFSSVC_RESUMENFSD) != 0) {
> >         NFSLOCKV4ROOTMUTEX();
> >         if (suspend_nfsd != 0) {
> >             nfsv4_unlock(&nfsv4rootfs_lock, 0);
> >             suspend_nfsd = 0;
> >         }
> >         NFSUNLOCKV4ROOTMUTEX();
> >         error = 0;
> >     }
>
> From the cursory look, this variant is an improvement, mostly by
> taking the interlock before testing suspend_nfsd, and using the while
> loop.
>
> Is it possible to also make the sleep for the lock interruptible ?
> So that blocked mountd could be killed by a signal ?
Well, it would require some coding. An extra argument to nfsv4_lock()
would be needed to ask for an interruptible sleep, and then either the
caller would have to check for a pending termination signal when it
returns 0 (which indicates it didn't get the lock) or a new return value
would be needed to indicate EINTR. The latter would require all the
calls to it to be changed to recognize the new 3rd return case. Because
there are a lot of these calls, I'd tend towards just having the caller
check for a pending signal.

Not sure if it would make much difference though. The only time it would
get stuck in nfsv4_lock() is if the nfsd threads are all wedged, and in
that case having mountd wedged too probably doesn't make much
difference, since the NFS service is toast in that case anyhow.

If you think it is worth doing, I can add that. I basically see this as
a "stop-gap" fix until such time as something like nfse is done, but
since I haven't the time to look at nfse right now, I have no idea
when/if that might happen.
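
The point of the quoted variant, testing and updating suspend_nfsd only
while the interlock is held so that repeated suspend or resume calls are
harmless, can be illustrated with a small user-space model. This is a
sketch under stated assumptions and not the kernel code: struct gate,
the fields active and want, and all gate_*() functions are hypothetical
names, and a pthread condition variable stands in for the sleep inside
nfsv4_lock(), whose real behaviour is not modelled here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* User-space model only; none of these names come from the patch. */
struct gate {
    pthread_mutex_t mtx;    /* stands in for the NFSLOCKV4ROOTMUTEX() interlock */
    pthread_cond_t cv;      /* signalled when the last worker leaves */
    int active;             /* workers currently inside the service */
    int want;               /* suspenders waiting for workers to drain */
    int suspended;          /* stands in for suspend_nfsd */
};

#define GATE_INITIALIZER \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0, 0 }

/* Non-blocking entry for a worker: refused while suspended or draining. */
static bool gate_try_enter(struct gate *g)
{
    bool ok;

    pthread_mutex_lock(&g->mtx);
    ok = (g->suspended == 0 && g->want == 0);
    if (ok)
        g->active++;
    pthread_mutex_unlock(&g->mtx);
    return ok;
}

static void gate_leave(struct gate *g)
{
    pthread_mutex_lock(&g->mtx);
    if (--g->active == 0)
        pthread_cond_broadcast(&g->cv);    /* wake any waiting suspender */
    pthread_mutex_unlock(&g->mtx);
}

/* Idempotent: suspending an already suspended gate is a no-op. */
static void gate_suspend(struct gate *g)
{
    pthread_mutex_lock(&g->mtx);
    if (g->suspended == 0) {
        g->want++;
        /* Sleep until workers drain or another caller completed the suspend. */
        while (g->active > 0 && g->suspended == 0)
            pthread_cond_wait(&g->cv, &g->mtx);
        g->want--;
        g->suspended = 1;
    }
    pthread_mutex_unlock(&g->mtx);
}

/* Idempotent: resuming a gate that is not suspended changes nothing. */
static void gate_resume(struct gate *g)
{
    pthread_mutex_lock(&g->mtx);
    if (g->suspended != 0)
        g->suspended = 0;
    pthread_mutex_unlock(&g->mtx);
}
```

Because the flag is only read and written under the mutex, a resume can
never observe the flag set while the suspend is still half done, which
is the race described in the quoted text. The case of a resume racing
with several suspends remains indeterminate here as well.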
rick From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 00:03:16 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 2C083106566C; Fri, 21 Sep 2012 00:03:16 +0000 (UTC) (envelope-from flo@smeets.im) Received: from mail.solomo.de (mail.solomo.de [IPv6:2a01:238:42c7:9a00::2]) by mx1.freebsd.org (Postfix) with ESMTP id 8C2F68FC1E; Fri, 21 Sep 2012 00:03:15 +0000 (UTC) Received: from mail.solomo.de (localhost [127.0.0.1]) by mail.solomo.de (Postfix) with ESMTP id 790DDC3833; Fri, 21 Sep 2012 02:03:14 +0200 (CEST) X-Virus-Scanned: amavisd-new at solomo.de Received: from mail.solomo.de ([127.0.0.1]) by mail.solomo.de (mail.solomo.de [127.0.0.1]) (amavisd-new, port 10024) with LMTP id eJL7agiouy0y; Fri, 21 Sep 2012 02:03:13 +0200 (CEST) Received: from nibbler-osx.local (unknown [IPv6:2001:4dd0:ff00:8bb6:d806:7e81:457:3997]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mail.solomo.de (Postfix) with ESMTPSA id 5A5F1C381A; Fri, 21 Sep 2012 02:03:13 +0200 (CEST) Message-ID: <505BAEBF.7070403@smeets.im> Date: Fri, 21 Sep 2012 02:03:11 +0200 From: Florian Smeets User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:16.0) Gecko/20120905 Thunderbird/16.0 MIME-Version: 1.0 To: Bryan Drewery References: <20120829060158.GA38721@x2.osted.lan> <20120831052003.GA91340@x2.osted.lan> <20120905201531.GA54452@x2.osted.lan> <20120917140055.GA9037@x2.osted.lan> <505A468E.2080902@FreeBSD.org> In-Reply-To: <505A468E.2080902@FreeBSD.org> X-Enigmail-Version: 1.5a1pre Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enig856A503AD0801149A196BDD4" Cc: FreeBSD FS , Peter Holm , attilio@freebsd.org, freebsd-current@freebsd.org Subject: Re: MPSAFE VFS -- List of upcoming actions X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Sep 2012 00:03:16 -0000

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enig856A503AD0801149A196BDD4
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On 20.09.12 00:26, Bryan Drewery wrote:
> On 9/18/2012 9:48 PM, Attilio Rao wrote:
>> In addition to fusefs-kmod, Bryan and Florian have also updated
>> fusefs-lib and fusefs-ntfs ports. For instance, please refer to this
>> e-mail:
>> http://lists.freebsd.org/pipermail/freebsd-ports/2012-August/077950.html
>>
>> Even if this work is someway independent by the fusefs-kmod import, I
>> warmly suggest to all of you to use their patches (and this what we
>> have been testing so far too).
>
> I have committed my updates to sysutils/fusefs-ntfs now.
>

The sysutils/fusefs-libs port was updated a few minutes ago.

Florian

--------------enig856A503AD0801149A196BDD4
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----

iEYEARECAAYFAlBbrsAACgkQapo8P8lCvwmtHQCfc6yPoAmqqlh5DSm/XfJ9PnmY
TAcAn1LDy4OziPr+8ydUvSLvHXKTGkrw
=ME21
-----END PGP SIGNATURE-----

--------------enig856A503AD0801149A196BDD4--

From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 00:24:48 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F3044106566B; Fri, 21 Sep 2012 00:24:47 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-lb0-f182.google.com (mail-lb0-f182.google.com [209.85.217.182]) by mx1.freebsd.org (Postfix) with ESMTP id 546508FC0A; Fri, 21 Sep 2012 00:24:45 +0000 (UTC) Received: by lbbgg13 with SMTP id gg13so3927786lbb.13 for ; Thu, 20 Sep 2012 17:24:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256;
c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:content-type; bh=6W00lKwbtN1omyhyOsTSSYofFPMlDSJIchDghmi1WT8=; b=mhtCa1FR/auVB5rcKLF6ozVNBHUfLguw4EbInJXRGlWUBu+yittMGF0Pc9O1mODCxM Ub/XeIkehjK7mYIj3VNGZqEWj9a9TjjhB7YPFRuisSnuUxUz+SmNHx0Lip8RB9+z1WCl r2VqKwaj86qkGwsY4Tex9aIIW1qzpwQPVNc2m0bMIfQIWJh3JYAT06JZWPDtOXsiwtsm ARk6QpWnKNvB/3uRl1KNOwuHDbaqtw1n+m6nQFlzxKJ6bH4hmo6gMC34KtASytfGZ1IO hMj69LmLdBQbLf6PLyKbcjigMsaE5FnC2uDl9rtuh7m/P3Jf+bhbYkDjA+jL6s8vjsYt G8bg== MIME-Version: 1.0 Received: by 10.112.82.66 with SMTP id g2mr1189036lby.15.1348187084804; Thu, 20 Sep 2012 17:24:44 -0700 (PDT) Sender: asmrookie@gmail.com Received: by 10.112.102.39 with HTTP; Thu, 20 Sep 2012 17:24:44 -0700 (PDT) In-Reply-To: References: <20120829060158.GA38721@x2.osted.lan> <20120831052003.GA91340@x2.osted.lan> <20120905201531.GA54452@x2.osted.lan> <20120917140055.GA9037@x2.osted.lan> Date: Fri, 21 Sep 2012 01:24:44 +0100 X-Google-Sender-Auth: e9-qkoeyV4fWH0A9ttW69Y7epXU Message-ID: From: Attilio Rao To: FreeBSD FS , freebsd-current@freebsd.org, Peter Holm , =?UTF-8?Q?Gustau_P=C3=A9rez?= , George Neville-Neil , Florian Smeets , bdrewery@freebsd.org Content-Type: text/plain; charset=UTF-8 Cc: Subject: Re: MPSAFE VFS -- List of upcoming actions X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: attilio@FreeBSD.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Sep 2012 00:24:48 -0000 On Fri, Sep 21, 2012 at 1:22 AM, Attilio Rao wrote: [ trimm ] > > You can use the branch directly or this patch against -CURRENT at 240752: > http://www.freebsd.org/~attilio/fuse_import/fuse_240752.patch > > In order to test this work, then, you just need to patch (or use > directly the branch) your sources with this patch and install ports > normally as they work. 
Forgot to tell: with the new branch you *must not* install fusefs-kmod port. Please test it from a pristine installation or double-check if your fusefs-kmod port is completely gone (if already installed) before to report bugs as its functionality could be tainting the branch one. Attilio -- Peace can only be achieved by understanding - A. Einstein From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 00:28:26 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 44A03106564A; Fri, 21 Sep 2012 00:28:26 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-lb0-f182.google.com (mail-lb0-f182.google.com [209.85.217.182]) by mx1.freebsd.org (Postfix) with ESMTP id A119D8FC08; Fri, 21 Sep 2012 00:28:24 +0000 (UTC) Received: by lbbgg13 with SMTP id gg13so3930523lbb.13 for ; Thu, 20 Sep 2012 17:28:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:content-type; bh=Phz+wLnOM6ybKNOMNmFaJ5/EXOoTOgo/FllHDRXoHYA=; b=hU0D0yC2H1Zmdx4/fP7IOxG7TjNGhLkXE7qRwuuS3n0lmU7VW/mxkFuY+INnDttACk E41/0EmKeZnLNuPJOoivDd/RjxpZMVb+rTcdHx4czQWZdjG0KofliSYy+TT1sU/6RJ2y YXWYTzDyJgM4jcHdKkZXmry6JMU4fs0+avfvWRehcyvvgeZzPRn8boibhibx99F+mJVT HSwn+IWlqTW7/eUNb/gbdTPwgHS/R7WqAZBTRksy9svQNlpFnRxWss9E7KoZJd/GYlOa yQ5OAHM20BwUNyuk9Qsl0a73Z19ObRJjmXCg4mVvPrP/Bbo7LMpCfrIgOYlSXH8OH00L DSTg== MIME-Version: 1.0 Received: by 10.152.112.233 with SMTP id it9mr2813805lab.40.1348186974838; Thu, 20 Sep 2012 17:22:54 -0700 (PDT) Sender: asmrookie@gmail.com Received: by 10.112.102.39 with HTTP; Thu, 20 Sep 2012 17:22:54 -0700 (PDT) In-Reply-To: References: <20120829060158.GA38721@x2.osted.lan> <20120831052003.GA91340@x2.osted.lan> <20120905201531.GA54452@x2.osted.lan> <20120917140055.GA9037@x2.osted.lan> Date: Fri, 21 Sep 2012 01:22:54 +0100 
X-Google-Sender-Auth: wZLKdGCIiTExkzPYkAJj7fAHXKw Message-ID: From: Attilio Rao To: FreeBSD FS , freebsd-current@freebsd.org, Peter Holm , =?UTF-8?Q?Gustau_P=C3=A9rez?= , George Neville-Neil , Florian Smeets , bdrewery@freebsd.org Content-Type: text/plain; charset=UTF-8 Cc: Subject: Re: MPSAFE VFS -- List of upcoming actions X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: attilio@FreeBSD.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Sep 2012 00:28:26 -0000 On Wed, Sep 19, 2012 at 3:48 AM, Attilio Rao wrote: > On Fri, Jul 13, 2012 at 12:18 AM, Attilio Rao wrote: >> 2012/7/4 Attilio Rao : >>> 2012/6/29 Attilio Rao : >>>> As already published several times, according to the following plan: >>>> http://wiki.freebsd.org/NONMPSAFE_DEORBIT_VFS >>>> >>> >>> I still haven't heard from Vivien or Edward, anyway as NTFS is >>> basically only used RO these days (also the mount_ntfs code just >>> permits RO mounting) I stripped all the uncomplete/bogus write support >>> with the following patch: >>> http://www.freebsd.org/~attilio/ntfs_remove_write.patch >>> >>> This is an attempt to make the code smaller and possibly just focus on >>> the locking that really matter (as read-only filesystem). >>> On some points of the patch I'm a bit less sure as we could easily >>> take into account also write for things like vaccess() arguments, and >>> make easier to re-add correct write support at some point in the >>> future, but still force RO, even if the approach used in the patch is >>> more correct IMHO. >>> As an added bonus this patch cleans some dirty code in the mount >>> operation and fixes a bug as vfs_mountedfrom() is called before real >>> mounting is completed and can still fail. >> >> A quick update on this. >> It looks like NTFS won't be completed for this GSoC thus I seriously >> need to find an alternative to not loose the NTFS support entirely. 
>>
>> I tried to look into the NTFS implementation right now and it is
>> really a poor support. As Peter has also verified, it can deadlock in
>> no time, it completely violates VFS rules, etc. IMHO it deserves a
>> complete rewrite if we would still support in-kernel NTFS. I also
>> tried to look at the NetBSD implementation. Their code is somewhat
>> similar to ours, but they used very complicated (and very dirty) code
>> to do the locking. Even if I don't know the NetBSD VFS well enough, I
>> have the impression that not all the races are correctly handled.
>> Definitely not something I would like to port.
>>
>> Considering all that, the only viable option would be a userland
>> filesystem implementation. My preferred choice would be to import
>> PUFFS and librefuse on top of it, but honestly it requires a lot of
>> time to be completed, time which I don't currently have, as in 2
>> months Giant must be gone from the VFS.
>>
>> I then decided to switch to gnn's revamp of the FUSE patches. You can
>> find his initial e-mail here:
>> http://lists.freebsd.org/pipermail/freebsd-fs/2012-March/013876.html
>>
>> I've precisely got the second version of George's patch and created
>> this dolphin branch:
>> svn://svn.freebsd.org/base/projects/fuse
>>
>> I'm fixing low hanging fruit for the moment (see r238411 for example)
>> and I still have to make a thorough review.
>> However my idea is to commit the support once:
>> - ntfs-3g is well stress-tested and proves to be bug-free
>> - there is no major/big technical issue pending after the reviews
>
> In the last weeks Peter, Florian, Gustau and I have been working on
> stabilizing fuse support. Specifically, Peter has worked hard on
> producing several utilities to nit stress-test fuse and in particular
> ntfs, Florian has improved fuse related ports (as explained later) and
> Gustau has done sparse testing. I feel moderately satisfied by the
> level of stability of fuse now to propose it for wider usage, in
> particular given the huge number of complaints I'm hearing from
> occasional fuse users.
>
> The final target of the project is to completely import into base the
> content of fusefs-kmod, starting from the patches posted earlier by
> George. So far, we took care only of importing the kernel part into
> the fuse branch, so the fusefs-kmod userland part still needs to be
> installed from ports, but I was studying the mount_fusefs licensing
> before proceeding with the import of the userland bits.
>
> The fixing has been happening here:
> svn://svn.freebsd.org/base/projects/fuse/
>
> which is essentially a HEAD branch + fuse kernel components. In order
> to get fuse, please compile a kernel from this branch with the FUSE
> option or simply build and load the fuse module.
> Alternatively, a kernel patch that should work with HEAD@240684 is here:
> http://www.freebsd.org/~attilio/fuse_import/fuse_240684.patch
>
> I guess the patch can easily apply to all FreeBSD branches, really,
> but it has not been tested on anything other than -CURRENT.
>
> As said, you currently still need to build the fusefs-kmod port.
> However you need these further patches, to be put in the
> fusefs-kmod/files/ directory:
> http://www.freebsd.org/~attilio/fuse_import/patch-Makefile
> http://www.freebsd.org/~attilio/fuse_import/patch-mount_fusefs__mount_fusefs2.c
>
> They both disable the old kernel building/linking and import new
> functionality to let the new kernel support work well in presence of
> many consumers.
>
> In addition to fusefs-kmod, Bryan and Florian have also updated the
> fusefs-lib and fusefs-ntfs ports. For instance, please refer to this
> e-mail:
> http://lists.freebsd.org/pipermail/freebsd-ports/2012-August/077950.html
>
> Even if this work is somewhat independent of the fusefs-kmod import, I
> warmly suggest that all of you use their patches (and this is what we
> have been testing so far too).

So, after Bryan's and Florian's ports updates, I've also committed the
userland part of fusefs-kmod, and the project branch now fully mirrors
the functionality of fusefs-kmod.
The code in projects/fuse, in fact, will also install mount_fusefs as
part of the fuse support.

You can use the branch directly or this patch against -CURRENT at 240752:
http://www.freebsd.org/~attilio/fuse_import/fuse_240752.patch

In order to test this work, then, you just need to patch your sources
with this patch (or use the branch directly) and install the ports
normally, as they work.

If no major bugs are found before October 4th, this is the code that is
going to be committed to HEAD.

Thanks,
Attilio

--
Peace can only be achieved by understanding - A.
Einstein From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 08:05:29 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 80A50106564A for ; Fri, 21 Sep 2012 08:05:29 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 83B718FC12 for ; Fri, 21 Sep 2012 08:05:27 +0000 (UTC) Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q8L85Thp099520; Fri, 21 Sep 2012 11:05:29 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q8L85Gae022597; Fri, 21 Sep 2012 11:05:16 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q8L85GtC022596; Fri, 21 Sep 2012 11:05:16 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 21 Sep 2012 11:05:16 +0300 From: Konstantin Belousov To: Rick Macklem Message-ID: <20120921080516.GC37286@deviant.kiev.zoral.com.ua> References: <20120919061659.GS37286@deviant.kiev.zoral.com.ua> <1237981048.964353.1348178126537.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="7iqzzmjEMnnYslDD" Content-Disposition: inline In-Reply-To: <1237981048.964353.1348178126537.JavaMail.root@erie.cs.uoguelph.ca> User-Agent: Mutt/1.5.21 (2010-09-15) X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on 
skuns.kiev.zoral.com.ua Cc: FS List Subject: Re: testing/review of atomic export update patch X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Sep 2012 08:05:29 -0000 --7iqzzmjEMnnYslDD Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Sep 20, 2012 at 05:55:26PM -0400, Rick Macklem wrote: > Konstantin Belousov wrote: > > On Tue, Sep 18, 2012 at 09:34:54AM -0400, Rick Macklem wrote: > > > Konstantin Belousov wrote: > > > > On Mon, Sep 17, 2012 at 05:32:44PM -0400, Rick Macklem wrote: > > > > > Konstantin Belousov wrote: > > > > > > On Sun, Sep 16, 2012 at 05:41:25PM -0400, Rick Macklem wrote: > > > > > > > Hi, > > > > > > > > > > > > > > There is a simple patch at: > > > > > > > http://people.freebsd.org/~rmacklem/atomic-export.patch > > > > > > > that can be applied to a kernel + mountd, so that the new > > > > > > > nfsd can be suspended by mountd while the exports are being > > > > > > > reloaded. It adds a new "-S" flag to mountd to enable this. > > > > > > > (This avoids the long standing bug where clients receive > > > > > > > ESTALE > > > > > > > replies to RPCs while mountd is reloading exports.) > > > > > > > > > > > > This looks simple, but also somewhat worrisome. What would > > > > > > happen > > > > > > if the mountd crashes after nfsd suspension is requested, but > > > > > > before > > > > > > resume was performed ? > > > > > > > > > > > > Might be, mountd should check for suspended nfsd on start and > > > > > > unsuspend > > > > > > it, if some flag is specified ? > > > > > Well, I think that happens with the patch as it stands. > > > > > > > > > > suspend is done if the "-S" option is specified, but that is a > > > > > no op > > > > > if it is already suspended. 
> > > > > The resume is done no matter what flags are provided, so
> > > > > mountd will always try and do a "resume".
> > > > > --> get_exportlist() is always called when mountd is started
> > > > >     up and it does the resume unconditionally when it
> > > > >     completes.
> > > > >     If mountd repeatedly crashes before completing
> > > > >     get_exportlist() when it is started up, the exports will
> > > > >     be all messed up, so having the nfsd threads suspended
> > > > >     doesn't seem so bad for this case (which hopefully never
> > > > >     happens;-).
> > > > >
> > > > > Both suspend and resume are just no ops for unpatched kernels.
> > > > >
> > > > > Maybe the comment in front of "resume" should explicitly
> > > > > explain this, instead of saying resume is harmless to do under
> > > > > all conditions?
> > > > >
> > > > > Thanks for looking at it, rick
> > > > I see.
> > > >
> > > > My another note is that there is no any protection against
> > > > parallel instances of suspend/resume happen. For instance, one
> > > > thread could set suspend_nfsd = 1 and be descheduled, while
> > > > another executes resume code sequence meantime. Then it would
> > > > see suspend_nfsd != 0, while nfsv4rootfs_lock not held, and
> > > > tries to unlock it. It seems that nfsv4_unlock would silently
> > > > exit. The suspending thread resumes, and obtains the lock. You
> > > > end up with suspend_nfsd == 0 but lock held.
> > > Yes. I had assumed that mountd would be the only thing using these
> > > syscalls and it is single threaded. (The syscalls can only be done
> > > by root for the obvious reasons.;-)
> > >
> > > Maybe the following untested version of the syscalls would be
> > > better, since they would allow multiple concurrent calls to either
> > > suspend or resume.
> > > (There would still be an indeterminate case if one thread called
> > > resume concurrently with another few calling suspend, but that is
> > > unavoidable, I think?)
> > >
> > > Again, thanks for the comments, rick
> > > --- untested version of syscalls ---
> > >     } else if ((uap->flag & NFSSVC_SUSPENDNFSD) != 0) {
> > >         NFSLOCKV4ROOTMUTEX();
> > >         if (suspend_nfsd == 0) {
> > >             /* Lock out all nfsd threads */
> > >             igotlock = 0;
> > >             while (igotlock == 0 && suspend_nfsd == 0) {
> > >                 igotlock = nfsv4_lock(&nfsv4rootfs_lock, 1,
> > >                     NULL, NFSV4ROOTLOCKMUTEXPTR, NULL);
> > >             }
> > >             suspend_nfsd = 1;
> > >         }
> > >         NFSUNLOCKV4ROOTMUTEX();
> > >         error = 0;
> > >     } else if ((uap->flag & NFSSVC_RESUMENFSD) != 0) {
> > >         NFSLOCKV4ROOTMUTEX();
> > >         if (suspend_nfsd != 0) {
> > >             nfsv4_unlock(&nfsv4rootfs_lock, 0);
> > >             suspend_nfsd = 0;
> > >         }
> > >         NFSUNLOCKV4ROOTMUTEX();
> > >         error = 0;
> > >     }
> >
> > From the cursory look, this variant is an improvement, mostly by
> > taking the interlock before testing suspend_nfsd, and using the
> > while loop.
> >
> > Is it possible to also make the sleep for the lock interruptible ?
> > So that blocked mountd could be killed by a signal ?
> Well, it would require some coding. An extra argument to nfsv4_lock()
> to indicate to do so and then either the caller would have to check
> for a pending termination signal when it returns 0 (indicates didn't
> get lock) or a new return value to indicate EINTR. The latter would
> require all the calls to it to be changed to recognize the new 3rd
> return case. Because there are a lot of these calls, I'd tend towards
> just having the caller check for a pending signal.
>
> Not sure if it would make much difference though.
The only time it > would get stuck in nfsv4_lock() is if the nfsd threads are all wedged > and in that case having mountd wedged too probably doesn't make much > difference, since the NFS service is toast in that case anyhow. > > If you think it is worth doing, I can add that. I basically see this > as a "stop-gap" fix until such time as something like nfse is done, > but since I haven't the time to look at nfse right now, I have no > idea when/if that might happen. Ok, please go ahead with the patch. Having the patch even in its current form is obviously better than not to have it. If the wedged mountd appears to be annoying enough for me, I would do the change. Thanks. --7iqzzmjEMnnYslDD Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (FreeBSD) iEYEARECAAYFAlBcH7wACgkQC3+MBN1Mb4hrPwCdH6HrPJL/FeYl2hofEkPEB299 ISQAn2OuFrZuC0lpmL/lFF1xen2APSs1 =UIog -----END PGP SIGNATURE----- --7iqzzmjEMnnYslDD-- From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 21:11:46 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D5488106566B for ; Fri, 21 Sep 2012 21:11:46 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 716C88FC14 for ; Fri, 21 Sep 2012 21:11:45 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap8EAJrXXFCDaFvO/2dsb2JhbAA+BxaFdbkdgiABAQQBIwRSBRYOCgICDRkCWQaIEgYLpjCSeoEhiXshhHOBEgOVZIEVjw2DA4E+Ihs X-IronPort-AV: E=Sophos;i="4.80,465,1344225600"; d="scan'208";a="180103908" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 21 Sep 2012 17:11:38 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id BECC1B4026; Fri,
21 Sep 2012 17:11:38 -0400 (EDT) Date: Fri, 21 Sep 2012 17:11:38 -0400 (EDT) From: Rick Macklem To: Konstantin Belousov Message-ID: <683271364.1028517.1348261898771.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20120921080516.GC37286@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: FS List Subject: Re: testing/review of atomic export update patch X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Sep 2012 21:11:46 -0000 Konstantin Belousov wrote: > On Thu, Sep 20, 2012 at 05:55:26PM -0400, Rick Macklem wrote: > > Konstantin Belousov wrote: > > > On Tue, Sep 18, 2012 at 09:34:54AM -0400, Rick Macklem wrote: > > > > Konstantin Belousov wrote: > > > > > On Mon, Sep 17, 2012 at 05:32:44PM -0400, Rick Macklem wrote: > > > > > > Konstantin Belousov wrote: > > > > > > > On Sun, Sep 16, 2012 at 05:41:25PM -0400, Rick Macklem > > > > > > > wrote: > > > > > > > > Hi, > > > > > > > > > > > > > > > > There is a simple patch at: > > > > > > > > http://people.freebsd.org/~rmacklem/atomic-export.patch > > > > > > > > that can be applied to a kernel + mountd, so that the > > > > > > > > new > > > > > > > > nfsd can be suspended by mountd while the exports are > > > > > > > > being > > > > > > > > reloaded. It adds a new "-S" flag to mountd to enable > > > > > > > > this. > > > > > > > > (This avoids the long standing bug where clients receive > > > > > > > > ESTALE > > > > > > > > replies to RPCs while mountd is reloading exports.) > > > > > > > > > > > > > > This looks simple, but also somewhat worrisome. 
What would > > > > > > > happen > > > > > > > if the mountd crashes after nfsd suspension is requested, > > > > > > > but > > > > > > > before > > > > > > > resume was performed ? > > > > > > > > > > > > > > Might be, mountd should check for suspended nfsd on start > > > > > > > and > > > > > > > unsuspend > > > > > > > it, if some flag is specified ? > > > > > > Well, I think that happens with the patch as it stands. > > > > > > > > > > > > suspend is done if the "-S" option is specified, but that is > > > > > > a > > > > > > no op > > > > > > if it is already suspended. The resume is done no matter > > > > > > what > > > > > > flags > > > > > > are provided, so mountd will always try and do a "resume". > > > > > > --> get_exportlist() is always called when mountd is started > > > > > > up > > > > > > and > > > > > > it does the resume unconditionally when it completes. > > > > > > If mountd repeatedly crashes before completing > > > > > > get_exportlist() > > > > > > when it is started up, the exports will be all messed > > > > > > up, so > > > > > > having the nfsd threads suspended doesn't seem so bad > > > > > > for > > > > > > this > > > > > > case (which hopefully never happens;-). > > > > > > > > > > > > Both suspend and resume are just no ops for unpatched > > > > > > kernels. > > > > > > > > > > > > Maybe the comment in front of "resume" should explicitly > > > > > > explain > > > > > > this, instead of saying resume is harmless to do under all > > > > > > conditions? > > > > > > > > > > > > Thanks for looking at it, rick > > > > > I see. > > > > > > > > > > My another note is that there is no any protection against > > > > > parallel > > > > > instances of suspend/resume happen. For instance, one thread > > > > > could > > > > > set > > > > > suspend_nfsd = 1 and be descheduled, while another executes > > > > > resume > > > > > code sequence meantime. 
Then it would see suspend_nfsd != 0, > > > > > while > > > > > nfsv4rootfs_lock not held, and tries to unlock it. It seems > > > > > that > > > > > nfsv4_unlock would silently exit. The suspending thread > > > > > resumes, > > > > > and obtains the lock. You end up with suspend_nfsd == 0 but > > > > > lock > > > > > held. > > > > Yes. I had assumed that mountd would be the only thing using > > > > these > > > > syscalls > > > > and it is single threaded. (The syscalls can only be done by > > > > root > > > > for the > > > > obvious reasons.;-) > > > > > > > > Maybe the following untested version of the syscalls would be > > > > better, since > > > > they would allow multiple concurrent calls to either suspend or > > > > resume. > > > > (There would still be an indeterminate case if one thread called > > > > resume > > > > concurrently with another few calling suspend, but that is > > > > unavoidable, > > > > I think?) > > > > > > > > Again, thanks for the comments, rick > > > > --- untested version of syscalls --- > > > > } else if ((uap->flag & NFSSVC_SUSPENDNFSD) != 0) { > > > > NFSLOCKV4ROOTMUTEX(); > > > > if (suspend_nfsd == 0) { > > > > /* Lock out all nfsd threads */ > > > > igotlock = 0; > > > > while (igotlock == 0 && suspend_nfsd == 0) { > > > > igotlock = nfsv4_lock(&nfsv4rootfs_lock, 1, > > > > NULL, NFSV4ROOTLOCKMUTEXPTR, NULL); > > > > } > > > > suspend_nfsd = 1; > > > > } > > > > NFSUNLOCKV4ROOTMUTEX(); > > > > error = 0; > > > > } else if ((uap->flag & NFSSVC_RESUMENFSD) != 0) { > > > > NFSLOCKV4ROOTMUTEX(); > > > > if (suspend_nfsd != 0) { > > > > nfsv4_unlock(&nfsv4rootfs_lock, 0); > > > > suspend_nfsd = 0; > > > > } > > > > NFSUNLOCKV4ROOTMUTEX(); > > > > error = 0; > > > > } > > > > > > From the cursory look, this variant is an improvement, mostly by > > > taking > > > the interlock before testing suspend_nfsd, and using the while > > > loop. > > > > > > Is it possible to also make the sleep for the lock interruptible ? 
> > > So that blocked mountd could be killed by a signal ? > > Well, it would require some coding. An extra argument to > > nfsv4_lock() > > to indicate to do so and then either the caller would have to check > > for a pending termination signal when it returns 0 (indicates didn't > > get > > lock) or a new return value to indicate EINTR. The latter would > > require > > all the calls to it to be changed to recognize the new 3rd return > > case. > > Because there are a lot of these calls, I'd tend towards just having > > the > > caller check for a pending signal. > > > > Not sure if it would make much difference though. The only time it > > would get stuck in nfsv4_lock() is if the nfsd threads are all > > wedged > > and in that case having mountd wedged too probably doesn't make much > > difference, since the NFS service is toast in that case anyhow. > > > > If you think it is worth doing, I can add that. I basically see this > > as a "stop-gap" fix until such time as something like nfse is done, > > but since I haven't the time to look at nfse right now, I have no > > idea when/if that might happen. > > Ok, please go ahead with the patch. Having the patch even in its > current > form is obviously better than not to have it. If the wedged mountd > appears to be annoying enough for me, I would do the change. > Ah, that's ok, I'll do it. I'll do it as a separate commit first, since I can't see it being controversial. I'll cobble together a version of the atomic-export patch using it after that. Have a good weekend, rick > Thanks.
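
The interleaving Konstantin describes in the quoted text can be replayed step by step in plain C. What follows is only a minimal userland sketch, not kernel code: struct nfsd_state and every function name here are hypothetical stand-ins for suspend_nfsd and nfsv4rootfs_lock, and the interlock is modelled simply by running the test, the lock acquisition and the flag update as one indivisible step.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two pieces of state discussed in the thread. */
struct nfsd_state {
	bool suspend_flag;	/* stands in for suspend_nfsd */
	bool lock_held;		/* stands in for nfsv4rootfs_lock held exclusively */
};

/*
 * Resume as described: act only when the flag is set.  Unlocking a lock
 * that is not held is modelled as a silent no-op.
 */
static void
resume(struct nfsd_state *s)
{
	if (s->suspend_flag) {
		s->lock_held = false;
		s->suspend_flag = false;
	}
}

/*
 * Racy ordering: the suspender sets the flag, is descheduled, a full
 * resume runs, and only then does the suspender obtain the lock.
 */
static struct nfsd_state
replay_racy(void)
{
	struct nfsd_state s = { false, false };

	s.suspend_flag = true;	/* suspender, first half */
	resume(&s);		/* other thread runs the whole resume */
	s.lock_held = true;	/* suspender, second half */
	return (s);
}

/* Suspend with the interlock held: test, lock and flag update are one step. */
static void
suspend_atomic(struct nfsd_state *s)
{
	if (!s->suspend_flag) {
		s->lock_held = true;
		s->suspend_flag = true;
	}
}

/* With the interlock, only whole operations can be reordered. */
static struct nfsd_state
replay_interlocked(bool resume_first)
{
	struct nfsd_state s = { false, false };

	if (resume_first) {
		resume(&s);
		suspend_atomic(&s);
	} else {
		suspend_atomic(&s);
		resume(&s);
	}
	return (s);
}

/* The flag and the lock agree exactly when the state is consistent. */
static bool
consistent(struct nfsd_state s)
{
	return (s.suspend_flag == s.lock_held);
}
```

Calling replay_racy() leaves the flag clear with the lock still held, which is the end state described above. Both orders of replay_interlocked() end in a consistent state, although which of the two states wins is still indeterminate, as Rick notes.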
From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 21:18:21 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E5819106564A for ; Fri, 21 Sep 2012 21:18:21 +0000 (UTC) (envelope-from tjg@soe.ucsc.edu) Received: from mail-pb0-f54.google.com (mail-pb0-f54.google.com [209.85.160.54]) by mx1.freebsd.org (Postfix) with ESMTP id B6C168FC08 for ; Fri, 21 Sep 2012 21:18:21 +0000 (UTC) Received: by pbbrp2 with SMTP id rp2so9255341pbb.13 for ; Fri, 21 Sep 2012 14:18:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ucsc.edu; s=ucsc-google; h=mime-version:date:message-id:subject:from:to:content-type; bh=mWjJELHmgZ+JzW5187eVF+ddUokUSpsU6bSkldKOVfY=; b=jk4zxD+TFAzFq9jCPa6FV5y5aiGz6lBDzbXqhhwcWcj7lpsKePmB8fpBGs5DTg6B6L l85x+Kg91MFFQnZHafZ02ae0aFLEU7ldzgigI2cymVjFmD9uSFyE00J11MbjYLnpex8D v6JpdeJmJPCtq66+9+R9UNVPKORymsf9E7srI= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type :x-gm-message-state; bh=mWjJELHmgZ+JzW5187eVF+ddUokUSpsU6bSkldKOVfY=; b=YgB7o2miftTx1VDTdIXH7QetRGVWfGtbnSW6aUBxM38s0Kv7/pvSLHGPWt1rSGiET0 8wM1zF8zhOh2BY0J1QA+Uz/pR+GV8n2Sq8RxrEWxkkxv4NK35vL15eSPlvaW0Uw1NWvl 22gBWH+zQ2LgCRoHme7hFox5w/fwCGxWMov/KTHpB7pho+7V/5UCyLNEW3kM24Z3jP5E eOPlkrFjJrJ5dwEfjLZ++WYyCPlkP8ajfFRBOVlc8uqTWvD77dYVLVO6luVUk8OOlq2a gLrPjoDalv592EMnvv2Si+Lsff6OgvWkFQmN6MvUYKXhwqcH9zNjOj1f746drtf9Moax lKKw== MIME-Version: 1.0 Received: by 10.68.218.196 with SMTP id pi4mr18418366pbc.128.1348262301144; Fri, 21 Sep 2012 14:18:21 -0700 (PDT) Received: by 10.68.25.69 with HTTP; Fri, 21 Sep 2012 14:18:21 -0700 (PDT) Date: Fri, 21 Sep 2012 14:18:21 -0700 Message-ID: From: Tim Gustafson To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=UTF-8 X-Gm-Message-State: ALoCoQk2CUhno3qMOHE+AJg+ih/TPSqDoctWg+PStjxbVJSIYdTSCuoXXnIXVEnup/Ry6UMRHkI1 Subject: Exporting ZFS File 
System to Multiple Subnets X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Sep 2012 21:18:22 -0000

Hi,

I Googled around about exporting ZFS file systems to multiple subnets, but most of what I found was years old, so I thought I'd ask to see what the current state of things is.

We have about 2,000 file systems that we need to export to a handful of subnets. Most file systems are exported to the same set of subnets. If I were setting up /etc/exports, I would do something like:

/export/home -alldirs -network=1.2.3.0/22
/export/home -alldirs -network=1.2.3.0/23
/export/projects -alldirs -network=4.5.6.0/22
/export/projects -alldirs -network=4.5.6.0/23

Perhaps followed by some additional lines to specify additional export networks for specific filesystems:

/export/projects/foo -network 7.8.9.0/24
/export/projects/bar -network 9.8.7.0/24

But, FreeBSD's implementation of the "zfs sharenfs" property does not allow multiple subnets to be specified. And it seems that ZFS somehow gets in the way of /etc/exports for ZFS file systems, so that if I turn off sharenfs ("zfs inherit sharenfs tank/export; zfs inherit sharenfs tank/projects") it actually blocks the export lines in /etc/exports from being mounted.

So, how can I use FreeBSD and ZFS to export file systems in this way? It seems like sharenfs is a dead end, and it also seems like /etc/exports doesn't work because ZFS gets in the way. This seems like a really significant impediment to using FreeBSD as a ZFS file server for anything other than the most basic of configurations. Is there some magic that I can use to at least work around this limitation for now?

Ideally, we'd just move to NFSv4, but a significant portion of our clients are not NFSv4 ready, so that's not an option.
-- Tim Gustafson tjg@soe.ucsc.edu 831-459-5354 Baskin Engineering, Room 313A From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 21:40:46 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 26F3D106566B for ; Fri, 21 Sep 2012 21:40:46 +0000 (UTC) (envelope-from zeus@ibs.dn.ua) Received: from relay.ibs.dn.ua (relay.ibs.dn.ua [91.216.196.25]) by mx1.freebsd.org (Postfix) with ESMTP id 9614E8FC12 for ; Fri, 21 Sep 2012 21:40:45 +0000 (UTC) Received: from ibs.dn.ua (relay.ibs.dn.ua [91.216.196.25]) by relay.ibs.dn.ua with ESMTP id q8LLeaKi088464; Sat, 22 Sep 2012 00:40:36 +0300 (EEST) Message-ID: <20120922004036.88462@relay.ibs.dn.ua> Date: Sat, 22 Sep 2012 00:40:36 +0300 From: Zeus Panchenko To: "Tim Gustafson" In-reply-to: Your message of Fri, 21 Sep 2012 14:18:21 -0700 References: Organization: I.B.S. LLC X-Mailer: MH-E 8.2; GNU Mailutils 2.99.97; GNU Emacs 23.4.1 X-Face: &sReWXo3Iwtqql1[My(t1Gkx; y?KF@KF`4X+'9Cs@PtK^y%}^.>Mtbpyz6U=,Op:KPOT.uG )Nvx`=er!l?WASh7KeaGhga"1[&yz$_7ir'cVp7o%CGbJ/V)j/=]vzvvcqcZkf; JDurQG6wTg+?/xA go`}1.Ze//K; Fk&/&OoHd'[b7iGt2UO>o(YskCT[_D)kh4!yY'<&:yt+zM=A`@`~9U+P[qS:f; #9z~ Or/Bo#N-'S'!'[3Wog'ADkyMqmGDvga?WW)qd=?)`Y&k=o}>!ST\ Cc: freebsd-fs@freebsd.org Subject: Re: Exporting ZFS File System to Multiple Subnets X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Zeus Panchenko List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Sep 2012 21:40:46 -0000 Tim Gustafson wrote: > > But, FreeBSD's implementation of the "zfs sharenfs" property does not > allow multiple subnets to be specified. the only way to do that I know is described here: http://freebsd.1045724.n5.nabble.com/zfs-sharenfs-to-multiple-subnets-found-a-dirty-looking-hack-td4030378.html looks weird but works ... -- Zeus V. Panchenko jid:zeus@im.ibs.dn.ua IT Dpt., I.B.S. 
LLC GMT+2 (EET) From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 21:42:31 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8089E1065670 for ; Fri, 21 Sep 2012 21:42:31 +0000 (UTC) (envelope-from tjg@soe.ucsc.edu) Received: from mail-pb0-f54.google.com (mail-pb0-f54.google.com [209.85.160.54]) by mx1.freebsd.org (Postfix) with ESMTP id 4D45A8FC08 for ; Fri, 21 Sep 2012 21:42:31 +0000 (UTC) Received: by pbbrp2 with SMTP id rp2so9289230pbb.13 for ; Fri, 21 Sep 2012 14:42:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ucsc.edu; s=ucsc-google; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=mElFXrdCMhsC53iL6AzgD2znxJA11SAJ4//Swf7QPv4=; b=ZVzuUaQZG6ipkdjuTd+TWHRIpyLK0NZ0b1HwVjM39+rWcg3wBy4h22aONcIqhejL8c jqsPUp13r175s1SRu1oPKta5Tg7RXtEvIObBgEq15X8squcboJt7IdgP6lrDwKnxiO90 00bB09h2b2k6piXOsyFLd8Rtkladj6tP/HzN8= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:x-gm-message-state; bh=mElFXrdCMhsC53iL6AzgD2znxJA11SAJ4//Swf7QPv4=; b=lu2/LxWPUhhUCXK1TErgkKVXjJb2On/P/wcEatIbQPb0exSc1EMuWRpleDmZXNeco7 IrytNwcJvfB4zcNBsNOUewI5XvcgKz3WJh3f3S0ZBjjV0QlynRXJ4Hz0BYAwfwJNCsOa wvaJRHcVofB3QCtt9tLAW/5L4IGIVeEngDjfJUZUgQVurgDsvcNzKb8M6GD5CH21Jh9n qq43/RJz/pkNUMCQtLmlral6qyZC+qQtNbHuhSHi+RjjsUaLzRWBA9S7IfLixClqGKVR JkPEZE4mmixBoQ6oOlKUQo1SWqcvf/vWAxc7Y9WRhtfZcjde4txridD7sxAxWJNs5kI0 U6Xw== MIME-Version: 1.0 Received: by 10.68.222.226 with SMTP id qp2mr18608947pbc.57.1348263751009; Fri, 21 Sep 2012 14:42:31 -0700 (PDT) Received: by 10.68.25.69 with HTTP; Fri, 21 Sep 2012 14:42:30 -0700 (PDT) In-Reply-To: <20120922004036.88462@relay.ibs.dn.ua> References: <20120922004036.88462@relay.ibs.dn.ua> Date: Fri, 21 Sep 2012 14:42:30 -0700 Message-ID: From: Tim Gustafson To: Zeus Panchenko 
Content-Type: text/plain; charset=UTF-8 X-Gm-Message-State: ALoCoQn3IgV12QQF97sAtLBP3Hp1ElK6VkhZjNhor6+ZFABiXK0b+omuetLfB71ICr24S4BpX0+f Cc: freebsd-fs@freebsd.org Subject: Re: Exporting ZFS File System to Multiple Subnets X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Sep 2012 21:42:31 -0000 > the only way to do that I know is described here: > http://freebsd.1045724.n5.nabble.com/zfs-sharenfs-to-multiple-subnets-found-a-dirty-looking-hack-td4030378.html I've seen that, but it certainly feels "dirty". -- Tim Gustafson tjg@soe.ucsc.edu 831-459-5354 Baskin Engineering, Room 313A From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 22:38:43 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 433FF106564A for ; Fri, 21 Sep 2012 22:38:43 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id D22368FC08 for ; Fri, 21 Sep 2012 22:38:42 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap8EAJXrXFCDaFvO/2dsb2JhbAA+BxaFdbkegiABAQUjBFIbDgoCAg0ZAlkGhiSBdAumMJJ4gSGJeyGEc4ESA5VkgRWPDYMDgT4JGRs X-IronPort-AV: E=Sophos;i="4.80,465,1344225600"; d="scan'208";a="180112553" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 21 Sep 2012 18:38:41 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 91D7679463; Fri, 21 Sep 2012 18:38:41 -0400 (EDT) Date: Fri, 21 Sep 2012 18:38:41 -0400 (EDT) From: Rick Macklem To: Konstantin Belousov Message-ID: <1697573610.1030942.1348267121541.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: 
<20120921080516.GC37286@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: FS List Subject: Re: testing/review of atomic export update patch X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Sep 2012 22:38:43 -0000 Konstantin Belousov wrote: > On Thu, Sep 20, 2012 at 05:55:26PM -0400, Rick Macklem wrote: > > Konstantin Belousov wrote: > > > On Tue, Sep 18, 2012 at 09:34:54AM -0400, Rick Macklem wrote: > > > > Konstantin Belousov wrote: > > > > > On Mon, Sep 17, 2012 at 05:32:44PM -0400, Rick Macklem wrote: > > > > > > Konstantin Belousov wrote: > > > > > > > On Sun, Sep 16, 2012 at 05:41:25PM -0400, Rick Macklem > > > > > > > wrote: > > > > > > > > Hi, > > > > > > > > > > > > > > > > There is a simple patch at: > > > > > > > > http://people.freebsd.org/~rmacklem/atomic-export.patch > > > > > > > > that can be applied to a kernel + mountd, so that the > > > > > > > > new > > > > > > > > nfsd can be suspended by mountd while the exports are > > > > > > > > being > > > > > > > > reloaded. It adds a new "-S" flag to mountd to enable > > > > > > > > this. > > > > > > > > (This avoids the long standing bug where clients receive > > > > > > > > ESTALE > > > > > > > > replies to RPCs while mountd is reloading exports.) > > > > > > > > > > > > > > This looks simple, but also somewhat worrisome. What would > > > > > > > happen > > > > > > > if the mountd crashes after nfsd suspension is requested, > > > > > > > but > > > > > > > before > > > > > > > resume was performed ? > > > > > > > > > > > > > > Might be, mountd should check for suspended nfsd on start > > > > > > > and > > > > > > > unsuspend > > > > > > > it, if some flag is specified ? 
> > > > > > Well, I think that happens with the patch as it stands. > > > > > > > > > > > > suspend is done if the "-S" option is specified, but that is > > > > > > a > > > > > > no op > > > > > > if it is already suspended. The resume is done no matter > > > > > > what > > > > > > flags > > > > > > are provided, so mountd will always try and do a "resume". > > > > > > --> get_exportlist() is always called when mountd is started > > > > > > up > > > > > > and > > > > > > it does the resume unconditionally when it completes. > > > > > > If mountd repeatedly crashes before completing > > > > > > get_exportlist() > > > > > > when it is started up, the exports will be all messed > > > > > > up, so > > > > > > having the nfsd threads suspended doesn't seem so bad > > > > > > for > > > > > > this > > > > > > case (which hopefully never happens;-). > > > > > > > > > > > > Both suspend and resume are just no ops for unpatched > > > > > > kernels. > > > > > > > > > > > > Maybe the comment in front of "resume" should explicitly > > > > > > explain > > > > > > this, instead of saying resume is harmless to do under all > > > > > > conditions? > > > > > > > > > > > > Thanks for looking at it, rick > > > > > I see. > > > > > > > > > > My another note is that there is no any protection against > > > > > parallel > > > > > instances of suspend/resume happen. For instance, one thread > > > > > could > > > > > set > > > > > suspend_nfsd = 1 and be descheduled, while another executes > > > > > resume > > > > > code sequence meantime. Then it would see suspend_nfsd != 0, > > > > > while > > > > > nfsv4rootfs_lock not held, and tries to unlock it. It seems > > > > > that > > > > > nfsv4_unlock would silently exit. The suspending thread > > > > > resumes, > > > > > and obtains the lock. You end up with suspend_nfsd == 0 but > > > > > lock > > > > > held. > > > > Yes. I had assumed that mountd would be the only thing using > > > > these > > > > syscalls > > > > and it is single threaded. 
(The syscalls can only be done by > > > > root > > > > for the > > > > obvious reasons.;-) > > > > > > > > Maybe the following untested version of the syscalls would be > > > > better, since > > > > they would allow multiple concurrent calls to either suspend or > > > > resume. > > > > (There would still be an indeterminate case if one thread called > > > > resume > > > > concurrently with another few calling suspend, but that is > > > > unavoidable, > > > > I think?) > > > > > > > > Again, thanks for the comments, rick > > > > --- untested version of syscalls --- > > > > } else if ((uap->flag & NFSSVC_SUSPENDNFSD) != 0) { > > > > NFSLOCKV4ROOTMUTEX(); > > > > if (suspend_nfsd == 0) { > > > > /* Lock out all nfsd threads */ > > > > igotlock = 0; > > > > while (igotlock == 0 && suspend_nfsd == 0) { > > > > igotlock = nfsv4_lock(&nfsv4rootfs_lock, 1, > > > > NULL, NFSV4ROOTLOCKMUTEXPTR, NULL); > > > > } > > > > suspend_nfsd = 1; > > > > } > > > > NFSUNLOCKV4ROOTMUTEX(); > > > > error = 0; > > > > } else if ((uap->flag & NFSSVC_RESUMENFSD) != 0) { > > > > NFSLOCKV4ROOTMUTEX(); > > > > if (suspend_nfsd != 0) { > > > > nfsv4_unlock(&nfsv4rootfs_lock, 0); > > > > suspend_nfsd = 0; > > > > } > > > > NFSUNLOCKV4ROOTMUTEX(); > > > > error = 0; > > > > } > > > > > > From the cursory look, this variant is an improvement, mostly by > > > taking > > > the interlock before testing suspend_nfsd, and using the while > > > loop. > > > > > > Is it possible to also make the sleep for the lock interruptible ? > > > So that blocked mountd could be killed by a signal ? > > Well, it would require some coding. An extra argument to > > nfsv4_lock() > > to indicate to do so and then either the caller would have to check > > for a pending termination signal when it returns 0 (indicates didn't > > get > > lock) or a new return value to indicate EINTR. The latter would > > require > > all the calls to it to be changed to recognize the new 3rd return > > case. 
> > Because there are a lot of these calls, I'd tend towards just having > > the > > caller check for a pending signal. > > > > Not sure if it would make much difference though. The only time it > > would get stuck in nfsv4_lock() is if the nfsd threads are all > > wedged > > and in that case having mountd wedged too probably doesn't make much > > difference, since the NFS service is toast in that case anyhow. > > > > If you think it is worth doing, I can add that. I basically see this > > as a "stop-gap" fix until such time as something like nfse is done, > > but since I haven't the time to look at nfse right now, I have no > > idea when/if that might happen. > > Ok, please go ahead with the patch. Having the patch even in its > current > form is obviously better than not to have it. If the wedged mountd > appears to be annoying enough for me, I would do the change. > > Thanks.

Oops, I spoke too soon. When I took a look at the code, I realized that having nfsv4_lock() return when a pending signal interrupts the msleep() isn't easy. As such, I think I'll leave it out of the patch for now.

For those who find these things interesting, the reason the above is hard is the funny intentional semantics that nfsv4_lock() implements. Most of the time, the nfsd threads can concurrently handle the NFSv4 state structures, using a mutex to serialize access to the lists and never sleeping while doing so. However, there are a few cases (mainly delegation recall) where sleeping and knowing that no other thread is modifying the lists is necessary. As such, nfsv4_lock() will be called by potentially many nfsd threads (up to 200+) wanting this exclusive sleep lock. However, it is coded so that only the first one that wakes up after the shared locks (I call it a ref count in the code) have been released, gets the exclusive lock. The rest of the threads wake up after the first thread releases the exclusive lock, but simply return without getting the lock.
This avoids up to 200+ threads each getting the exclusive lock in turn, going "oh, I don't need it since that other thread already got the work done", and then releasing the exclusive lock to get a shared one. (It also implements the exclusive lock as having priority over the shared lock request, so that nfsv4_lock() won't wait indefinitely for the exclusive lock.)

The above is done by having the first thread that wakes up once the shared locks are released clear the "want an exclusive lock" flag as it acquires it. If a call were to return due to a signal, it wouldn't know if it should clear the "want an exclusive lock" flag or not, since it wouldn't know if other threads currently want it or not. This could probably be fixed by adding a count of how many threads are currently sleeping, waiting on the "want an exclusive lock" flag, but that's too scary for me to do, unless it really is needed. (As you might have guessed, it's pretty easy to break this in subtle ways and I'm a chicken;-)

rick

From owner-freebsd-fs@FreeBSD.ORG Fri Sep 21 22:38:51 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 89E7B10656EA for ; Fri, 21 Sep 2012 22:38:51 +0000 (UTC) (envelope-from tjg@soe.ucsc.edu) Received: from mail-pb0-f54.google.com (mail-pb0-f54.google.com [209.85.160.54]) by mx1.freebsd.org (Postfix) with ESMTP id 55BD18FC0C for ; Fri, 21 Sep 2012 22:38:50 +0000 (UTC) Received: by pbbrp2 with SMTP id rp2so9362830pbb.13 for ; Fri, 21 Sep 2012 15:38:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ucsc.edu; s=ucsc-google; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=c1EtDBKD0Udx6sF0Wq+8Z+23Wv+DedvHZOtenLL/Gq0=; b=W3jlHWK/R4Vs/SBtjwRIC9Wo9booiD5bIITudW+r1T6/v0ng8KZYQKaf1Q25n/sK/W H/hzRyjMp1MtnX7gTTA5nZf0yHN1eJj+OlG8ZIiWvfTdfw3KgqI47Hs9gIvwowWjavXn rKaaaEdQQ6LlBMqSrfieVFOuWeub9HE+YVGwQ=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type:x-gm-message-state; bh=c1EtDBKD0Udx6sF0Wq+8Z+23Wv+DedvHZOtenLL/Gq0=; b=WmBPpXqG6RlxARqwnWK0zaeV1dY8jphRCsafxcdsejWH3xE+SVQlyIN/9UOC9bkIwS +J3/0MCrwGZipY430SUp9Od4rNBWdTRSEFtLo94iPR6ig4MBHYmQQh3mp9gFIIjXsMiJ b2zPzaOoME+8Xyt0EqOmD/F+X+ItvrIrl/9XsZ+Dad94A8D7ezgOJAF5e824z3WzJKzB 7HgwIPIg6pqr9gUG3wN0P0XsJLApoTQVIl+dxf9eJ56s3OHUBS6ley+U1UnDy2YlH18U +pdOgxmF74mXWoAAWbojPjERUYLxIHpqznB11un0hGeJzWpccV0AjGb3K1sSlPov2zs0 m4Ng== MIME-Version: 1.0 Received: by 10.66.85.4 with SMTP id d4mr16460771paz.11.1348267130543; Fri, 21 Sep 2012 15:38:50 -0700 (PDT) Received: by 10.68.25.69 with HTTP; Fri, 21 Sep 2012 15:38:50 -0700 (PDT) In-Reply-To: References: <20120922004036.88462@relay.ibs.dn.ua> Date: Fri, 21 Sep 2012 15:38:50 -0700 Message-ID: From: Tim Gustafson To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=UTF-8 X-Gm-Message-State: ALoCoQkpIKFaOmi9tMDWGf/GdCZqE32XYTIPwxcBflcrfrA1LQIqtsTnFEKYf5V7wk/YA3WLQV+/ Subject: Re: Exporting ZFS File System to Multiple Subnets X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Sep 2012 22:38:51 -0000 > the only way to do that I know is described here: > http://freebsd.1045724.n5.nabble.com/zfs-sharenfs-to-multiple-subnets-found-a-dirty-looking-hack-td4030378.html > > I've seen that, but it certainly feels "dirty".

Wouldn't it just be better to disable the "sharenfs" property altogether, or make it a non-operational property, and allow regular /etc/exports rules to work? Or perhaps have a pseudo-value for the "sharenfs" property that would enable normal /etc/exports processing? Something like:

zfs set sharenfs=exports tank

I'd rather edit /etc/exports by hand anyhow.
-- Tim Gustafson tjg@soe.ucsc.edu 831-459-5354 Baskin Engineering, Room 313A From owner-freebsd-fs@FreeBSD.ORG Sat Sep 22 12:54:09 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 026161065672; Sat, 22 Sep 2012 12:54:09 +0000 (UTC) (envelope-from flo@smeets.im) Received: from mail.solomo.de (mail.solomo.de [85.214.62.193]) by mx1.freebsd.org (Postfix) with ESMTP id 827CC8FC1B; Sat, 22 Sep 2012 12:54:08 +0000 (UTC) Received: from mail.solomo.de (localhost [127.0.0.1]) by mail.solomo.de (Postfix) with ESMTP id 3B148C382A; Sat, 22 Sep 2012 14:54:01 +0200 (CEST) X-Virus-Scanned: amavisd-new at solomo.de Received: from mail.solomo.de ([127.0.0.1]) by mail.solomo.de (mail.solomo.de [127.0.0.1]) (amavisd-new, port 10024) with LMTP id 8fa5t4JIKIkT; Sat, 22 Sep 2012 14:54:00 +0200 (CEST) Received: from nibbler-osx-wlan.fritz.box (unknown [IPv6:2001:4dd0:ff00:8bb6:54b8:e77b:246:f66e]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mail.solomo.de (Postfix) with ESMTPSA id 695C8C3833; Sat, 22 Sep 2012 14:54:00 +0200 (CEST) Message-ID: <505DB4E6.8030407@smeets.im> Date: Sat, 22 Sep 2012 14:53:58 +0200 From: Florian Smeets User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:16.0) Gecko/20120905 Thunderbird/16.0 MIME-Version: 1.0 To: FreeBSD FS X-Enigmail-Version: 1.5a1pre Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enigDB4E30752B92537383E0FAEF" Subject: panic: _sx_xlock_hard: recursed on non-recursive sx zfsvfs->z_hold_mtx[i] @ ...cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1407 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Sep 2012 12:54:09 -0000 This is an OpenPGP/MIME signed 
message (RFC 2440 and 3156) --------------enigDB4E30752B92537383E0FAEF Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable

Hi,

I hit the above mentioned panic quite frequently on recent versions of head (r240806). This happens when building packages in the ports tinderbox which uses nullfs and zfs extensively. Kib had a look at it and suspects that his recent nullfs changes expose a bug in zfs.

The backtrace is as follows:

#0 doadump (textdump=1) at /usr/home/flo/dev/checkouts/svn-src/sys/kern/kern_shutdown.c:266
#1 0xffffffff804c6a64 in kern_reboot (howto=260) at /usr/home/flo/dev/checkouts/svn-src/sys/kern/kern_shutdown.c:449
#2 0xffffffff804c648a in panic (fmt=0x0) at /usr/home/flo/dev/checkouts/svn-src/sys/kern/kern_shutdown.c:637
#3 0xffffffff804ce6e5 in _sx_xlock_hard (sx=Variable "sx" is not available. ) at /usr/home/flo/dev/checkouts/svn-src/sys/kern/kern_sx.c:523
#4 0xffffffff804ce77e in _sx_xlock (sx=Variable "sx" is not available. ) at sx.h:152
#5 0xffffffff80e17533 in zfs_zinactive (zp=0xfffffe011951ec80) at /usr/home/flo/dev/checkouts/svn-src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1407
#6 0xffffffff80e45366 in zfs_inactive (vp=0xfffffe019bdfad90, cr=Variable "cr" is not available. ) at /usr/home/flo/dev/checkouts/svn-src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4590
#7 0xffffffff80e4552a in zfs_freebsd_inactive (ap=Variable "ap" is not available.
) at /usr/home/flo/dev/checkouts/svn-src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:6102
#8 0xffffffff8070aae7 in VOP_INACTIVE_APV (vop=0xffffffff80eb5fe0, a=0xffffff89092d3d20) at vnode_if.c:1863
#9 0xffffffff8055e3b7 in vinactive (vp=0xfffffe019bdfad90, td=0xfffffe0017bad900) at vnode_if.h:807
#10 0xffffffff80562526 in vputx (vp=0xfffffe019bdfad90, func=2) at /usr/home/flo/dev/checkouts/svn-src/sys/kern/vfs_subr.c:2290
#11 0xffffffff80d8a5f0 in null_reclaim (ap=Variable "ap" is not available. ) at /usr/home/flo/dev/checkouts/svn-src/sys/modules/nullfs/../../fs/nullfs/null_vnops.c:706
#12 0xffffffff8070a9d7 in VOP_RECLAIM_APV (vop=0xffffffff80d8b180, a=0xffffff89092d3e60) at vnode_if.c:1926
#13 0xffffffff8055f64d in vgonel (vp=0xfffffe019bdb73e0) at vnode_if.h:830
#14 0xffffffff80561815 in vnlru_free (count=1) at /usr/home/flo/dev/checkouts/svn-src/sys/kern/vfs_subr.c:931
#15 0xffffffff80561b1f in getnewvnode (tag=0xffffffff80eae0f3 "zfs", mp=0xfffffe0010dc3cc0, vops=0xffffffff80eb5fe0, vpp=0xffffff89092d3f88) at /usr/home/flo/dev/checkouts/svn-src/sys/kern/vfs_subr.c:953
#16 0xffffffff80e168b5 in zfs_znode_cache_constructor (buf=0xfffffe019b437af0, arg=Variable "arg" is not available.
) at /usr/home/flo/dev/checkouts/svn-src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:135
#17 0xffffffff80e189cc in zfs_znode_alloc (zfsvfs=0xfffffe0010de4000, db=0xfffffe048c138000, blksz=0, obj_type=DMU_OT_SA, hdl=0xfffffe019b441cd0) at /usr/home/flo/dev/checkouts/svn-src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:663
#18 0xffffffff80e19b65 in zfs_mknode (dzp=0xfffffe00b84dd7d0, vap=0xffffff89092d4740, tx=0xfffffe0303916600, cr=0xfffffe000c668e00, flag=0, zpp=0xffffff89092d46a0, acl_ids=0xffffff89092d4670) at /usr/home/flo/dev/checkouts/svn-src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1012
#19 0xffffffff80e46d6f in zfs_freebsd_create (ap=Variable "ap" is not available. ) at /usr/home/flo/dev/checkouts/svn-src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1657
#20 0xffffffff8070cef1 in VOP_CREATE_APV (vop=0xffffffff80eb5fe0, a=0xffffff89092d47f0) at vnode_if.c:250
#21 0xffffffff8056f569 in vn_open_cred (ndp=0xffffff89092d4880, flagp=0xffffff89092d487c, cmode=Variable "cmode" is not available. ) at vnode_if.h:109
#22 0xffffffff80569236 in kern_openat (td=0xfffffe0017bad900, fd=-100, path=0x801c2b300
, pathseg=Variable "pathseg" is not available. ) at /usr/home/flo/dev/checkouts/svn-src/sys/kern/vfs_syscalls.c:1134
#23 0xffffffff806b8329 in amd64_syscall (td=0xfffffe0017bad900, traced=0) at subr_syscall.c:135
#24 0xffffffff806a2eb7 in Xfast_syscall () at /usr/home/flo/dev/checkouts/svn-src/sys/amd64/amd64/exception.S:387
#25 0x00000008017702ec in ?? ()
Previous frame inner to this frame (corrupt stack?)

I have the vmcore and kernel symbols, so if someone wants to know more I should be able to provide further data.

Florian

--------------enigDB4E30752B92537383E0FAEF Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- iEYEARECAAYFAlBdtOcACgkQapo8P8lCvwmDrQCg4X40ttRVkbrjx/cbKmNv+oHY sGQAoK8mpzOUgJYVlTaCZLLGneRlMfBe =ZGUd -----END PGP SIGNATURE----- --------------enigDB4E30752B92537383E0FAEF--
From owner-freebsd-fs@FreeBSD.ORG Sat Sep 22 13:33:59 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id EEE951065670; Sat, 22 Sep 2012 13:33:58 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id E1A668FC08; Sat, 22 Sep 2012 13:33:57 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id QAA06629; Sat, 22 Sep 2012 16:33:55 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1TFPqE-000NLt-W9; Sat, 22 Sep 2012 16:33:55 +0300 Message-ID: <505DBE41.20303@FreeBSD.org> Date: Sat, 22 Sep 2012 16:33:53 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:15.0) Gecko/20120913 Thunderbird/15.0.1 MIME-Version:
1.0 To: FreeBSD FS References: <505DB4E6.8030407@smeets.im> In-Reply-To: <505DB4E6.8030407@smeets.im> X-Enigmail-Version: 1.4.3 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Florian Smeets , Pawel Jakub Dawidek Subject: Re: panic: _sx_xlock_hard: recursed on non-recursive sx zfsvfs->z_hold_mtx[i] @ ...cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1407 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Sep 2012 13:33:59 -0000 on 22/09/2012 15:53 Florian Smeets said the following: > Hi, > > I hit the above mentioned panic quite frequently on recent versions of head > (r240806). This happens when building packages in the ports tinderbox which > uses nullfs and zfs extensively. Kib had a look at it and suspects that his > recent nullfs changes expose a bug in zfs. > > The backtrace is as follows: Since getnewvnode() can call vnlru_free() the call flow can recurse back into fs code. So it's dangerous in general to hold any fs locks around getnewvnode call, as kib advises. In this case it was a nullfs vnode that caused recursion into zfs, but it could have been a zfs vnode. The only thing required for a panic is a hash collision of zfs object id, so that the same z_hold_mtx is used. But I imagine that it would be quite tough to drop z_hold_mtx in zfs_znode_cache_constructor. > #0 doadump (textdump=1) at > /usr/home/flo/dev/checkouts/svn-src/sys/kern/kern_shutdown.c:266 #1 > 0xffffffff804c6a64 in kern_reboot (howto=260) at > /usr/home/flo/dev/checkouts/svn-src/sys/kern/kern_shutdown.c:449 #2 > 0xffffffff804c648a in panic (fmt=0x0) at > /usr/home/flo/dev/checkouts/svn-src/sys/kern/kern_shutdown.c:637 #3 > 0xffffffff804ce6e5 in _sx_xlock_hard (sx=Variable "sx" is not available. 
) > at /usr/home/flo/dev/checkouts/svn-src/sys/kern/kern_sx.c:523 #4 > 0xffffffff804ce77e in _sx_xlock (sx=Variable "sx" is not available. ) at > sx.h:152 #5 0xffffffff80e17533 in zfs_zinactive (zp=0xfffffe011951ec80) > at > /usr/home/flo/dev/checkouts/svn-src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1407 > > #6 0xffffffff80e45366 in zfs_inactive (vp=0xfffffe019bdfad90, > cr=Variable "cr" is not available. ) at > /usr/home/flo/dev/checkouts/svn-src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4590 > > #7 0xffffffff80e4552a in zfs_freebsd_inactive (ap=Variable "ap" is not > available. ) at > /usr/home/flo/dev/checkouts/svn-src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:6102 > > #8 0xffffffff8070aae7 in VOP_INACTIVE_APV (vop=0xffffffff80eb5fe0, > a=0xffffff89092d3d20) at vnode_if.c:1863 #9 0xffffffff8055e3b7 in > vinactive (vp=0xfffffe019bdfad90, td=0xfffffe0017bad900) at vnode_if.h:807 > #10 0xffffffff80562526 in vputx (vp=0xfffffe019bdfad90, func=2) at > /usr/home/flo/dev/checkouts/svn-src/sys/kern/vfs_subr.c:2290 #11 > 0xffffffff80d8a5f0 in null_reclaim (ap=Variable "ap" is not available. ) > at > /usr/home/flo/dev/checkouts/svn-src/sys/modules/nullfs/../../fs/nullfs/null_vnops.c:706 > > #12 0xffffffff8070a9d7 in VOP_RECLAIM_APV (vop=0xffffffff80d8b180, > a=0xffffff89092d3e60) at vnode_if.c:1926 #13 0xffffffff8055f64d in vgonel > (vp=0xfffffe019bdb73e0) at vnode_if.h:830 #14 0xffffffff80561815 in > vnlru_free (count=1) at > /usr/home/flo/dev/checkouts/svn-src/sys/kern/vfs_subr.c:931 #15 > 0xffffffff80561b1f in getnewvnode (tag=0xffffffff80eae0f3 "zfs", > mp=0xfffffe0010dc3cc0, vops=0xffffffff80eb5fe0, vpp=0xffffff89092d3f88) at > /usr/home/flo/dev/checkouts/svn-src/sys/kern/vfs_subr.c:953 #16 > 0xffffffff80e168b5 in zfs_znode_cache_constructor (buf=0xfffffe019b437af0, > arg=Variable "arg" is not available. 
) at > /usr/home/flo/dev/checkouts/svn-src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:135 > > #17 0xffffffff80e189cc in zfs_znode_alloc (zfsvfs=0xfffffe0010de4000, > db=0xfffffe048c138000, blksz=0, obj_type=DMU_OT_SA, > hdl=0xfffffe019b441cd0) at > /usr/home/flo/dev/checkouts/svn-src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:663 > > #18 0xffffffff80e19b65 in zfs_mknode (dzp=0xfffffe00b84dd7d0, > vap=0xffffff89092d4740, tx=0xfffffe0303916600, cr=0xfffffe000c668e00, > flag=0, zpp=0xffffff89092d46a0, acl_ids=0xffffff89092d4670) at > /usr/home/flo/dev/checkouts/svn-src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1012 > > #19 0xffffffff80e46d6f in zfs_freebsd_create (ap=Variable "ap" is not > available. ) at > /usr/home/flo/dev/checkouts/svn-src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1657 > > #20 0xffffffff8070cef1 in VOP_CREATE_APV (vop=0xffffffff80eb5fe0, > a=0xffffff89092d47f0) at vnode_if.c:250 #21 0xffffffff8056f569 in > vn_open_cred (ndp=0xffffff89092d4880, flagp=0xffffff89092d487c, > cmode=Variable "cmode" is not available. ) at vnode_if.h:109 #22 > 0xffffffff80569236 in kern_openat (td=0xfffffe0017bad900, fd=-100, > path=0x801c2b300
, pathseg=Variable > "pathseg" is not available. ) at > /usr/home/flo/dev/checkouts/svn-src/sys/kern/vfs_syscalls.c:1134 #23 > 0xffffffff806b8329 in amd64_syscall (td=0xfffffe0017bad900, traced=0) at > subr_syscall.c:135 #24 0xffffffff806a2eb7 in Xfast_syscall () at > /usr/home/flo/dev/checkouts/svn-src/sys/amd64/amd64/exception.S:387 #25 > 0x00000008017702ec in ?? () Previous frame inner to this frame (corrupt > stack?) > > I have the vmcore and kernel symbols, so if someone wants to know more I > should be able to provide further data. > > Florian > -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Sat Sep 22 16:20:58 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 9E25C1065678; Sat, 22 Sep 2012 16:20:58 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 35A438FC22; Sat, 22 Sep 2012 16:20:56 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id TAA07394; Sat, 22 Sep 2012 19:20:55 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1TFSRr-000NSw-H9; Sat, 22 Sep 2012 19:20:55 +0300 Message-ID: <505DE566.2080307@FreeBSD.org> Date: Sat, 22 Sep 2012 19:20:54 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:15.0) Gecko/20120913 Thunderbird/15.0.1 MIME-Version: 1.0 To: freebsd-fs@FreeBSD.org X-Enigmail-Version: 1.4.3 Content-Type: text/plain; charset=X-VIET-VPS Content-Transfer-Encoding: 7bit Cc: Subject: lszfs command for loader X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Sep 
2012 16:20:58 -0000

Please find a patch that implements an lszfs loader command. The command lists the child filesystems of a specified filesystem (including the root dataset). The command is really simplistic: the list goes directly to the console, so there is no filtering of hidden filesystem names, etc. The command is intended to facilitate recovery on systems that use the "Boot Environments" approach for the boot/root filesystem.

http://people.freebsd.org/~avg/lszfs.diff

diff --git a/sys/boot/i386/loader/main.c b/sys/boot/i386/loader/main.c index 80c8178..84ae713 100644 --- a/sys/boot/i386/loader/main.c +++ b/sys/boot/i386/loader/main.c @@ -330,6 +330,29 @@ command_heap(int argc, char *argv[]) return(CMD_OK); } +#ifdef LOADER_ZFS_SUPPORT +COMMAND_SET(lszfs, "lszfs", "list child datasets of a zfs dataset", + command_lszfs); + +static int +command_lszfs(int argc, char *argv[]) +{ + int err; + + if (argc != 2) { + command_errmsg = "wrong number of arguments"; + return (CMD_ERROR); + } + + err = zfs_list(argv[1]); + if (err != 0) { + command_errmsg = strerror(err); + return (CMD_ERROR); + } + return (CMD_OK); +} +#endif + /* ISA bus access functions for PnP.
*/ static int isa_inb(int port) diff --git a/sys/boot/zfs/libzfs.h b/sys/boot/zfs/libzfs.h index 7ad3a72..6834f8b 100644 --- a/sys/boot/zfs/libzfs.h +++ b/sys/boot/zfs/libzfs.h @@ -61,6 +61,7 @@ int zfs_parsedev(struct zfs_devdesc *dev, const char *devspec, const char **path); char *zfs_fmtdev(void *vdev); int zfs_probe_dev(const char *devname, uint64_t *pool_guid); +int zfs_list(const char *name); extern struct devsw zfs_dev; extern struct fs_ops zfs_fsops; diff --git a/sys/boot/zfs/zfs.c b/sys/boot/zfs/zfs.c index eb8833f..3fc5f50 100644 --- a/sys/boot/zfs/zfs.c +++ b/sys/boot/zfs/zfs.c @@ -658,3 +658,38 @@ zfs_fmtdev(void *vdev) rootname); return (buf); } + +int +zfs_list(const char *name) +{ + static char poolname[ZFS_MAXNAMELEN]; + uint64_t objid; + spa_t *spa; + const char *dsname; + int len; + int rv; + + len = strlen(name); + dsname = strchr(name, '/'); + if (dsname != NULL) { + len = dsname - name; + dsname++; + } + memcpy(poolname, name, len); + poolname[len] = '\0'; + + spa = spa_find_by_name(poolname); + if (!spa) + return (ENXIO); + rv = zfs_spa_init(spa); + if (rv != 0) + return (rv); + if (dsname != NULL) + rv = zfs_lookup_dataset(spa, dsname, &objid); + else + rv = zfs_get_root(spa, &objid); + if (rv != 0) + return (rv); + rv = zfs_list_dataset(spa, objid); + return (0); +} diff --git a/sys/boot/zfs/zfsimpl.c b/sys/boot/zfs/zfsimpl.c index 219d7af..18f5d9a 100644 --- a/sys/boot/zfs/zfsimpl.c +++ b/sys/boot/zfs/zfsimpl.c @@ -1415,8 +1415,6 @@ zap_lookup(const spa_t *spa, const dnode_phys_t *dnode, const char *name, uint64 return (EIO); } -#ifdef BOOT2 - /* * List a microzap directory. Assumes that the zap scratch buffer contains * the directory contents. 
@@ -1541,8 +1539,6 @@ zap_list(const spa_t *spa, const dnode_phys_t *dnode) return fzap_list(spa, dnode); } -#endif - static int objset_get_dnode(const spa_t *spa, const objset_phys_t *os, uint64_t objnum, dnode_phys_t *dnode) { @@ -1779,6 +1775,38 @@ zfs_lookup_dataset(const spa_t *spa, const char *name, uint64_t *objnum) return (0); } +#ifndef BOOT2 +static int +zfs_list_dataset(const spa_t *spa, uint64_t objnum/*, int pos, char *entry*/) +{ + uint64_t dir_obj, child_dir_zapobj; + dnode_phys_t child_dir_zap, dir, dataset; + dsl_dataset_phys_t *ds; + dsl_dir_phys_t *dd; + + if (objset_get_dnode(spa, &spa->spa_mos, objnum, &dataset)) { + printf("ZFS: can't find dataset %ju\n", (uintmax_t)objnum); + return (EIO); + } + ds = (dsl_dataset_phys_t *) &dataset.dn_bonus; + dir_obj = ds->ds_dir_obj; + + if (objset_get_dnode(spa, &spa->spa_mos, dir_obj, &dir)) { + printf("ZFS: can't find dirobj %ju\n", (uintmax_t)dir_obj); + return (EIO); + } + dd = (dsl_dir_phys_t *)&dir.dn_bonus; + + child_dir_zapobj = dd->dd_child_dir_zapobj; + if (objset_get_dnode(spa, &spa->spa_mos, child_dir_zapobj, &child_dir_zap) != 0) { + printf("ZFS: can't find child zap %ju\n", (uintmax_t)dir_obj); + return (EIO); + } + + return (zap_list(spa, &child_dir_zap) != 0); +} +#endif + /* * Find the object set given the object number of its dataset object * and return its details in *objset -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Sat Sep 22 16:28:09 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 5194A106564A; Sat, 22 Sep 2012 16:28:09 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 1B2998FC08; Sat, 22 Sep 2012 16:28:07 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id 
TAA07410; Sat, 22 Sep 2012 19:28:06 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1TFSYo-000NTB-7M; Sat, 22 Sep 2012 19:28:06 +0300 Message-ID: <505DE715.8020806@FreeBSD.org> Date: Sat, 22 Sep 2012 19:28:05 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:15.0) Gecko/20120913 Thunderbird/15.0.1 MIME-Version: 1.0 To: freebsd-fs@FreeBSD.org X-Enigmail-Version: 1.4.3 Content-Type: text/plain; charset=X-VIET-VPS Content-Transfer-Encoding: 7bit Cc: Subject: zfs: allow to mount root from a pool not in zpool.cache X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Sep 2012 16:28:09 -0000

Currently the FreeBSD ZFS kernel code cannot mount a root filesystem from a pool that is not listed in zpool.cache, because only pools from the cache are known to ZFS at that time. This patch is an attempt to improve the behavior:

http://people.freebsd.org/~avg/spa_import_rootpool.diff

This could be useful when importing pools that were exported from other systems.

There is a tunable, vfs.zfs.rootpool.prefer_cached_config, which is set to 1 by default. A value of 1 means to use the cached pool config if one is found in the cache; 0 means to re-probe the disks in any case and read the presumably latest/actual config from them.
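
As an illustration only (this is not code from the patch), the tunable's semantics as described above can be sketched in C. The enum and the function name are hypothetical, and the behavior for "1, but the pool is not in the cache" is an assumption based on the stated goal of mounting root from pools that are missing from zpool.cache:

```c
#include <stdbool.h>

/* Hypothetical names; they only model the decision described above. */
enum config_source {
	CONFIG_FROM_CACHE,	/* use the config stored in zpool.cache */
	CONFIG_FROM_DISKS	/* re-probe the disks and read their config */
};

/*
 * prefer_cached_config mirrors vfs.zfs.rootpool.prefer_cached_config.
 * Assumption: when the pool is not in the cache, probing the disks is
 * the only remaining option regardless of the tunable.
 */
static enum config_source
root_config_source(int prefer_cached_config, bool found_in_cache)
{
	if (prefer_cached_config != 0 && found_in_cache)
		return (CONFIG_FROM_CACHE);
	return (CONFIG_FROM_DISKS);
}
```

With the default of 1, a pool that is present in the cache keeps today's behavior; only a missing cache entry or an explicit 0 triggers the re-probe.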
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Sat Sep 22 16:49:38 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 3C056106564A; Sat, 22 Sep 2012 16:49:38 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 587368FC08; Sat, 22 Sep 2012 16:49:36 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id TAA07478; Sat, 22 Sep 2012 19:49:35 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1TFStb-000NTq-6o; Sat, 22 Sep 2012 19:49:35 +0300 Message-ID: <505DEC1C.4000305@FreeBSD.org> Date: Sat, 22 Sep 2012 19:49:32 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:15.0) Gecko/20120913 Thunderbird/15.0.1 MIME-Version: 1.0 To: freebsd-fs@FreeBSD.org X-Enigmail-Version: 1.4.3 Content-Type: text/plain; charset=X-VIET-VPS Content-Transfer-Encoding: 7bit Cc: Subject: znextboot: nextboot-like tool for zfs at zfsboot level X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Sep 2012 16:49:38 -0000

Please find here a patchset that implements znextboot, a nextboot-like tool for zfs at zfsboot level:

http://people.freebsd.org/~avg/znextboot.diff

Theory of operation. zfsboot, through the loader, exports to the kernel environment the GUIDs of the very first pool it found ("primary pool") and of the very first leaf vdev of that pool ("primary vdev").
Note that the primary pool is not necessarily a boot pool or a root pool, since a user can switch between pools and filesystems at various stages: zfsboot, zfsloader, rootfs specification.

znextboot is a new tool that simply passes zfsboot/boot2 options to kernel ZFS via an ioctl. Kernel ZFS writes the options as a NUL-terminated ASCII string to the Pad2 area of the primary vdev of the primary pool. The Pad2 area was previously known as the "Boot Block Header". Its use was never formalized. Previously it contained a special header (with zero useful information); now ZFS just zeroes it out. So, upon reboot, zfsboot reads the options from that area and then zeroes the area.

The tool is intended for remote management of systems that use approaches similar to "Boot Environments". It is implemented at the zfsboot level, as opposed to the loader level, because that was easier. My skills weren't sufficient to integrate the ZFS logic with the loader's nextboot logic, which is implemented in Forth.

Some problematic areas in the current patchset:
- I used just the next number for the nextboot ioctl. This will result in a conflict when a new ioctl is added upstream. We need to think about reserving a range for OS-specific ioctls.
- The znextboot userland utility currently lacks any documentation.
- znextboot lacks any sanity checking / validation of the arguments that are passed to it.
- probably more...
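
The Pad2 handshake described above can be sketched in C. This is only an illustration under stated assumptions, not code from the patchset: PAD2_SIZE, pad2_store() and pad2_consume() are hypothetical names, the 8 KB size is an assumption, and the pad is modeled as an in-memory buffer rather than an on-disk label area.

```c
#include <stddef.h>
#include <string.h>

/* Assumed size of the pad area; the post does not state it. */
#define PAD2_SIZE 8192

/*
 * Kernel side (sketch): store the options as a NUL-terminated ASCII
 * string at the start of the pad. Returns 0 on success, -1 if the
 * string does not fit or contains non-ASCII bytes.
 */
static int
pad2_store(unsigned char *pad, const char *opts)
{
	size_t i, len = strlen(opts);

	if (len >= PAD2_SIZE)
		return (-1);
	for (i = 0; i < len; i++)
		if ((unsigned char)opts[i] > 0x7f)
			return (-1);
	memset(pad, 0, PAD2_SIZE);
	memcpy(pad, opts, len);
	return (0);
}

/*
 * Boot side (sketch): copy the one-shot options out, then zero the pad
 * so that they apply to a single boot only. Returns the option length
 * (0 if the pad is empty), or -1 if no terminator is found or dst is
 * too small; the pad is left untouched in the error cases.
 */
static int
pad2_consume(unsigned char *pad, char *dst, size_t dstsize)
{
	const unsigned char *nul = memchr(pad, '\0', PAD2_SIZE);
	size_t len;

	if (nul == NULL)
		return (-1);
	len = (size_t)(nul - pad);
	if (len >= dstsize)
		return (-1);
	memcpy(dst, pad, len);
	dst[len] = '\0';
	memset(pad, 0, PAD2_SIZE);
	return ((int)len);
}
```

Zeroing the pad right after reading is what makes the options one-shot: a second call to pad2_consume() on the same buffer yields an empty string, much as a second reboot would see no options.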
-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Sat Sep 22 17:03:30 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E3E4F1065673 for ; Sat, 22 Sep 2012 17:03:30 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 205738FC08 for ; Sat, 22 Sep 2012 17:03:29 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id UAA07530 for ; Sat, 22 Sep 2012 20:03:28 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1TFT72-000NUP-7o for freebsd-fs@FreeBSD.ORG; Sat, 22 Sep 2012 20:03:28 +0300 Message-ID: <505DEF5F.8060401@FreeBSD.org> Date: Sat, 22 Sep 2012 20:03:27 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:15.0) Gecko/20120913 Thunderbird/15.0.1 MIME-Version: 1.0 To: freebsd-fs@FreeBSD.org X-Enigmail-Version: 1.4.3 Content-Type: text/plain; charset=X-VIET-VPS Content-Transfer-Encoding: 7bit Cc: Subject: zfsboot and zfsloader: normalization of filesystem names X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Sep 2012 17:03:31 -0000

Currently zfsboot uses the following format to specify a ZFS filesystem name in a full file path:
poolname:filesystem/name:/path/to/file

ZFS loader uses this format:
zfs:poolname/filesystemname:/path/to/file

The following patchset:
http://people.freebsd.org/~avg/zfs-boot-naming.diff
unifies the naming. zfsboot format will be:
poolname/filesystemname:/path/to/file

Note that it is still different from zfsloader - "zfs:" prefix is missing.
This is because, unlike the loader, zfsboot supports only ZFS filesystems, so the prefix is redundant. But I can still add support for it if there is popular demand.

Also, the current code treats a lone pool name as the name of the pool's boot dataset, that is, whatever is specified in the bootfs property. If the property is unset, then the root dataset is the boot dataset. I want to change this so that a lone pool name always means the root dataset. The boot dataset is selected by default anyway, and its name is expanded to the actual name when it is printed.

Also, lsdev -v for a zfs pool will print the bootfs property. The same goes for zfsboot's "status" command.

A final note: all of this really needs to be documented. Currently the documentation on boot blocks seems to omit ZFS boot entirely.

-- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Sat Sep 22 17:13:11 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 27BD4106566B; Sat, 22 Sep 2012 17:13:11 +0000 (UTC) (envelope-from avg@FreeBSD.org) Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id E32BB8FC17; Sat, 22 Sep 2012 17:13:09 +0000 (UTC) Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id UAA07567; Sat, 22 Sep 2012 20:13:08 +0300 (EEST) (envelope-from avg@FreeBSD.org) Received: from localhost ([127.0.0.1]) by porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1TFTGN-000NUn-W4; Sat, 22 Sep 2012 20:13:08 +0300 Message-ID: <505DF1A3.1020809@FreeBSD.org> Date: Sat, 22 Sep 2012 20:13:07 +0300 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:15.0) Gecko/20120913 Thunderbird/15.0.1 MIME-Version: 1.0 To: freebsd-fs@FreeBSD.org X-Enigmail-Version: 1.4.3 Content-Type: text/plain; charset=X-VIET-VPS Content-Transfer-Encoding: 7bit Cc: freebsd-geom@FreeBSD.org Subject: zfs
zvol: set geom mediasize right at creation time X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Sep 2012 17:13:11 -0000

Please review the following patch. In addition to what the description says, I almost by accident sneaked another change into the patch: it sets stripesize to volblocksize. I think that the change makes sense, but it is really a separate change.

A side note: setting sectorsize to volblocksize seemed like overkill, and it would certainly mess up the existing zvols in use. Maybe there should be another property like reportedblocksize or something.

commit 1585e6cfb602c2a2647b9f802445bb174bc430a4 Author: Andriy Gapon Date: Wed Sep 19 20:49:28 2012 +0300 zvol: set mediasize in geom provider right upon its creation ... instead of deferring the action until first open. Unlike upstream this has no benefit on FreeBSD. We know that as soon as the provider is created it is going to be tasted and thus opened. Initial mediasize of zero causes tasting failure and subsequent retasting because of the size change.
diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c index d47d270..6e9e7a3 100644 --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c @@ -475,6 +475,7 @@ zvol_create_minor(const char *name) zvol_state_t *zv; objset_t *os; dmu_object_info_t doi; + uint64_t volblocksize, volsize; int error; ZFS_LOG(1, "Creating ZVOL %s...", name); @@ -535,9 +536,20 @@ zvol_create_minor(const char *name) zv = zs->zss_data = kmem_zalloc(sizeof (zvol_state_t), KM_SLEEP); #else /* !sun */ + error = zap_lookup(os, ZVOL_ZAP_OBJ, "size", 8, 1, &volsize); + if (error) { + ASSERT(error == 0); + dmu_objset_disown(os, zvol_tag); + mutex_exit(&spa_namespace_lock); + return (error); + } + DROP_GIANT(); g_topology_lock(); zv = zvol_geom_create(name); + zv->zv_volsize = volsize; + zv->zv_provider->mediasize = zv->zv_volsize; + #endif /* !sun */ (void) strlcpy(zv->zv_name, name, MAXPATHLEN); @@ -554,6 +566,7 @@ zvol_create_minor(const char *name) error = dmu_object_info(os, ZVOL_OBJ, &doi); ASSERT(error == 0); zv->zv_volblocksize = doi.doi_data_block_size; + zv->zv_provider->stripesize = zv->zv_volblocksize; if (spa_writeable(dmu_objset_spa(os))) { if (zil_replay_disable) -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Sat Sep 22 18:24:27 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3801D106564A for ; Sat, 22 Sep 2012 18:24:27 +0000 (UTC) (envelope-from freebsd-listen@fabiankeil.de) Received: from smtprelay01.ispgateway.de (smtprelay01.ispgateway.de [80.67.31.39]) by mx1.freebsd.org (Postfix) with ESMTP id E18418FC08 for ; Sat, 22 Sep 2012 18:24:26 +0000 (UTC) Received: from [87.79.193.113] (helo=fabiankeil.de) by smtprelay01.ispgateway.de with esmtpsa (SSLv3:AES128-SHA:128) (Exim 4.68) (envelope-from ) id 1TFUNI-00007P-4V for 
freebsd-fs@freebsd.org; Sat, 22 Sep 2012 20:24:20 +0200 Date: Sat, 22 Sep 2012 20:24:14 +0200 From: Fabian Keil To: freebsd-fs@freebsd.org Message-ID: <20120922202414.7ed96a21@fabiankeil.de> In-Reply-To: <20110625134031.3cbc5952@fabiankeil.de> References: <20110227202957.GD1992@garage.freebsd.pl> <20110228192129.119cac0c@r500.local> <20110307200634.3c0f92df@r500.local> <20110307202531.2c90ff5a@r500.local> <20110625134031.3cbc5952@fabiankeil.de> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=PGP-SHA1; boundary="Sig_/cewI3g88P4=WQ__Y4mACAgb"; protocol="application/pgp-signature" X-Df-Sender: Nzc1MDY3 Subject: Re: g_wither_washer() called 470000 times per second X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 22 Sep 2012 18:24:27 -0000 --Sig_/cewI3g88P4=WQ__Y4mACAgb Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable

Fabian Keil wrote:

> Apparently what's eating the cpu is the kernel calling
> g_wither_washer() about 470000 time per second which
> seems a bit excessive:
>
> r500# dtrace -n 'fbt:kernel:g_*:entry { @[probefunc, stack()] = count(); } tick-1sec { trunc(@, 15); printa(@); trunc(@)}'
> dtrace: description 'fbt:kernel:g_*:entry ' matched 232 probes
> CPU ID FUNCTION:NAME
> [...]
> g_wither_washer
> kernel`g_run_events+0x358
> kernel`fork_exit+0x11f
> kernel`0xffffffff808debde
> 475959
>

This is now kern/171865:
http://www.freebsd.org/cgi/query-pr.cgi?pr=171865

Fabian

--Sig_/cewI3g88P4=WQ__Y4mACAgb Content-Type: application/pgp-signature; name=signature.asc Content-Disposition: attachment; filename=signature.asc -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlBeAlQACgkQBYqIVf93VJ3z0gCdFKfwM97OYGIOvd+RHr++LyyZ 6BwAn0FfFyF35ycj5jYwT2nsqlhqrEyC =0kU6 -----END PGP SIGNATURE----- --Sig_/cewI3g88P4=WQ__Y4mACAgb--