From owner-freebsd-fs@freebsd.org Sun Mar 6 03:06:50 2016
Date: Sat, 5 Mar 2016 22:06:40 -0500 (EST)
From: Rick Macklem
To: Ken Merry
Cc: fs@freebsd.org, scsi@freebsd.org, Robert Watson
Message-ID: <800018199.6694281.1457233600357.JavaMail.zimbra@uoguelph.ca>
Subject: Re: FUSE extended attribute patches available
List-Id: Filesystems

Ken Merry wrote:
> I have patches for FreeBSD’s FUSE filesystem kernel module to support
> extended attributes:
>
> https://people.freebsd.org/~ken/fuse_extattr.20160229.1.txt
>
The only bit of code I have that might be useful for this patch is:
        case FUSE_GETXATTR:
        case FUSE_LISTXATTR:
!               /*
!                * These can have varying response lengths, and 0 length
!                * isn't necessarily invalid.
!                */
!               err = 0;
*** I came up with this:
                fgin = (struct fuse_getxattr_in *)
                    ((char *)ftick->tk_ms_fiov.base +
                    sizeof(struct fuse_in_header));
                if (fgin->size == 0)
                        err = (blen == sizeof(struct fuse_getxattr_out)) ? 0 :
                            EINVAL;
                else
                        err = (blen <= fgin->size) ? 0 : EINVAL;
                break;
I think I got the size check right?

The big question is...
What to do with the NAMESPACE?
- My code fails for SYSTEM and does USER without prepending "user.".
  (That seemed to be what rwatson@ felt was reasonable. I thought our
  discussion was on a mailing list, but I can't find it.)
  I've cc'd him. Maybe he can comment again.
- If you stick with prepending "user." or "system." there needs to be
  some way to bypass this so that attributes that don't start in "user." or
  "system." can be accessed. I've seen "trusted." and "glusterfs." on GlusterFS.
  --> Maybe a new namespace called something like "nil" that just bypasses
      any USER or SYSTEM checks?

rick

> The patch implements the get/set/delete/list extended attribute methods. The
> listing code also converts extended attribute lists from the Linux/FUSE
> format to the FreeBSD format.
> For example:
>
> # touch foo
> # ls -la foo
> -rwxrwxrwx 1 root wheel 0 Feb 29 21:40 foo
> # lsextattr user foo
> foo
> # setextattr user testattr1 "12345678" foo
> # lsextattr user foo
> foo testattr1
> # getextattr user testattr1 foo
> foo 12345678
> # setextattr user testattr2 "87654321" foo
> # lsextattr user foo
> foo testattr2 testattr1
> # rmextattr user testattr1 foo
> # lsextattr user foo
> foo testattr2
> # getextattr user testattr1 foo
> getextattr: foo: failed: Attribute not found
> # getextattr user testattr2 foo
> foo 87654321
>
>
> Just to be clear on what this does, it only provides extended attribute
> support to FreeBSD applications if the underlying FUSE filesystem implements
> FUSE extended attribute support. Many FUSE filesystems don’t support the
> extended attribute VFS operations.
>
> I have tested this out on IBM’s LTFS implementation, but I have not yet found
> another FUSE filesystem that supports extended attributes. If anyone knows
> of one, please let me know so I can try it out. (I looked through a number
> of the filesystems in sysutils/fusefs* in the ports tree.)
>
> Any feedback is welcome. I’m planning to check this into FreeBSD/head in the
> next week or so.
>
> Obviously, I’ve also ported IBM’s LTFS implementation to FreeBSD. It works
> in the standard FUSE mode, and you can also link it into an application as a
> library if you don’t want to incur the overhead of running through FUSE. I
> haven’t gotten around to packaging it up to go out for testing / review.
>
> If anyone has IBM LTO-5 or newer tape drives, or IBM TS1140 or newer tape
> drives, and wants to try it out, let me know. I’ll send you the code when
> I’ve got it at least somewhat ready. This is IBM-specific, and won’t work
> on HP tape drives.
>
> Ken
> --
> Ken Merry
> ken@FreeBSD.ORG
>
>
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

From owner-freebsd-fs@freebsd.org Sun Mar 6 07:21:00 2016
From: bugzilla-noreply@freebsd.org
To: freebsd-fs@FreeBSD.org
Subject: [Bug 176449] zfs(1): ZFS NFS export went wrong with special hostname character
Date: Sun, 06 Mar 2016 07:20:59 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=176449

Martin Birgmeier changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |la5lbtyi@aon.at

--- Comment #3 from Martin Birgmeier ---
Created attachment 167756
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=167756&action=edit
Patch cddl/compat/opensolaris/misc/fsshare.c for saner options parsing

Please use the attached patch.

-- Martin

--
You are receiving this mail because:
You are the assignee for the bug.

From owner-freebsd-fs@freebsd.org Sun Mar 6 14:48:58 2016
From: Richard Elling
Subject: Re: [smartos-discuss] an interesting survey -- the zpool with most disks you have ever built
Date: Sun, 6 Mar 2016 06:49:36 -0800
Cc: developer, "developer@lists.open-zfs.org", illumos-developer, omnios-discuss, Discussion list for OpenIndiana, illumos-zfs, "zfs-discuss@list.zfsonlinux.org", "freebsd-fs@FreeBSD.org", "zfs-devel@freebsd.org"
Message-Id: <5158F354-9636-4031-9536-E99450F312B3@RichardElling.com>
References: <95563acb-d27b-4d4b-b8f3-afeb87a3d599@me.com> <56D87784.4090103@broken.net>
To: smartos-discuss@lists.smartos.org

> On Mar 3, 2016, at 8:35 PM, Fred Liu wrote:
>
> Hi,
>
> Today when I was reading Jeff's new nuclear weapon -- DSSD D5's CUBIC RAID introduction,
> the interesting survey -- the zpool with most disks you have ever built popped in my brain.

We test to 2,000 drives. Beyond 2,000 there are some scalability issues that impact failover times.
We’ve identified these and know what to fix, but need a real customer at this scale to bump it to the
top of the priority queue.

>
> For zfs doesn't support nested vdev, the maximum fault tolerance should be three(from raidz3).

Pedantically, it is N, because you can have N-way mirroring.

> It is stranded if you want to build a very huge pool.

Scaling redundancy by increasing parity improves data loss protection by about 3 orders of
magnitude. Adding capacity by striping reduces data loss protection by 1/N. This is why there is
not much need to go beyond raidz3. However, if you do want to go there, adding raidz4+ is
relatively easy.

-- richard

--
Richard.Elling@RichardElling.com
+1-760-896-4422

From owner-freebsd-fs@freebsd.org Sun Mar 6 21:00:04 2016
Message-Id: <201603062100.u26L01Ht008958@kenobi.freebsd.org>
From: bugzilla-noreply@FreeBSD.org
To: freebsd-fs@FreeBSD.org
Subject: Problem reports for freebsd-fs@FreeBSD.org that need special attention
Date: Sun, 06 Mar 2016 21:00:03 +0000

To view an individual PR, use:
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=(Bug Id).

The following is a listing of current problems submitted by FreeBSD users,
which need special attention. These represent problem reports covering
all versions including experimental development code and obsolete releases.

Status      |    Bug Id | Description
------------+-----------+---------------------------------------------------
New         |    203492 | mount_unionfs -o below causes panic
Open        |    136470 | [nfs] Cannot mount / in read-only, over NFS
Open        |    139651 | [nfs] mount(8): read-only remount of NFS volume d
Open        |    144447 | [zfs] sharenfs fsunshare() & fsshare_main() non f

4 problems total for which you should take action.

From owner-freebsd-fs@freebsd.org Mon Mar 7 02:46:21 2016
From: bugzilla-noreply@freebsd.org
To: freebsd-fs@FreeBSD.org
Subject: [Bug 207432] panic: nvme_ctrlr_intx_handler
Date: Mon, 07 Mar 2016 02:46:20 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=207432

Jim Harris changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|Open                        |Closed

--
You are receiving this mail because:
You are on the CC list for the bug.

From owner-freebsd-fs@freebsd.org Mon Mar 7 05:30:21 2016
References: <95563acb-d27b-4d4b-b8f3-afeb87a3d599@me.com> <56D87784.4090103@broken.net>
Date: Mon, 7 Mar 2016 13:30:18 +0800
Subject: Re: [zfs] an interesting survey -- the zpool with most disks you have ever built
From: Fred Liu
To: illumos-zfs
Cc: "smartos-discuss@lists.smartos.org", developer, illumos-developer, omnios-discuss, Discussion list for OpenIndiana, "zfs-discuss@list.zfsonlinux.org", "freebsd-fs@FreeBSD.org", "zfs-devel@freebsd.org"

2016-03-05 0:01 GMT+08:00 Freddie Cash:

> On Mar 4, 2016 2:05 AM, "Fred Liu" wrote:
> > 2016-03-04 13:47 GMT+08:00 Freddie Cash:
> >>
> >> Currently, I just use a simple coordinate system. Columns are letters,
> >> rows are numbers.
> >> Each disk is partitioned using GPT with the first (only) partition
> >> starting at 1 MB and covering the whole disk, and labelled with the
> >> column/row where it is located (disk-a1, disk-g6, disk-p3, etc).
> >
> > [Fred]: So you manually pull off all the drives one by one to locate
> > them?
>
> When putting the system together for the first time, I insert each disk
> one at a time, wait for it to be detected, partition it, then label it
> based on physical location. Then do the next one. It's just part of the
> normal server build process, whether it has 2 drives, 20 drives, or 200
> drives.
>
> We build all our own servers from off-the-shelf parts; we don't buy
> anything pre-built from any of the large OEMs.
>

[Fred]: Gotcha!

> >> The pool is created using the GPT labels, so the label shows in "zpool
> >> list" output.
> >
> > [Fred]: What will the output look like?
>
> From our smaller backups server, with just 24 drive bays:
>
> $ zpool status storage
>   pool: storage
>  state: ONLINE
> status: Some supported features are not enabled on the pool. The pool can
>         still be used, but some features are unavailable.
> action: Enable all features using 'zpool upgrade'. Once this is done,
>         the pool may no longer be accessible by software that does not support
>         the features. See zpool-features(7) for details.
>   scan: scrub canceled on Wed Feb 17 12:02:20 2016
> config:
>
>         NAME             STATE     READ WRITE CKSUM
>         storage          ONLINE       0     0     0
>           raidz2-0       ONLINE       0     0     0
>             gpt/disk-a1  ONLINE       0     0     0
>             gpt/disk-a2  ONLINE       0     0     0
>             gpt/disk-a3  ONLINE       0     0     0
>             gpt/disk-a4  ONLINE       0     0     0
>             gpt/disk-a5  ONLINE       0     0     0
>             gpt/disk-a6  ONLINE       0     0     0
>           raidz2-1       ONLINE       0     0     0
>             gpt/disk-b1  ONLINE       0     0     0
>             gpt/disk-b2  ONLINE       0     0     0
>             gpt/disk-b3  ONLINE       0     0     0
>             gpt/disk-b4  ONLINE       0     0     0
>             gpt/disk-b5  ONLINE       0     0     0
>             gpt/disk-b6  ONLINE       0     0     0
>           raidz2-2       ONLINE       0     0     0
>             gpt/disk-c1  ONLINE       0     0     0
>             gpt/disk-c2  ONLINE       0     0     0
>             gpt/disk-c3  ONLINE       0     0     0
>             gpt/disk-c4  ONLINE       0     0     0
>             gpt/disk-c5  ONLINE       0     0     0
>             gpt/disk-c6  ONLINE       0     0     0
>           raidz2-3       ONLINE       0     0     0
>             gpt/disk-d1  ONLINE       0     0     0
>             gpt/disk-d2  ONLINE       0     0     0
>             gpt/disk-d3  ONLINE       0     0     0
>             gpt/disk-d4  ONLINE       0     0     0
>             gpt/disk-d5  ONLINE       0     0     0
>             gpt/disk-d6  ONLINE       0     0     0
>         cache
>           gpt/cache0     ONLINE       0     0     0
>           gpt/cache1     ONLINE       0     0     0
>
> errors: No known data errors
>
> The 90-bay systems look the same, just that the letters go all the way to
> p (so disk-p1 through disk-p6). And there's one vdev that uses 3 drives
> from each chassis (7x 6-disk vdev only uses 42 drives of the 45-bay
> chassis, so there's lots of spares if using a single chassis; using two
> chassis, there's enough drives to add an extra 6-disk vdev).
>

[Fred]: It looks like the gpt label shown in "zpool status" only works in
FreeBSD/FreeNAS. Are you using FreeBSD/FreeNAS? I can't find the similar
possibilities in Illumos/Linux.

Thanks,

Fred

From owner-freebsd-fs@freebsd.org Mon Mar 7 06:09:19 2016
References: <95563acb-d27b-4d4b-b8f3-afeb87a3d599@me.com> <56D87784.4090103@broken.net>
Date: Sun, 6 Mar 2016 22:09:18 -0800
Subject: Re: [zfs] an interesting survey -- the zpool with most disks you have ever built
From: Freddie Cash
To: Fred Liu
Cc: zfs@lists.illumos.org, FreeBSD Filesystems

On Mar 6, 2016 9:30 PM, "Fred Liu" wrote:
> [Fred]: It looks like the gpt label shown in "zpool status" only works in
> FreeBSD/FreeNAS. Are you using FreeBSD/FreeNAS? I can't find the similar
> possibilities in Illumos/Linux.

Yes, we use FreeBSD for our storage systems.

Cheers,
Freddie

From owner-freebsd-fs@freebsd.org Mon Mar 7 06:11:46 2016
References: <95563acb-d27b-4d4b-b8f3-afeb87a3d599@me.com> <56D87784.4090103@broken.net>
Date: Mon, 7 Mar 2016 14:11:43 +0800
Subject: Re: [zfs] an interesting survey -- the zpool with most disks you have ever built
From: Fred Liu
To: Freddie Cash
Cc: illumos-zfs, FreeBSD Filesystems

2016-03-07 14:09 GMT+08:00 Freddie Cash:

>
> On Mar 6, 2016 9:30 PM, "Fred Liu" wrote:
>
>> [Fred]: It looks like the gpt label shown in "zpool status" only works in
>> FreeBSD/FreeNAS. Are you using FreeBSD/FreeNAS? I can't find the similar
>> possibilities in Illumos/Linux.
>
> Yes, we use FreeBSD for our storage systems.
>

Gotcha! This workaround is really cool when there is no SES solution!

Thanks!
Fred From owner-freebsd-fs@freebsd.org Mon Mar 7 07:16:14 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D8147AC3E5D for ; Mon, 7 Mar 2016 07:16:14 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id C4FE7E02 for ; Mon, 7 Mar 2016 07:16:14 +0000 (UTC) (envelope-from julian@freebsd.org) Received: by mailman.ysv.freebsd.org (Postfix) id C0798AC3E5B; Mon, 7 Mar 2016 07:16:14 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id BFF22AC3E5A; Mon, 7 Mar 2016 07:16:14 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "vps1.elischer.org", Issuer "CA Cert Signing Authority" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 98542E01; Mon, 7 Mar 2016 07:16:14 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from julian-mbp3.pixel8networks.com (50-196-156-133-static.hfc.comcastbusiness.net [50.196.156.133]) (authenticated bits=0) by vps1.elischer.org (8.15.2/8.15.2) with ESMTPSA id u277GBoT069594 (version=TLSv1.2 cipher=DHE-RSA-AES128-SHA bits=128 verify=NO); Sun, 6 Mar 2016 23:16:12 -0800 (PST) (envelope-from julian@freebsd.org) Subject: Re: FUSE extended attribute patches available To: Rick Macklem , Ken Merry References: <800018199.6694281.1457233600357.JavaMail.zimbra@uoguelph.ca> Cc: Robert Watson , fs@freebsd.org, scsi@freebsd.org From: Julian Elischer Message-ID: <56DD2AB6.1030407@freebsd.org> Date: Sun, 6 Mar 2016 23:16:06 -0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; 
rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 In-Reply-To: <800018199.6694281.1457233600357.JavaMail.zimbra@uoguelph.ca> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Mar 2016 07:16:14 -0000

On 5/03/2016 7:06 PM, Rick Macklem wrote:
> Ken Merry wrote:
>> I have patches for FreeBSD’s FUSE filesystem kernel module to support
>> extended attributes:
oh showing off your masochistic side eh?

>> https://people.freebsd.org/~ken/fuse_extattr.20160229.1.txt
>>
I spent an hour beating my head against fuse yesterday.
then realised that it's an old version on our product. We really have to get off 8.0
(hopefully a matter of weeks now to a 10.x switch)
Now all I need is to find a FreeBSD filesystem expert (ZFS/NFS/CIFS/GFS) to hire.

> The only bit of code I have that might be useful for this patch is:
>         case FUSE_GETXATTR:
>         case FUSE_LISTXATTR:
> !               /*
> !                * These can have varying response lengths, and 0 length
> !                * isn't necessarily invalid.
> !                */
> !               err = 0;
> *** I came up with this:
>                 fgin = (struct fuse_getxattr_in *)
>                     ((char *)ftick->tk_ms_fiov.base +
>                      sizeof(struct fuse_in_header));
>                 if (fgin->size == 0)
>                         err = (blen == sizeof(struct fuse_getxattr_out)) ? 0 :
>                             EINVAL;
>                 else
>                         err = (blen <= fgin->size) ? 0 : EINVAL;
>                 break;
> I think I got the size check right?
>
> The big question is...
> What to do with the NAMESPACE?
> - My code fails for SYSTEM and does USER without prepending "user.".
>   (That seemed to be what rwatson@ felt was reasonable. I thought our
>   discussion was on a mailing list, but I can't find it.)
>   I've cc'd him. Maybe he can comment again.
Is there a standard for extended attributes I should know about?
It seems to me that it's a bit like the wild west.
Extended attributes seem to be "every OS for himself".
>
> - If you stick with prepending "user." or "system." there needs to be
>   some way to bypass this so that attributes that don't start in "user."
>   or "system." can be accessed. I've seen "trusted." and "glusterfs."
>   on GlusterFS.
> --> Maybe a new namespace called something like "nil" that just bypasses
>     any USER or SYSTEM checks?
>
> rick
>
>> The patch implements the get/set/delete/list extended attribute methods. The
>> listing code also converts extended attribute lists from the Linux/FUSE
>> format to the FreeBSD format. For example:
>>
>> # touch foo
>> # ls -la foo
>> -rwxrwxrwx 1 root wheel 0 Feb 29 21:40 foo
>> # lsextattr user foo
>> foo
>> # setextattr user testattr1 "12345678" foo
>> # lsextattr user foo
>> foo testattr1
>> # getextattr user testattr1 foo
>> foo 12345678
>> # setextattr user testattr2 "87654321" foo
>> # lsextattr user foo
>> foo testattr2 testattr1
>> # rmextattr user testattr1 foo
>> # lsextattr user foo
>> foo testattr2
>> # getextattr user testattr1 foo
>> getextattr: foo: failed: Attribute not found
>> # getextattr user testattr2 foo
>> foo 87654321
>>
>>
>> Just to be clear on what this does, it only provides extended attribute
>> support to FreeBSD applications if the underlying FUSE filesystem implements
>> FUSE extended attribute support. Many FUSE filesystems don’t support the
>> extended attribute VFS operations.
>>
>> I have tested this out on IBM’s LTFS implementation, but I have not yet found
>> another FUSE filesystem that supports extended attributes. If anyone knows
>> of one, please let me know so I can try it out. (I looked through a number
>> of the filesystems in sysutils/fusefs* in the ports tree.)
>>
>> Any feedback is welcome. I’m planning to check this into FreeBSD/head in the
>> next week or so.
>>
>> Obviously, I’ve also ported IBM’s LTFS implementation to FreeBSD.
It works >> in the standard FUSE mode, and you can also link it into an application as a >> library if you don’t want to incur the overhead of running through FUSE. I >> haven’t gotten around to packaging it up to go out for testing / review. >> >> If anyone has IBM LTO-5 or newer tape drives, or IBM TS1140 or newer tape >> drives, and wants to try it out, let me know. I’ll send you the code when >> I’ve got it at least somewhat ready. This is IBM-specific, and won’t work >> on HP tape drives. >> >> Ken >> — >> Ken Merry >> ken@FreeBSD.ORG >> >> >> >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > > From owner-freebsd-fs@freebsd.org Mon Mar 7 07:59:42 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9D069AC2706 for ; Mon, 7 Mar 2016 07:59:42 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 8CB4EE75 for ; Mon, 7 Mar 2016 07:59:42 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: by mailman.ysv.freebsd.org (Postfix) id 87C6CAC2701; Mon, 7 Mar 2016 07:59:42 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6EDACAC2700; Mon, 7 Mar 2016 07:59:42 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [198.74.231.69]) by 
mx1.freebsd.org (Postfix) with ESMTP id 2F557E73; Mon, 7 Mar 2016 07:59:42 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from [10.0.1.12] (host81-157-243-217.range81-157.btcentralplus.com [81.157.243.217]) by cyrus.watson.org (Postfix) with ESMTPSA id CC86A46B2E; Mon, 7 Mar 2016 02:59:34 -0500 (EST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: FUSE extended attribute patches available From: Robert Watson In-Reply-To: <56DD2AB6.1030407@freebsd.org> Date: Mon, 7 Mar 2016 07:59:33 +0000 Cc: Rick Macklem , Ken Merry , fs@freebsd.org, scsi@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: <6AF0FC23-CC34-43EA-A008-9FB82FB21558@FreeBSD.org> References: <800018199.6694281.1457233600357.JavaMail.zimbra@uoguelph.ca> <56DD2AB6.1030407@freebsd.org> To: Julian Elischer X-Mailer: Apple Mail (2.2104) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Mar 2016 07:59:42 -0000

FreeBSD and Linux’s extended-attribute models were inherited from IRIX, as they were introduced to solve the same problems: a place to store metadata such as ACLs, MAC labels, capability masks, etc. IRIX had three namespaces: one each for “user”, “root”, and “secure”, reflecting whether or not they were managed by the file owner (or permissions), the privileged root user, or part of the TCB protection mechanism (e.g., for integrity labels).

These extended attributes should not be confused with the filesystem feature of the same name in NFSv4, which is sometimes known by the name “file fork” or “data streams”.
EAs in IRIX/FreeBSD/Linux/HPFS/etc are tuple pairs of names and values intended to be written atomically or updated in place specifically for (shortish) metadata such as ACLs, rather than being complete separate data spaces for I/O (e.g., that could be memory mapped).

In FreeBSD’s design, we incorporated the disjoint namespace model, providing USER and SYSTEM, the former being managed by the file owner (and those given suitable permission), and the latter being used for TCB mechanisms such as the implementations of MAC labels, ACLs, etc.

In Linux, they adopted a more free-form mechanism based on a single combined namespace with a prefix -- e.g., user.FOO, and system.BAR. Over time it looks like that namespace has been expanded in various filesystem-specific ways. We also have room to expand our namespace, but from the description below, it’s not clear quite what the right mechanism is.

One path would be to introduce a new namespace for filesystem-specific attributes -- e.g., EXTATTR_NAMESPACE_FS?

But I think the key question here is whether the existing namespaces can provide the semantics you need. If not, then we likely need a new namespace. But then we get the question as to who controls use of the namespace. Certainly “the filesystem” is one option, but then you will get inconsistency in approaches between filesystems and applications -- across various dimensions including protection (who can read/modify them?), allocation (who decides what names should be used for what?), and semantics (what applications can use them, and who backs them up?).

For example: who should be responsible for backing up those attributes?
For ‘system’ attributes in FreeBSD, it is assumed that backup tools will be aware of the services layered over the attributes -- e.g., that they will back up ACLs using the ACL API, rather than backing up the binary EAs holding the ACLs. For ‘user’ attributes, it is assumed that backup tools (e.g., tar) must explicitly preserve them, since they are user-defined and user-managed. For filesystem-specific attributes, some other choice will need to be made -- perhaps filesystem-specific backup tools need to know about them?

Note that in the Linux EA model, ACLs are actually accessed via the EA system calls, whereas in FreeBSD, ACLs are a first-class citizen in the system-call API/ABI, and so user applications don’t treat them as EAs. We made that choice as filesystems may choose themselves not to represent ACLs as EAs, and they have real semantics visible to the VFS layer. In Linux, I believe they chose to pass them via EAs to narrow the system-call interface for filesystem metadata. Both are legitimate choices, but this could also trigger discussions about whether new attributes are best accessed via the EA interface, or new system calls. For filesystem-specific attributes, EAs are likely the better way to go.

Robert

> On 7 Mar 2016, at 07:16, Julian Elischer wrote:
>
> On 5/03/2016 7:06 PM, Rick Macklem wrote:
>> Ken Merry wrote:
>>> I have patches for FreeBSD’s FUSE filesystem kernel module to support
>>> extended attributes:
> oh showing off your masochistic side eh?
>
>>> https://people.freebsd.org/~ken/fuse_extattr.20160229.1.txt
>>>
> I spent an hour beating my head against fuse yesterday.
> then realised that it's an old version on our product. We really have to get off 8.0
> (hopefully a matter of weeks now to a 10.x switch)
> Now all I need is to find a FreeBSD filesystem expert (ZFS/NFS/CIFS/GFS) to hire.
>
>
>> The only bit of code I have that might be useful for this patch is:
>>         case FUSE_GETXATTR:
>>         case FUSE_LISTXATTR:
>> !               /*
>> !                * These can have varying response lengths, and 0 length
>> !                * isn't necessarily invalid.
>> !                */
>> !               err = 0;
>> *** I came up with this:
>>                 fgin = (struct fuse_getxattr_in *)
>>                     ((char *)ftick->tk_ms_fiov.base +
>>                      sizeof(struct fuse_in_header));
>>                 if (fgin->size == 0)
>>                         err = (blen == sizeof(struct fuse_getxattr_out)) ? 0 :
>>                             EINVAL;
>>                 else
>>                         err = (blen <= fgin->size) ? 0 : EINVAL;
>>                 break;
>> I think I got the size check right?
>>
>> The big question is...
>> What to do with the NAMESPACE?
>> - My code fails for SYSTEM and does USER without prepending "user.".
>>   (That seemed to be what rwatson@ felt was reasonable. I thought our
>>   discussion was on a mailing list, but I can't find it.)
>>   I've cc'd him. Maybe he can comment again.
> Is there a standard for extended attributes I should know about?
> It seems to me that it's a bit like the wild west.
> Extended attributes seem to be "every OS for himself".
>
>>
>> - If you stick with prepending "user." or "system." there needs to be
>>   some way to bypass this so that attributes that don't start in "user."
>>   or "system." can be accessed. I've seen "trusted." and "glusterfs."
>>   on GlusterFS.
>> --> Maybe a new namespace called something like "nil" that just bypasses
>>     any USER or SYSTEM checks?
>>
>> rick
>>
>>> The patch implements the get/set/delete/list extended attribute methods. The
>>> listing code also converts extended attribute lists from the Linux/FUSE
>>> format to the FreeBSD format.
>>> For example:
>>>
>>> # touch foo
>>> # ls -la foo
>>> -rwxrwxrwx 1 root wheel 0 Feb 29 21:40 foo
>>> # lsextattr user foo
>>> foo
>>> # setextattr user testattr1 "12345678" foo
>>> # lsextattr user foo
>>> foo testattr1
>>> # getextattr user testattr1 foo
>>> foo 12345678
>>> # setextattr user testattr2 "87654321" foo
>>> # lsextattr user foo
>>> foo testattr2 testattr1
>>> # rmextattr user testattr1 foo
>>> # lsextattr user foo
>>> foo testattr2
>>> # getextattr user testattr1 foo
>>> getextattr: foo: failed: Attribute not found
>>> # getextattr user testattr2 foo
>>> foo 87654321
>>>
>>>
>>> Just to be clear on what this does, it only provides extended attribute
>>> support to FreeBSD applications if the underlying FUSE filesystem implements
>>> FUSE extended attribute support. Many FUSE filesystems don’t support the
>>> extended attribute VFS operations.
>>>
>>> I have tested this out on IBM’s LTFS implementation, but I have not yet found
>>> another FUSE filesystem that supports extended attributes. If anyone knows
>>> of one, please let me know so I can try it out. (I looked through a number
>>> of the filesystems in sysutils/fusefs* in the ports tree.)
>>>
>>> Any feedback is welcome. I’m planning to check this into FreeBSD/head in the
>>> next week or so.
>>>
>>> Obviously, I’ve also ported IBM’s LTFS implementation to FreeBSD. It works
>>> in the standard FUSE mode, and you can also link it into an application as a
>>> library if you don’t want to incur the overhead of running through FUSE. I
>>> haven’t gotten around to packaging it up to go out for testing / review.
>>>
>>> If anyone has IBM LTO-5 or newer tape drives, or IBM TS1140 or newer tape
>>> drives, and wants to try it out, let me know. I’ll send you the code when
>>> I’ve got it at least somewhat ready. This is IBM-specific, and won’t work
>>> on HP tape drives.
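A rough illustration of the list conversion Ken mentions, for readers who have not looked at the two formats. This is only a userland sketch, not the code from the patch. It assumes the Linux/FUSE side hands back NUL-terminated names that carry a namespace prefix such as "user.", and that the FreeBSD side wants each name stored as a one-byte length followed by the unterminated name, as extattr_list_file(2) describes. The helper name xattr_list_linux_to_bsd and its parameters are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the FUSE patch.
 *
 * Converts a Linux-style attribute list ("user.a\0user.b\0") into a
 * FreeBSD-style list (length byte + name, no terminator) for a single
 * namespace prefix. Names with a different prefix are skipped.
 *
 * Returns the number of bytes written to out, or -1 if the input is not
 * NUL-terminated, a name is longer than 255 bytes, or out is too small.
 */
static long
xattr_list_linux_to_bsd(const char *in, size_t inlen, const char *prefix,
    unsigned char *out, size_t outlen)
{
	size_t plen = strlen(prefix);
	size_t pos = 0, used = 0;

	while (pos < inlen) {
		const char *name = in + pos;
		const char *end = memchr(name, '\0', inlen - pos);
		size_t nlen;

		if (end == NULL)
			return (-1);	/* last name is not NUL-terminated */
		nlen = (size_t)(end - name);
		pos += nlen + 1;

		/* Skip empty names and names from another namespace. */
		if (nlen <= plen || strncmp(name, prefix, plen) != 0)
			continue;

		/* Drop the prefix; FreeBSD passes the namespace separately. */
		name += plen;
		nlen -= plen;
		if (nlen > 255 || used + 1 + nlen > outlen)
			return (-1);

		out[used++] = (unsigned char)nlen;
		memcpy(out + used, name, nlen);
		used += nlen;
	}
	return ((long)used);
}
```

Given the input "user.testattr1\0user.testattr2\0trusted.x\0" and the prefix "user.", the helper writes two entries, each a length byte of 9 followed by the bare name. The "trusted." name is skipped because it carries a different prefix, which is the case Rick raises about attributes outside "user." and "system.".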
>>>=20 >>> Ken >>> =E2=80=94 >>> Ken Merry >>> ken@FreeBSD.ORG >>>=20 >>>=20 >>>=20 >>> _______________________________________________ >>> freebsd-fs@freebsd.org mailing list >>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> To unsubscribe, send any mail to = "freebsd-fs-unsubscribe@freebsd.org" >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >>=20 >>=20 >=20 From owner-freebsd-fs@freebsd.org Mon Mar 7 08:21:15 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A099BAC320F for ; Mon, 7 Mar 2016 08:21:15 +0000 (UTC) (envelope-from bra@fsn.hu) Received: from dg.fsn.hu (dg.fsn.hu [84.2.225.196]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "dg.fsn.hu", Issuer "dg.fsn.hu" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 628F595B; Mon, 7 Mar 2016 08:21:14 +0000 (UTC) (envelope-from bra@fsn.hu) Received: by dg.fsn.hu (Postfix, from userid 1003) id A59D820D0; Mon, 7 Mar 2016 09:21:05 +0100 (CET) X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MF-ACE0E1EA [pR: 12.2839] X-CRM114-CacheID: sfid-20160307_09210_3E1C7CED X-CRM114-Status: Good ( pR: 12.2839 ) X-DSPAM-Result: Whitelisted X-DSPAM-Processed: Mon Mar 7 09:21:05 2016 X-DSPAM-Confidence: 0.9899 X-DSPAM-Probability: 0.0000 X-DSPAM-Signature: 56dd39f1568874377112279 X-DSPAM-Factors: 27, but, 0.01000, It's+possible, 0.01000, still+get, 0.01000, Also, 0.01000, Date*21+05, 0.01000, looks, 0.01000, looks, 0.01000, a+real, 0.01000, Received*online.co.hu+[195.228.243.99]), 0.01000, this+I've, 0.01000, Subject*at, 0.01000, >>+It, 0.01000, joy, 0.01000, linux, 0.01000, 
clear+why, 0.01000, >>+Correct, 0.01000, st_nlink, 0.01000, that+calls, 0.01000, FreeBSD+versions, 0.01000, Hartland, 0.01000, resolve, 0.01000, of, 0.01000, stat+st_nlink, 0.01000, From*"Nagy, Attila" , 0.01000, User-Agent*Mozilla/5.0, 0.01000, that+>, 0.01000, X-Spambayes-Classification: ham; 0.00 Received: from [IPv6:::1] (japan.t-online.co.hu [195.228.243.99]) by dg.fsn.hu (Postfix) with ESMTPSA id 59AB420CE; Mon, 7 Mar 2016 09:21:05 +0100 (CET) Subject: Re: zfs and st_nlink limit at 32767 To: Don Lewis , killing@multiplay.co.uk References: <201603052316.u25NGOaT079417@gw.catspoiler.org> Cc: freebsd-fs@freebsd.org From: "Nagy, Attila" Message-ID: <56DD39F1.8040302@fsn.hu> Date: Mon, 7 Mar 2016 09:21:05 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1 MIME-Version: 1.0 In-Reply-To: <201603052316.u25NGOaT079417@gw.catspoiler.org> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Mar 2016 08:21:15 -0000 Hi, On 03/06/16 00:16, Don Lewis wrote: > On 5 Mar, Steven Hartland wrote: >> Correct stat st_nlink is a nlink_t which is defined as uint16_t, its not >> clear why its clamping at what looks like int16_t max. >> >> It looks like the kernel version in nstat is a uint32_t so internally it >> should be correct. >> >> You may have some joy changing it to uint32_t but is likely everything >> will rebuilding and even then there may be some edge cases which break >> one that sticks out is linux compat support which doesn't use nlink_t. > Yeah, changing it would change the stat() ABI, so you would have to > recompile everything that calls stat(). 
Also the syscall would have to > be versioned so that executables built on previous FreeBSD versions > would still get the old version of struct stat. > > Something else to look out for is archive formats. It's possible that > nlinks is embedded in them. Breaking the ability to read your old > backup tapes would be a real bummer. In the hope that somebody will eventually resolve this, I've filed a bug report: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=207763 From owner-freebsd-fs@freebsd.org Mon Mar 7 05:07:02 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 26590AC2F31; Mon, 7 Mar 2016 05:07:02 +0000 (UTC) (envelope-from fred.fliu@gmail.com) Received: from mail-lb0-x22b.google.com (mail-lb0-x22b.google.com [IPv6:2a00:1450:4010:c04::22b]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 8E45A21C; Mon, 7 Mar 2016 05:07:01 +0000 (UTC) (envelope-from fred.fliu@gmail.com) Received: by mail-lb0-x22b.google.com with SMTP id k15so117597516lbg.0; Sun, 06 Mar 2016 21:07:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc; bh=eqNWEbnG4F0PhqztdQJB0RydrbkLOpX+3fztwzTCtkI=; b=LF5yQfs5tY+iniRnhnEzxVD5j1D35gcRu9uHbXVCg34I5+UHAn5O8RODsLjCbdg1aS xLDGPJlyuT+dbPQdL3gdKlyPDKrwLKhufxj7lgzp2vqXu/w9eIMZoyT7pg6u7jzqYwlm b0+xtW5GsxY1hK1GCx0OfH3dwajaEISkpQVzMRtFwMPjSqDkL+FOWQe2voWMSYRgmjr/ yzNs3j5E4HPJfyBYUvEjgrMlaa2hD7DhBtw6lZzze7qlgHANjTEF9m9F9DvvxiWpGZwV Qx8JdjUVpYf1eZ6ZGNa2V3d1WYaKkvXzAUMivq6dc+8FnCk48D/HMTAJjs3Mr7mP+OiK /rQg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; 
h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc; bh=eqNWEbnG4F0PhqztdQJB0RydrbkLOpX+3fztwzTCtkI=; b=ERkZwiEFbWCwk0LslH0NR83/d3V36/DgOQM4GOE75yL9jKzChp2JFxrpwScjPKsUM7 aDkWS4d7zt8ZP+VEtcASqnqi+WO6pv89Ep3EY/aoa/O9mQ0a1v1idejTVcZith3FoD1f dlrUH6ZKYT0TXGZLeFhUXnwsF6NXvJV95qCdEXLd2g+B67PIxoiK4hQS281dDSfsTADP OynSnQfkd0NLle9uiUyEKsoqpmv/cDyIRTAY2sw71kgsNtgLP0CE/Lde2CtCCmlShhwS SghrkfreUUmDwtM9U5DkSA9rsXBh0s5TTYTEoC9txvl5NE2MrGke7niSO8AHsMrh0TX9 dbeg== X-Gm-Message-State: AD7BkJLHdTDzza0QnFs5Ch+egjKBdbc1hQiHja9qA6PTFiI14NHLobBemDoc7Ejg6k0ewIaQtqNulMVeogcrAA== MIME-Version: 1.0 X-Received: by 10.25.161.131 with SMTP id k125mr7052392lfe.83.1457327219683; Sun, 06 Mar 2016 21:06:59 -0800 (PST) Received: by 10.25.20.164 with HTTP; Sun, 6 Mar 2016 21:06:59 -0800 (PST) In-Reply-To: <5158F354-9636-4031-9536-E99450F312B3@RichardElling.com> References: <95563acb-d27b-4d4b-b8f3-afeb87a3d599@me.com> <56D87784.4090103@broken.net> <5158F354-9636-4031-9536-E99450F312B3@RichardElling.com> Date: Mon, 7 Mar 2016 13:06:59 +0800 Message-ID: Subject: Re: [developer] Re: [smartos-discuss] an interesting survey -- the zpool with most disks you have ever built From: Fred Liu To: developer@lists.open-zfs.org Cc: "smartos-discuss@lists.smartos.org" , developer , illumos-developer , omnios-discuss , Discussion list for OpenIndiana , illumos-zfs , "zfs-discuss@list.zfsonlinux.org" , "freebsd-fs@FreeBSD.org" , "zfs-devel@freebsd.org" X-Mailman-Approved-At: Mon, 07 Mar 2016 12:08:47 +0000 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.21 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Mar 2016 05:07:02 -0000 2016-03-06 22:49 GMT+08:00 Richard Elling : > > On Mar 3, 2016, at 8:35 PM, Fred Liu wrote: > > Hi, > > Today when I 
was reading Jeff's new nuclear weapon -- DSSD D5's CUBIC RAID
> introduction,
> the interesting survey -- the zpool with most disks you have ever built
> popped in my brain.
>
>
> We test to 2,000 drives. Beyond 2,000 there are some scalability issues
> that impact failover times.
> We’ve identified these and know what to fix, but need a real customer at
> this scale to bump it to
> the top of the priority queue.
>
>
[Fred]: Wow! 2000 drives almost need 4~5 whole racks!
>
> For zfs doesn't support nested vdev, the maximum fault tolerance should be
> three(from raidz3).
>
>
> Pedantically, it is N, because you can have N-way mirroring.
>
[Fred]: Yeah. That is just pedantic. N-way mirroring of every disk works in
theory and rarely happens in reality.
>
> It is stranded if you want to build a very huge pool.
>
>
> Scaling redundancy by increasing parity improves data loss protection by
> about 3 orders of
> magnitude. Adding capacity by striping reduces data loss protection by
> 1/N. This is why there is
> not much need to go beyond raidz3. However, if you do want to go there,
> adding raidz4+ is
> relatively easy.
>
[Fred]: I assume you used striped raidz3 vdevs in your storage mesh of 2000
drives. If that is true, the probability of 4 out of 2000 drives failing will
not be so low. Plus, resilvering takes longer if a single disk has a bigger
capacity. And further, the cost of over-provisioning spare disks vs raidz4+
will be a worthwhile trade-off when the storage mesh is at the scale of 2000
drives.

Thanks.
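To put a rough shape on the striping argument quoted above: if each top-level vdev independently loses data with probability p over some period, a pool striped across N vdevs loses data whenever any one of them does, so the pool-level probability is 1 - (1 - p)^N, which is close to N*p when p is small. The sketch below only encodes that formula. The independence assumption and the function name pool_loss_probability are assumptions made for illustration, and no real drive failure rates are implied.

```c
/*
 * Hypothetical helper: probability that a pool striped over n_vdevs
 * top-level vdevs loses data, assuming every vdev fails independently
 * with the same probability p_vdev over the period of interest.
 */
static double
pool_loss_probability(double p_vdev, unsigned int n_vdevs)
{
	double survive = 1.0;
	unsigned int i;

	/* The pool survives only if every vdev survives. */
	for (i = 0; i < n_vdevs; i++)
		survive *= 1.0 - p_vdev;
	return (1.0 - survive);
}
```

Under these assumptions, two vdevs that each fail with probability 0.5 give a pool-level figure of 0.75, and for small p the result grows almost linearly with the number of vdevs, which is one way to read the "1/N" remark.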
Fred > > > -- > > Richard.Elling@RichardElling.com > +1-760-896-4422 > > > > *openzfs-developer* | Archives > > | > Modify > > Your Subscription > From owner-freebsd-fs@freebsd.org Mon Mar 7 05:55:42 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3A651AC212A for ; Mon, 7 Mar 2016 05:55:42 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "vps1.elischer.org", Issuer "CA Cert Signing Authority" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 18AD0ADB for ; Mon, 7 Mar 2016 05:55:42 +0000 (UTC) (envelope-from julian@freebsd.org) Received: from julian-mbp3.pixel8networks.com (50-196-156-133-static.hfc.comcastbusiness.net [50.196.156.133]) (authenticated bits=0) by vps1.elischer.org (8.15.2/8.15.2) with ESMTPSA id u275tRle069295 (version=TLSv1.2 cipher=DHE-RSA-AES128-SHA bits=128 verify=NO); Sun, 6 Mar 2016 21:55:28 -0800 (PST) (envelope-from julian@freebsd.org) Subject: Re: [zfs] an interesting survey -- the zpool with most disks you have ever built To: Fred Liu , illumos-zfs References: <95563acb-d27b-4d4b-b8f3-afeb87a3d599@me.com> <56D87784.4090103@broken.net> Cc: Discussion list for OpenIndiana , omnios-discuss , developer , "zfs-devel@freebsd.org" , illumos-developer , "freebsd-fs@FreeBSD.org" , "smartos-discuss@lists.smartos.org" , "zfs-discuss@list.zfsonlinux.org" From: Julian Elischer Message-ID: <56DD17CA.90200@freebsd.org> Date: Sun, 6 Mar 2016 21:55:22 -0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-Mailman-Approved-At: Mon, 07 Mar 2016 12:36:04 +0000 X-BeenThere: 
freebsd-fs@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Mar 2016 05:55:42 -0000

On 6/03/2016 9:30 PM, Fred Liu wrote:
> 2016-03-05 0:01 GMT+08:00 Freddie Cash :
>
>> On Mar 4, 2016 2:05 AM, "Fred Liu" wrote:
>>> 2016-03-04 13:47 GMT+08:00 Freddie Cash :
>>>> Currently, I just use a simple coordinate system. Columns are letters,
>>>> rows are numbers.
>>>> Each disk is partitioned using GPT with the first (only) partition
>>>> starting at 1 MB and covering the whole disk, and labelled with the
>>>> column/row where it is located (disk-a1, disk-g6, disk-p3, etc).
>>> [Fred]: So you manually pull off all the drives one by one to locate
>>> them?
>>
>> When putting the system together for the first time, I insert each disk
>> one at a time, wait for it to be detected, partition it, then label it
>> based on physical location. Then do the next one. It's just part of the
>> normal server build process, whether it has 2 drives, 20 drives, or 200
>> drives.
>>
>> We build all our own servers from off-the-shelf parts; we don't buy
>> anything pre-built from any of the large OEMs.
>>
> [Fred]: Gotcha!
>
>
>>>> The pool is created using the GPT labels, so the label shows in "zpool
>>>> list" output.
>>> [Fred]: What will the output look like?
>> From our smaller backups server, with just 24 drive bays:
>>
>> $ zpool status storage
>>   pool: storage
>>  state: ONLINE
>> status: Some supported features are not enabled on the pool. The pool can
>>         still be used, but some features are unavailable.
>> action: Enable all features using 'zpool upgrade'.
Once this is done,
>>         the pool may no longer be accessible by software that does not support
>>         the features. See zpool-features(7) for details.
>>   scan: scrub canceled on Wed Feb 17 12:02:20 2016
>> config:
>>
>>         NAME             STATE     READ WRITE CKSUM
>>         storage          ONLINE       0     0     0
>>           raidz2-0       ONLINE       0     0     0
>>             gpt/disk-a1  ONLINE       0     0     0
>>             gpt/disk-a2  ONLINE       0     0     0
>>             gpt/disk-a3  ONLINE       0     0     0
>>             gpt/disk-a4  ONLINE       0     0     0
>>             gpt/disk-a5  ONLINE       0     0     0
>>             gpt/disk-a6  ONLINE       0     0     0
>>           raidz2-1       ONLINE       0     0     0
>>             gpt/disk-b1  ONLINE       0     0     0
>>             gpt/disk-b2  ONLINE       0     0     0
>>             gpt/disk-b3  ONLINE       0     0     0
>>             gpt/disk-b4  ONLINE       0     0     0
>>             gpt/disk-b5  ONLINE       0     0     0
>>             gpt/disk-b6  ONLINE       0     0     0
>>           raidz2-2       ONLINE       0     0     0
>>             gpt/disk-c1  ONLINE       0     0     0
>>             gpt/disk-c2  ONLINE       0     0     0
>>             gpt/disk-c3  ONLINE       0     0     0
>>             gpt/disk-c4  ONLINE       0     0     0
>>             gpt/disk-c5  ONLINE       0     0     0
>>             gpt/disk-c6  ONLINE       0     0     0
>>           raidz2-3       ONLINE       0     0     0
>>             gpt/disk-d1  ONLINE       0     0     0
>>             gpt/disk-d2  ONLINE       0     0     0
>>             gpt/disk-d3  ONLINE       0     0     0
>>             gpt/disk-d4  ONLINE       0     0     0
>>             gpt/disk-d5  ONLINE       0     0     0
>>             gpt/disk-d6  ONLINE       0     0     0
>>         cache
>>           gpt/cache0     ONLINE       0     0     0
>>           gpt/cache1     ONLINE       0     0     0
>>
>> errors: No known data errors
>>
>> The 90-bay systems look the same, just that the letters go all the way to
>> p (so disk-p1 through disk-p6). And there's one vdev that uses 3 drives
>> from each chassis (7x 6-disk vdev only uses 42 drives of the 45-bay
>> chassis, so there's lots of spares if using a single chassis; using two
>> chassis, there's enough drives to add an extra 6-disk vdev).
>>
> [Fred]: It looks like the gpt label shown in "zpool status" only works in
> FreeBSD/FreeNAS. Are you using FreeBSD/FreeNAS? I can't find the similar
> possibilities in Illumos/Linux.

Ah that's a trick.. FreeBSD exports an actual /dev/gpt/{your-label-goes-here} for each labeled partition it finds. So it's not ZFS doing anything special..
it's what FreeBSD is calling the partition. > > Thanks, > > Fred > >> *illumos-zfs* | Archives >> >> | >> Modify >> >> Your Subscription >> > _______________________________________________ > freebsd-fs@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > > From owner-freebsd-fs@freebsd.org Mon Mar 7 06:04:13 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C4164AC23CF; Mon, 7 Mar 2016 06:04:13 +0000 (UTC) (envelope-from richard.elling@gmail.com) Received: from mail-pf0-x22c.google.com (mail-pf0-x22c.google.com [IPv6:2607:f8b0:400e:c00::22c]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 8EABFE31; Mon, 7 Mar 2016 06:04:13 +0000 (UTC) (envelope-from richard.elling@gmail.com) Received: by mail-pf0-x22c.google.com with SMTP id x188so49825700pfb.2; Sun, 06 Mar 2016 22:04:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:subject:from:in-reply-to:date:cc:message-id:references :to; bh=inFEdDfH3mpHfQnxUD98ZBD5iQPHBkT3TlVx8a+LMKc=; b=Qe7MXPeW1fHNefC6mKPlUO0myUWtk9rFy4YQsDEtJwBtcQrFESb8d5S6JXm39VCRTJ mlnuxqRC9xZ1AtYyLMHsaHOLa44TbY5A5T6fyYhPQzCnHPmtNk3a1ee0mxFycoTR6kNf XL77b304+ZLXPuo2BN1tFGONuG3GXReF+FhSbZlyLexyCkyItW/e2erD5m8Lwb7xx6JW xQXeTFEBBcYpVtZRGK9y2K37xNy3Rw+H71LP4WtKLwzzmryeKzxOsUs+5u/lliBKk+Cp 7+V3iYfxC7lBDZJp5LeWAFhk7I6pCHsYyCgfvP9EiCNjf5spGL7pPZ5z+gBiTxb84coq LKSQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc :message-id:references:to; bh=inFEdDfH3mpHfQnxUD98ZBD5iQPHBkT3TlVx8a+LMKc=; 
b=iJkjJ8wNEZgFZp2mhIZWJ6cA0NySS9IiJ75rBcqsZAf2sZ2OgtEOeW3RZLuGhbB0z7 sc/BBuE66qbokd99UJVS9zz2QpSIt3EwxYY9Z6PXiueZBaYrtMgSan3Wz4po2l/0F0wk 7oyQRBjIpj+6Afo7cLQo10R5BwdvqJts4xxB9x7wq7KQAAOgAOx+WcGic3QZBjnYagg7 x7Fur6JT4+eMmA0nfLJhc2rvyMZ0nD/htAnKHgbd1kJz1vKCXOVHawjGoqHYnS0PO3Mg iEuOvm+ZQSJmH3Df4WFJ4ieDVWIubJrUI0C2wPtLaDQSIYcmzUOLSspCZlJLgJk/BXBw Z24A== X-Gm-Message-State: AD7BkJJTZyaK5yojyae5gPvbGxJ7paFF+cLs5n1WrlAJdoaib6IzrOVzWNosgZlrx380HQ== X-Received: by 10.98.75.196 with SMTP id d65mr30968996pfj.96.1457330652928; Sun, 06 Mar 2016 22:04:12 -0800 (PST) Received: from [192.168.129.108] ([162.250.162.10]) by smtp.gmail.com with ESMTPSA id n68sm21255445pfj.46.2016.03.06.22.04.10 (version=TLS1 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Sun, 06 Mar 2016 22:04:11 -0800 (PST) Mime-Version: 1.0 (Mac OS X Mail 9.2 \(3112\)) Subject: Re: [zfs] [developer] Re: [smartos-discuss] an interesting survey -- the zpool with most disks you have ever built From: Richard Elling In-Reply-To: Date: Sun, 6 Mar 2016 22:04:09 -0800 Cc: developer@lists.open-zfs.org, "smartos-discuss@lists.smartos.org" , developer , illumos-developer , omnios-discuss , Discussion list for OpenIndiana , "zfs-discuss@list.zfsonlinux.org" , "freebsd-fs@FreeBSD.org" , "zfs-devel@freebsd.org" Message-Id: <6E2B77D1-E0CA-4901-A6BD-6A22C07536B3@gmail.com> References: <95563acb-d27b-4d4b-b8f3-afeb87a3d599@me.com> <56D87784.4090103@broken.net> <5158F354-9636-4031-9536-E99450F312B3@RichardElling.com> To: zfs@lists.illumos.org X-Mailer: Apple Mail (2.3112) X-Mailman-Approved-At: Mon, 07 Mar 2016 12:36:29 +0000 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.21 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Mar 2016 06:04:14 -0000 > On Mar 6, 2016, at 9:06 PM, Fred Liu wrote: >=20 >=20 >=20 
> 2016-03-06 22:49 GMT+08:00 Richard Elling:
>
>> On Mar 3, 2016, at 8:35 PM, Fred Liu wrote:
>>
>> Hi,
>>
>> Today when I was reading Jeff's new nuclear weapon -- DSSD D5's CUBIC RAID introduction,
>> the interesting survey -- the zpool with most disks you have ever built popped in my brain.
>
> We test to 2,000 drives. Beyond 2,000 there are some scalability issues that impact failover times.
> We’ve identified these and know what to fix, but need a real customer at this scale to bump it to
> the top of the priority queue.
>
> [Fred]: Wow! 2000 drives almost need 4~5 whole racks!
>>
>> For zfs doesn't support nested vdev, the maximum fault tolerance should be three (from raidz3).
>
> Pedantically, it is N, because you can have N-way mirroring.
>
> [Fred]: Yeah. That is just pedantic. N-way mirroring of every disk works in theory and rarely happens in reality.
>
>> It is stranded if you want to build a very huge pool.
>
> Scaling redundancy by increasing parity improves data loss protection by about 3 orders of
> magnitude. Adding capacity by striping reduces data loss protection by 1/N. This is why there is
> not much need to go beyond raidz3. However, if you do want to go there, adding raidz4+ is
> relatively easy.
>
> [Fred]: I assume you used striped raidz3 vdevs in your storage mesh of 2000 drives. If that is true, the possibility of 4/2000 will not be so low.
> Plus, resilvering takes a longer time if a single disk has bigger capacity. And further, the cost of over-provisioning spare disks vs raidz4+ will be a deserved
> trade-off when the storage mesh is at the scale of 2000 drives.

Please don't assume, you'll just hurt yourself :-)
For example, do not assume the only option is striping across raidz3 vdevs. Clearly, there are many different options.
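
The "striping reduces data loss protection by 1/N" remark quoted above can be illustrated with a back-of-the-envelope model. This is only a sketch under a loudly stated assumption: every vdev fails independently with the same probability p over some period, and losing any one vdev loses the pool. The function name sketch_pool_loss is made up for illustration and is not taken from any ZFS tool or from the analysis behind the 2,000-drive tests.

```c
/*
 * Illustrative model only (assumption: independent, identical vdevs).
 * Returns the probability that at least one of n striped vdevs is lost,
 * given that each vdev is lost with probability p.
 */
double
sketch_pool_loss(double p, unsigned n)
{
	double survive = 1.0;	/* probability that every vdev survives */
	unsigned i;

	for (i = 0; i < n; i++)
		survive *= (1.0 - p);
	return 1.0 - survive;
}
```

For small p the result is close to n * p, which is one way to read the 1/N statement: striping across N vdevs multiplies the chance of pool loss by roughly N, while an extra parity level lowers p itself.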
-- richard From owner-freebsd-fs@freebsd.org Mon Mar 7 06:18:28 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 190ADAC2A17; Mon, 7 Mar 2016 06:18:28 +0000 (UTC) (envelope-from fred.fliu@gmail.com) Received: from mail-lb0-x229.google.com (mail-lb0-x229.google.com [IPv6:2a00:1450:4010:c04::229]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 8388D85A; Mon, 7 Mar 2016 06:18:27 +0000 (UTC) (envelope-from fred.fliu@gmail.com) Received: by mail-lb0-x229.google.com with SMTP id k15so118818080lbg.0; Sun, 06 Mar 2016 22:18:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc; bh=hRaNaJ1s+015Wsux9tpeW+6Al4cZnb6HjDhJ5AJXYeM=; b=OaXBgVjsiH/6SXCth3iH46uTUuEaakhIpjW/vYBGuX4LmOLz+qk1nKovyk4fb5rVAQ TlyjFavLvNYAoTJ2CaePv04c4Oe46G+KO6nnbFwKborqlXnF0wK2YXLFSdY/+Ufb0gjE gAfMat1VRxrSnojZdSkTP5IGuHVS1pNWBWsw+GCVlht4Bp5Hm+NUd64hEefeC8/wLtYq assCmQnb3dGLzlFeSNv/lC8kDNsaQZl8ys9XTtsIimCA6h/OlRx5+csD49yflkuw7+5/ ezckM/6ya9hPI0QN6O3Ci+RmDkn+pOZEb+TT0CSbBtmsZvZvcOSp8IFYVDTC+3iw3jhj Gs2A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc; bh=hRaNaJ1s+015Wsux9tpeW+6Al4cZnb6HjDhJ5AJXYeM=; b=KBPD8u9257eB0oijoVnlns2mRsyrEXFz3tAPpn+SbK8ekhiFue1IqZhfCLqEp8RIXW XQE3Q43JiVeRgMI0h7JEZ2ajBcH74hhISO+PQYI2at/GPBIPJ55NVqeTSIRaj8zrhhXE zdMBlyPggAJyJwSxHjck1COndXpgw3Zyg65TOOUn2v+FZg1aA7CgmyjshdD4S0F7YtjU dSNuZdqIdOY6yp6x6ehepgYBc7K+aqwpn1Wk/MGCEJZIqVJ6snvDnenUpgiru5cczlxg poErzc6iFXFBypTzadB8TBe7PgEZ46tAvaYPxzD4h52hOB081BBTDzmWM7ZaV12TYlct hP3w== X-Gm-Message-State: 
AD7BkJIjlRtLbTkstNgxxxpEK/uKwqo1Vr9xiYWvxzISwhloFda0No8u4EfY37nV5Y0i2en+j8kCwKzzc5Gmgg== MIME-Version: 1.0 X-Received: by 10.112.149.73 with SMTP id ty9mr5563688lbb.48.1457331505362; Sun, 06 Mar 2016 22:18:25 -0800 (PST) Received: by 10.25.20.164 with HTTP; Sun, 6 Mar 2016 22:18:24 -0800 (PST) In-Reply-To: <6E2B77D1-E0CA-4901-A6BD-6A22C07536B3@gmail.com> References: <95563acb-d27b-4d4b-b8f3-afeb87a3d599@me.com> <56D87784.4090103@broken.net> <5158F354-9636-4031-9536-E99450F312B3@RichardElling.com> <6E2B77D1-E0CA-4901-A6BD-6A22C07536B3@gmail.com> Date: Mon, 7 Mar 2016 14:18:24 +0800 Message-ID: Subject: Re: [zfs] [developer] Re: [smartos-discuss] an interesting survey -- the zpool with most disks you have ever built From: Fred Liu To: "smartos-discuss@lists.smartos.org" Cc: illumos-zfs , developer@lists.open-zfs.org, developer , illumos-developer , omnios-discuss , Discussion list for OpenIndiana , "zfs-discuss@list.zfsonlinux.org" , "freebsd-fs@FreeBSD.org" , "zfs-devel@freebsd.org" X-Mailman-Approved-At: Mon, 07 Mar 2016 12:42:35 +0000 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.21 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Mar 2016 06:18:28 -0000 2016-03-07 14:04 GMT+08:00 Richard Elling : > > On Mar 6, 2016, at 9:06 PM, Fred Liu wrote: > > > > 2016-03-06 22:49 GMT+08:00 Richard Elling < > richard.elling@richardelling.com>: > >> >> On Mar 3, 2016, at 8:35 PM, Fred Liu wrote: >> >> Hi, >> >> Today when I was reading Jeff's new nuclear weapon -- DSSD D5's CUBIC >> RAID introduction, >> the interesting survey -- the zpool with most disks you have ever built >> popped in my brain. >> >> >> We test to 2,000 drives. Beyond 2,000 there are some scalability issues >> that impact failover times. 
>> We’ve identified these and know what to fix, but need a real customer at
>> this scale to bump it to the top of the priority queue.
>>
>> [Fred]: Wow! 2000 drives almost need 4~5 whole racks!
>
>>
>> For zfs doesn't support nested vdev, the maximum fault tolerance should
>> be three (from raidz3).
>>
>>
>> Pedantically, it is N, because you can have N-way mirroring.
>>
>
> [Fred]: Yeah. That is just pedantic. N-way mirroring of every disk works
> in theory and rarely happens in reality.
>
>>
>> It is stranded if you want to build a very huge pool.
>>
>>
>> Scaling redundancy by increasing parity improves data loss protection by
>> about 3 orders of magnitude. Adding capacity by striping reduces data
>> loss protection by 1/N. This is why there is not much need to go beyond
>> raidz3. However, if you do want to go there, adding raidz4+ is
>> relatively easy.
>>
>
> [Fred]: I assume you used striped raidz3 vdevs in your storage mesh of
> 2000 drives. If that is true, the possibility of 4/2000 will not be so low.
> Plus, resilvering takes a longer time if a single disk has bigger
> capacity. And further, the cost of over-provisioning spare disks vs raidz4+
> will be a deserved trade-off when the storage mesh is at the scale of
> 2000 drives.
>
>
> Please don't assume, you'll just hurt yourself :-)
> For example, do not assume the only option is striping across raidz3
> vdevs. Clearly, there are many different options.
>

[Fred]: Yeah. Assumptions always go far away from facts! ;-) Is designing
a storage mesh with 2000 drives a biz secret? Or is it just too complicated
to elaborate? Never mind. ;-)

Thanks.
Fred > > *smartos-discuss* | Archives > > | > Modify > > Your Subscription > From owner-freebsd-fs@freebsd.org Mon Mar 7 14:31:53 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E4DC5AC087E; Mon, 7 Mar 2016 14:31:53 +0000 (UTC) (envelope-from ronald-lists@klop.ws) Received: from smarthost1.greenhost.nl (smarthost1.greenhost.nl [195.190.28.81]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id AF52E77D; Mon, 7 Mar 2016 14:31:53 +0000 (UTC) (envelope-from ronald-lists@klop.ws) Received: from smtp.greenhost.nl ([213.108.104.138]) by smarthost1.greenhost.nl with esmtps (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:16) (Exim 4.72) (envelope-from ) id 1acwCB-0008IZ-Ki; Mon, 07 Mar 2016 15:31:45 +0100 Content-Type: text/plain; charset=utf-8; format=flowed; delsp=yes To: freebsd-arm@freebsd.org, freebsd-fs@freebsd.org Subject: Re: Unstable NFS on recent CURRENT References: <3DAB3639-8FB8-43D3-9517-94D46EDEC19E@gromit.dlib.vt.edu> Date: Mon, 07 Mar 2016 15:31:37 +0100 MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: "Ronald Klop" Message-ID: In-Reply-To: <3DAB3639-8FB8-43D3-9517-94D46EDEC19E@gromit.dlib.vt.edu> User-Agent: Opera Mail/1.0 (Win32) X-Authenticated-As-Hash: 398f5522cb258ce43cb679602f8cfe8b62a256d1 X-Virus-Scanned: by clamav at smarthost1.samage.net X-Spam-Level: / X-Spam-Score: 0.8 X-Spam-Status: No, score=0.8 required=5.0 tests=ALL_TRUSTED, BAYES_50, FUZZY_VPILL autolearn=disabled version=3.4.0 X-Scan-Signature: c09395f469c52153b963e4ff2d10f427 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Mar 2016 14:31:54 -0000 On Sun, 06 Mar 2016 02:57:03 +0100, Paul Mather wrote: > On my 
BeagleBone Black running 11-CURRENT (r296162) lately I have been > having trouble with NFS. I have been doing a buildworld and buildkernel > with /usr/src and /usr/obj mounted via NFS. Recently, this process has > resulted in the buildworld failing at some point, with a variety of > errors (Segmentation fault; Permission denied; etc.). Even a "ls -alR" > of /usr/src doesn't manage to complete. It errors out thus: > > ===== > [[...]] > total 0 > ls: ./.svn/pristine/fe: Permission denied > > ./.svn/pristine/ff: > total 0 > ls: ./.svn/pristine/ff: Permission denied > ls: fts_read: Permission denied > ===== > > On the console, I get the following: > > newnfs: server 'chumby.chumby.lan' error: fileid changed. fsid > 94790777:a4385de: expected fileid 0x4, got 0x2. (BROKEN NFS SERVER OR > MIDDLEWARE) > > > I am using a FreeBSD/amd64 10.3-PRERELEASE (r296412) as the NFS server. > On the BeagleBone Black, I am mounting /usr/src and /usr/obj via > /etc/fstab as follows: > > chumby.chumby.lan:/build/src/head /usr/src nfs rw,nfsv4 0 0 > chumby.chumby.lan:/build/obj/bbb /usr/obj nfs rw,nfsv4 0 0 > > > /build/src/head and /build/obj/bbb are both ZFS file systems. > > Has anyone else encountered this? It has only started happening > recently for me, it seems. Prior to this, I have been able to do a > buildworld and buildkernel successfully over NFS. > > Cheers, > > Paul. I cc this to freebsd-fs for you. Ronald. 
From owner-freebsd-fs@freebsd.org Mon Mar 7 16:15:03 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C3D83AC3B45 for ; Mon, 7 Mar 2016 16:15:03 +0000 (UTC) (envelope-from ken@kdm.org) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id A8DE06B1 for ; Mon, 7 Mar 2016 16:15:03 +0000 (UTC) (envelope-from ken@kdm.org) Received: by mailman.ysv.freebsd.org (Postfix) id A80A7AC3B44; Mon, 7 Mar 2016 16:15:03 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8D91AAC3B42; Mon, 7 Mar 2016 16:15:03 +0000 (UTC) (envelope-from ken@kdm.org) Received: from mithlond.kdm.org (mithlond.kdm.org [96.89.93.250]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "A1-33714", Issuer "A1-33714" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 4D2B96AE; Mon, 7 Mar 2016 16:15:02 +0000 (UTC) (envelope-from ken@kdm.org) Received: from mithlond.kdm.org (localhost [127.0.0.1]) by mithlond.kdm.org (8.15.2/8.14.9) with ESMTPS id u27GEsSm004385 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Mon, 7 Mar 2016 11:14:54 -0500 (EST) (envelope-from ken@mithlond.kdm.org) Received: (from ken@localhost) by mithlond.kdm.org (8.15.2/8.14.9/Submit) id u27GEsD8004384; Mon, 7 Mar 2016 11:14:54 -0500 (EST) (envelope-from ken) Date: Mon, 7 Mar 2016 11:14:54 -0500 From: "Kenneth D. 
Merry" To: Rick Macklem Cc: fs@freebsd.org, scsi@freebsd.org, Robert Watson Subject: Re: FUSE extended attribute patches available Message-ID: <20160307161454.GA3501@mithlond.kdm.org> References: <800018199.6694281.1457233600357.JavaMail.zimbra@uoguelph.ca> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <800018199.6694281.1457233600357.JavaMail.zimbra@uoguelph.ca> User-Agent: Mutt/1.5.23 (2014-03-12) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (mithlond.kdm.org [127.0.0.1]); Mon, 07 Mar 2016 11:14:55 -0500 (EST) X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS autolearn=ham autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on mithlond.kdm.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Mar 2016 16:15:04 -0000 On Sat, Mar 05, 2016 at 22:06:40 -0500, Rick Macklem wrote: > Ken Merry wrote: > > I have patches for FreeBSD???s FUSE filesystem kernel module to support > > extended attributes: > > > > https://people.freebsd.org/~ken/fuse_extattr.20160229.1.txt > > > The only bit of code I have that might be useful for this patch is: > case FUSE_GETXATTR: > case FUSE_LISTXATTR: > ! /* > ! * These can have varying response lengths, and 0 length > ! * isn't necessarily invalid. > ! */ > ! err = 0; > *** I came up with this: > fgin = (struct fuse_getxattr_in *) > ((char *)ftick->tk_ms_fiov.base + > sizeof(struct fuse_in_header)); > if (fgin->size == 0) > err = (blen == sizeof(struct fuse_getxattr_out)) ? 0 : > EINVAL; > else > err = (blen <= fgin->size) ? 0 : EINVAL; > break; > I think I got the size check right? I think that is correct, yes. > The big question is... > What to do with the NAMESPACE? 
> - My code fails for SYSTEM and does USER without prepending "user.".
>   (That seemed to be what rwatson@ felt was reasonable. I thought our
>   discussion was on a mailing list, but I can't find it.)
>   I've cc'd him. Maybe he can comment again.

IBM's LTFS at least seems to require the "user." prefix on Linux. For
context, this code supports Windows, Linux and MacOS X. So the "#else" case
is Linux. Here's the code in question:

/**
 * Strip a Linux namespace prefix from the given xattr name and return the position of the suffix.
 * If the name is "user.X", return the "X" portion. Otherwise, return an error.
 * This function does nothing on Mac OS X.
 * @param name Name to strip.
 * @return A pointer to the name suffix, or NULL to indicate an invalid name. On Mac OS X,
 *         always returns @name.
 */
const char *_xattr_strip_name(const char *name)
{
#if (defined (__APPLE__) || defined (mingw_PLATFORM))
        return name;
#else
        if (strstr(name, "user.") == name)
                return name + 5;
        else
                return NULL;
#endif
}

I can certainly change it to do whatever is the correct answer on FreeBSD.
It looks like for FUSE with MacOS and Windows, they expect just the
attribute name without a namespace prefix.

> - If you stick with prepending "user." or "system." there needs to be
>   some way to bypass this so that attributes that don't start in "user."
>   or "system." can be accessed. I've seen "trusted." and "glusterfs."
>   on GlusterFS.
>   --> Maybe a new namespace called something like "nil" that just bypasses
>       any USER or SYSTEM checks?
>

I'll respond to rwatson's email on this part...
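
For illustration only, the "prepend the namespace" option discussed above could look roughly like the following userland sketch. Everything named sketch_* or SKETCH_* is hypothetical: the enum values are stand-ins for FreeBSD's EXTATTR_NAMESPACE_USER and EXTATTR_NAMESPACE_SYSTEM, and this is neither the code in the posted patch nor an answer to the open question of what FreeBSD should do.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's EXTATTR_NAMESPACE_* constants. */
enum sketch_ns { SKETCH_NS_USER = 1, SKETCH_NS_SYSTEM = 2 };

/*
 * Build a Linux-style "user.<name>" or "system.<name>" string in buf.
 * Returns 0 on success, EINVAL for an unknown namespace, or
 * ENAMETOOLONG if the result (including the NUL) does not fit in buflen.
 */
int
sketch_xattr_add_prefix(int ns, const char *name, char *buf, size_t buflen)
{
	const char *prefix;
	int n;

	switch (ns) {
	case SKETCH_NS_USER:
		prefix = "user";
		break;
	case SKETCH_NS_SYSTEM:
		prefix = "system";
		break;
	default:
		return EINVAL;
	}
	n = snprintf(buf, buflen, "%s.%s", prefix, name);
	if (n < 0 || (size_t)n >= buflen)
		return ENAMETOOLONG;
	return 0;
}

/*
 * Inverse direction, same behaviour as the "#else" branch quoted above:
 * return the part after "user.", or NULL if the prefix is missing.
 */
const char *
sketch_xattr_strip_prefix(const char *name)
{
	static const char prefix[] = "user.";
	const size_t plen = sizeof(prefix) - 1;

	if (strncmp(name, prefix, plen) == 0)
		return name + plen;
	return NULL;
}
```

With this shape, a name such as "trusted.foo" is rejected by the strip helper, which is exactly the case the proposed bypass namespace would have to cover.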
Ken -- Kenneth Merry ken@FreeBSD.ORG From owner-freebsd-fs@freebsd.org Mon Mar 7 16:19:50 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E8CBEAC3DA8 for ; Mon, 7 Mar 2016 16:19:49 +0000 (UTC) (envelope-from ken@kdm.org) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id CC05DB79 for ; Mon, 7 Mar 2016 16:19:49 +0000 (UTC) (envelope-from ken@kdm.org) Received: by mailman.ysv.freebsd.org (Postfix) id C43E1AC3DA6; Mon, 7 Mar 2016 16:19:49 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C3AEAAC3DA4; Mon, 7 Mar 2016 16:19:49 +0000 (UTC) (envelope-from ken@kdm.org) Received: from mithlond.kdm.org (mithlond.kdm.org [96.89.93.250]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "A1-33714", Issuer "A1-33714" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 77301B75; Mon, 7 Mar 2016 16:19:49 +0000 (UTC) (envelope-from ken@kdm.org) Received: from mithlond.kdm.org (localhost [127.0.0.1]) by mithlond.kdm.org (8.15.2/8.14.9) with ESMTPS id u27GJlr8004486 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Mon, 7 Mar 2016 11:19:47 -0500 (EST) (envelope-from ken@mithlond.kdm.org) Received: (from ken@localhost) by mithlond.kdm.org (8.15.2/8.14.9/Submit) id u27GJl2A004485; Mon, 7 Mar 2016 11:19:47 -0500 (EST) (envelope-from ken) Date: Mon, 7 Mar 2016 11:19:47 -0500 From: "Kenneth D. 
Merry" To: Julian Elischer Cc: Rick Macklem , Robert Watson , fs@freebsd.org, scsi@freebsd.org Subject: Re: FUSE extended attribute patches available Message-ID: <20160307161947.GB3501@mithlond.kdm.org> References: <800018199.6694281.1457233600357.JavaMail.zimbra@uoguelph.ca> <56DD2AB6.1030407@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <56DD2AB6.1030407@freebsd.org> User-Agent: Mutt/1.5.23 (2014-03-12) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (mithlond.kdm.org [127.0.0.1]); Mon, 07 Mar 2016 11:19:47 -0500 (EST) X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS autolearn=ham autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on mithlond.kdm.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Mar 2016 16:19:50 -0000 On Sun, Mar 06, 2016 at 23:16:06 -0800, Julian Elischer wrote: > On 5/03/2016 7:06 PM, Rick Macklem wrote: > > Ken Merry wrote: > >> I have patches for FreeBSD???s FUSE filesystem kernel module to support > >> extended attributes: > oh showing off your masochistic side eh? I suppose so; it was rather slow going to figure out the interface. More documentation on the interface would be helpful. Even a more clearly structured interface would be helpful. > >> https://people.freebsd.org/~ken/fuse_extattr.20160229.1.txt > >> > I spent an hour beating my head against fuse yesterday. > then realised that it's an old version on our product. We really have > to get off 8.0 > (hopefully a matter of weeks now to a 10.x switch) > Now all I need is to find a FreeBSD filesystem expert > (ZFS/NFS/CIFS/GFS) to hire. I'm sure you know a few, you'll just have to persuade someone. 
> > The only bit of code I have that might be useful for this patch is:
> >         case FUSE_GETXATTR:
> >         case FUSE_LISTXATTR:
> > !               /*
> > !                * These can have varying response lengths, and 0 length
> > !                * isn't necessarily invalid.
> > !                */
> > !               err = 0;
> > *** I came up with this:
> >                 fgin = (struct fuse_getxattr_in *)
> >                     ((char *)ftick->tk_ms_fiov.base +
> >                      sizeof(struct fuse_in_header));
> >                 if (fgin->size == 0)
> >                         err = (blen == sizeof(struct fuse_getxattr_out)) ? 0 :
> >                             EINVAL;
> >                 else
> >                         err = (blen <= fgin->size) ? 0 : EINVAL;
> >                 break;
> > I think I got the size check right?
> >
> > The big question is...
> > What to do with the NAMESPACE?
> > - My code fails for SYSTEM and does USER without prepending "user.".
> >   (That seemed to be what rwatson@ felt was reasonable. I thought our
> >   discussion was on a mailing list, but I can't find it.)
> >   I've cc'd him. Maybe he can comment again.
> Is there a standard for extended attributes I should know about?
> It seems to me that it's a bit like the wild west.
> Extended attributes seem to be "every OS for himself".

It does appear to be somewhat OS-dependent.
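
As an aside, the reply-length check quoted above can be tried outside the kernel with a small sketch. The struct below is a hypothetical stand-in for struct fuse_getxattr_out (assumed here to be two 32-bit fields), and sketch_xattr_reply_check is an illustrative name, not a function from the FUSE module or the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout, standing in for the real struct fuse_getxattr_out. */
struct sketch_getxattr_out {
	uint32_t size;
	uint32_t padding;
};

/*
 * requested: the size field sent in the GETXATTR/LISTXATTR request.
 * blen:      the length of the reply body received from the daemon.
 *
 * A request with size 0 is a size probe, so the reply must be exactly
 * the fixed "out" structure. Otherwise the reply carries attribute data
 * and may be any length up to the requested size, including 0.
 */
int
sketch_xattr_reply_check(uint32_t requested, size_t blen)
{
	if (requested == 0)
		return (blen == sizeof(struct sketch_getxattr_out)) ? 0 : EINVAL;
	return (blen <= requested) ? 0 : EINVAL;
}
```

The two branches mirror the quoted logic: a probe must answer with the fixed structure, while a data reply is valid at any length that fits the caller's buffer.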
Ken -- Kenneth Merry ken@FreeBSD.ORG From owner-freebsd-fs@freebsd.org Mon Mar 7 16:44:12 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 09F41AC38CD for ; Mon, 7 Mar 2016 16:44:12 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id EEF79BD for ; Mon, 7 Mar 2016 16:44:11 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u27GiBoX058883 for ; Mon, 7 Mar 2016 16:44:11 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 176449] zfs(1): ZFS NFS export went wrong with special hostname character Date: Mon, 07 Mar 2016 16:44:11 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: bin X-Bugzilla-Version: 9.1-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: eborisch+FreeBSD@gmail.com X-Bugzilla-Status: In Progress X-Bugzilla-Resolution: X-Bugzilla-Priority: Normal X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: cc Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Mar 
2016 16:44:12 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D176449 eborisch+FreeBSD@gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |eborisch+FreeBSD@gmail.com --- Comment #4 from eborisch+FreeBSD@gmail.com --- Duplicate of bug #168158? --=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Mon Mar 7 20:55:48 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 368BCAC2594; Mon, 7 Mar 2016 20:55:48 +0000 (UTC) (envelope-from lslusser@gmail.com) Received: from mail-vk0-x22c.google.com (mail-vk0-x22c.google.com [IPv6:2607:f8b0:400c:c05::22c]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id DCE601CC6; Mon, 7 Mar 2016 20:55:47 +0000 (UTC) (envelope-from lslusser@gmail.com) Received: by mail-vk0-x22c.google.com with SMTP id c3so131819638vkb.3; Mon, 07 Mar 2016 12:55:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc; bh=DAhuTpXsSg6UCNBJKnagMJx92BEndeEQ0wRXhWRr/9o=; b=kOQWYCAWLGZ/k0dYMPNZDS1CMXMiL7MGjFIsvCSPnuWZcvZos6SqJFhg9l8TA9NxRM RYjf8cHlrYGFoBEOtiJKH8gguoulSrFw8DRGzd0lHvHg+WQ94W/vEALijLQvV3LvGX+S 0f2AwOhVAVDJ0gYnuuPn+g3qTXra1Bk7auF6r52RA634XmvA7erO0tzuUbGBC7OJ34lt QCSBkuy9h8GrLAax23XSiFU5yCH24RHpmsqHTd6JQqEp0eaxlmHF108UTei8H1AIm+k1 7GxNEUqg/FhngQt8lAwSz3geAtIGaNY13agGJshbDMWKp8yjwG88d2V8YMXJRicUNVlz djaQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc; 
bh=DAhuTpXsSg6UCNBJKnagMJx92BEndeEQ0wRXhWRr/9o=; b=i/tWMkf2vl4Q0DVj+ikWF0O9vhczKg6FWkrxI5e6WD3HgxMOY03yL9yxwJ8JGZfvIn BeYjXA8Maa9wbmqObHZnQsKMEfnCORAMLxziIuzU3rl6rJ8NUs5JOtg5cLVa678PR9XH ZSSSE/rqPVQe6wHlSAPVJ5vE4wCkOlhUOJE7ipoq4vU/oPBmvb8RekQIaGeMjE7h+G7v mkZhpSUpubB6FbTYvDRaJT1e3iATCFS8ZafzL4TPPbf2ptLTEkcD+npoIXsodRh9/fO4 NLuqoH3wk8vlVAv0GC50qVgqUc+x1WSuvV6cI689n8oh59KGlGCXilFQw3VZuEm1nLrS XYJg== X-Gm-Message-State: AD7BkJLIjb4LJEWKPqsH2IZtPmrLHwaLBwEa3Gdw8OkJyxJUSwX4lg+Gm3piTZq0OFV4WGHx4zKlLAZgy8dtVA== MIME-Version: 1.0 X-Received: by 10.31.174.23 with SMTP id x23mr20192994vke.136.1457384146585; Mon, 07 Mar 2016 12:55:46 -0800 (PST) Received: by 10.176.1.166 with HTTP; Mon, 7 Mar 2016 12:55:46 -0800 (PST) In-Reply-To: References: <95563acb-d27b-4d4b-b8f3-afeb87a3d599@me.com> <56D87784.4090103@broken.net> <5158F354-9636-4031-9536-E99450F312B3@RichardElling.com> <6E2B77D1-E0CA-4901-A6BD-6A22C07536B3@gmail.com> Date: Mon, 7 Mar 2016 12:55:46 -0800 Message-ID: Subject: Re: [zfs] [developer] Re: [smartos-discuss] an interesting survey -- the zpool with most disks you have ever built From: Liam Slusser To: zfs@lists.illumos.org Cc: "smartos-discuss@lists.smartos.org" , developer@lists.open-zfs.org, developer , illumos-developer , omnios-discuss , Discussion list for OpenIndiana , "zfs-discuss@list.zfsonlinux.org" , "freebsd-fs@FreeBSD.org" , "zfs-devel@freebsd.org" X-Mailman-Approved-At: Mon, 07 Mar 2016 21:59:51 +0000 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.21 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Mar 2016 20:55:48 -0000 I don't have a 2000 drive array (thats amazing!) but I do have two 280 drive arrays which are in production. 
Here are the generic stats:

server setup:
OpenIndiana oi_151
1 server rack
Dell r720xd 64g ram with mirrored 250g boot disks
5 x LSI 9207-8e dualport SAS pci-e host bus adapters
Intel 10g fibre ethernet (dual port)
2 x SSD for log cache
2 x SSD for cache
23 x Dell MD1200 with 3T, 4T, or 6T NLSAS disks (a mix of Toshiba, Western
Digital, and Seagate drives - basically whatever Dell sends)

zpool setup:
23 x 12-disk raidz2 glued together. 276 total disks. Basically each new
12-disk MD1200 is a new raidz2 added to the pool.

Total size: ~797T

We have an identical server to which we replicate changes via zfs snapshots
every few minutes. The whole setup has been up and running for a few years
now, no issues. As we run low on space we purchase two additional MD1200
shelves (one for each system) and add the new raidz2 into the pool
on-the-fly.

The only real issue we've had is that sometimes a disk fails in such a way
(think Monty Python and the Holy Grail, "I'm not dead yet") that the disk
hasn't failed outright but is timing out, and it slows the whole array to a
standstill until we can manually find and remove the disk. The other problem
is that once a disk has been replaced, the resilver process can sometimes
take an eternity.

We have also found the snapshot replication process can interfere with the
resilver process - resilver gets stuck at 99% and never ends - so we end up
stopping replication or only doing one replication a day until the resilver
process is done.

The last helpful hint I have is lowering all the drive timeouts, see
http://everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/
for info.
thanks,
liam

On Sun, Mar 6, 2016 at 10:18 PM, Fred Liu wrote:

>
>
> 2016-03-07 14:04 GMT+08:00 Richard Elling:
>
>>
>> On Mar 6, 2016, at 9:06 PM, Fred Liu wrote:
>>
>>
>>
>> 2016-03-06 22:49 GMT+08:00 Richard Elling <
>> richard.elling@richardelling.com>:
>>
>>>
>>> On Mar 3, 2016, at 8:35 PM, Fred Liu wrote:
>>>
>>> Hi,
>>>
>>> Today when I was reading Jeff's new nuclear weapon -- DSSD D5's CUBIC
>>> RAID introduction,
>>> the interesting survey -- the zpool with most disks you have ever built
>>> popped into my head.
>>>
>>>
>>> We test to 2,000 drives. Beyond 2,000 there are some scalability issues
>>> that impact failover times.
>>> We’ve identified these and know what to fix, but need a real customer at
>>> this scale to bump it to
>>> the top of the priority queue.
>>>
>>> [Fred]: Wow! 2000 drives almost need 4~5 whole racks!
>>
>>>
>>> Since ZFS doesn't support nested vdevs, the maximum fault tolerance should
>>> be three (from raidz3).
>>>
>>>
>>> Pedantically, it is N, because you can have N-way mirroring.
>>>
>>
>> [Fred]: Yeah. That is just pedantic. N-way mirroring of every disk works
>> in theory and rarely happens in reality.
>>
>>>
>>> It is stranded if you want to build a very huge pool.
>>>
>>>
>>> Scaling redundancy by increasing parity improves data loss protection by
>>> about 3 orders of
>>> magnitude. Adding capacity by striping reduces data loss protection by
>>> 1/N. This is why there is
>>> not much need to go beyond raidz3. However, if you do want to go there,
>>> adding raidz4+ is
>>> relatively easy.
>>>
>>
>> [Fred]: I assume you used striped raidz3 vdevs in your storage mesh of
>> 2000 drives. If that is true, the possibility of 4/2000 will not be so low.
>> Plus, resilvering takes longer if a single disk has a bigger
>> capacity. And further, the cost of over-provisioning spare disks vs raidz4+
>> will be a deserved
>> trade-off when the storage mesh is at the scale of 2000 drives.
>>
>>
>> Please don't assume, you'll just hurt yourself :-)
>> For example, do not assume the only option is striping across raidz3
>> vdevs. Clearly, there are many
>> different options.
>>
>
> [Fred]: Yeah. Assumptions always stray far from the facts! ;-) Is designing
> a storage mesh with 2000 drives a biz secret? Or is it just too complicated to
> elaborate?
> Never mind. ;-)
>
> Thanks.
>
> Fred
>

From owner-freebsd-fs@freebsd.org Mon Mar 7 22:28:23 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C68FAAC7C67 for ; Mon, 7 Mar 2016 22:28:23 +0000 (UTC) (envelope-from ken@freebsd.org) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id B1B5B818 for ; Mon, 7 Mar 2016 22:28:23 +0000 (UTC) (envelope-from ken@freebsd.org) Received: by mailman.ysv.freebsd.org (Postfix) id AD372AC7C65; Mon, 7 Mar 2016 22:28:23 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 94305AC7C64; Mon, 7 Mar 2016 22:28:23 +0000 (UTC) (envelope-from ken@freebsd.org) Received: from mithlond.kdm.org (mithlond.kdm.org [96.89.93.250]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "A1-33714", Issuer "A1-33714" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 65C9C817; Mon, 7 Mar 2016 22:28:23 +0000 (UTC) (envelope-from ken@freebsd.org) Received: from [10.0.0.27] (mbp2013-wired.int.kdm.org [10.0.0.27]) (authenticated bits=0) by mithlond.kdm.org (8.15.2/8.14.9) with ESMTPSA id u27MSGvD009626 (version=TLSv1 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 7 Mar 2016 17:28:21 -0500 (EST) (envelope-from
ken@freebsd.org) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 9.2 \(3112\)) Subject: Re: FUSE extended attribute patches available From: Ken Merry In-Reply-To: <6AF0FC23-CC34-43EA-A008-9FB82FB21558@FreeBSD.org> Date: Mon, 7 Mar 2016 17:28:16 -0500 Cc: Julian Elischer , Rick Macklem , fs@freebsd.org, scsi@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: References: <800018199.6694281.1457233600357.JavaMail.zimbra@uoguelph.ca> <56DD2AB6.1030407@freebsd.org> <6AF0FC23-CC34-43EA-A008-9FB82FB21558@FreeBSD.org> To: Robert Watson X-Mailer: Apple Mail (2.3112) X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.4.3 (mithlond.kdm.org [96.89.93.250]); Mon, 07 Mar 2016 17:28:21 -0500 (EST) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 07 Mar 2016 22:28:23 -0000

> On Mar 7, 2016, at 2:59 AM, Robert Watson wrote:
>
> FreeBSD and Linux’s extended-attribute models were inherited from IRIX, as they were introduced to solve the same problems: a place to store metadata such as ACLs, MAC labels, capability masks, etc. IRIX had three namespaces: one each for “user”, “root”, and “secure”, reflecting whether or not they were managed by the file owner (or permissions), the privileged root user, or part of the TCB protection mechanism (e.g., for integrity labels).
>
> These extended attributes should not be confused with the filesystem feature of the same name in NFSv4, which is sometimes known by the name “file fork” or “data streams”.
EAs in IRIX/FreeBSD/Linux/HPFS/etc are tuple pairs of names and values intended to be written atomically or updated in place specifically for (shortish) metadata such as ACLs, rather than being complete separate data spaces for I/O (e.g., that could be memory mapped).

It would be nice to have NFSv4 / Solaris style alternate data streams. ZFS handles them already, but I suppose it would take more work to support them in UFS.

> In FreeBSD’s design, we incorporated the disjoint namespace model, providing USER and SYSTEM, the former being managed by the file owner (and those given suitable permission), and the latter being used for TCB mechanisms such as the implementations of MAC labels, ACLs, etc.
>
> In Linux, they adopted a more free-form mechanism based on a single combined namespace with a prefix - e.g., user.FOO, and system.BAR. Over time it looks like that namespace has been expanded in various filesystem-specific ways. We also have room to expand our namespace, but from the description below, it’s not clear quite what the right mechanism is.
>
> One path would be to introduce a new namespace for filesystem-specific attributes - e.g., EXTATTR_NAMESPACE_FS?
>
> But I think the key question here is whether the existing namespaces can provide the semantics you need. If not, then we likely need a new namespace. But then we get the question as to who controls use of the namespace. Certainly “the filesystem” is one option, but then you will get inconsistency in approaches between filesystems and applications - across various dimensions including protection (who can read/modify them?), allocation (who decides what names should be used for what?), and semantics (what applications can use them, and who backs them up?).
>
> For example: who should be responsible for backing up those attributes?
For ‘system’ attributes in FreeBSD, it is assumed that backup tools will be aware of the services layered over the attributes - e.g., that they will back up ACLs using the ACL API, rather than backing up the binary EAs holding the ACLs. For ‘user’ attributes, it is assumed that backup tools (e.g., tar) must explicitly preserve them, since they are user-defined and user-managed. For filesystem-specific attributes, some other choice will need to be made - perhaps filesystem-specific backup tools need to know about them?
>
> Note that in the Linux EA model, ACLs are actually accessed via the EA system calls, whereas in FreeBSD, ACLs are a first-class citizen in the system-call API/ABI, and so user applications don’t treat them as EAs. We made that choice as filesystems may choose themselves not to represent ACLs as EAs, and they have real semantics visible to the VFS layer. In Linux, I believe they chose to pass them via EAs to narrow the system-call interface for filesystem metadata. Both are legitimate choices, but this could also trigger discussions about whether new attributes are best accessed via the EA interface, or new system calls. For filesystem-specific attributes, EAs are likely the better way to go.

It may be that for at least the purposes of FUSE, we can adequately live under the USER namespace. That would allow for arbitrary namespaces that Linux-centric filesystems create without significant churn in FreeBSD to support it.

And of course this is only for the front/top end of a FUSE filesystem. What the filesystem actually does with the extended attributes that the user sets on top is another question altogether. In the case of IBM’s LTFS, it stores extended attributes (without the “user.” prefix) in the LTFS index, which is an XML file that resides on tape. For other filesystems, the answer could also vary significantly.
A few that I examined in sysutils/fusefs* used extended attributes on the backend (usually on a backing filesystem) under Linux only, but not on the front (user facing) end.

In order to make arbitrary namespaces in FUSE work in FreeBSD under the user namespace, we’ll have to do what Rick was talking about and just not include the namespace as a prefix when we get/set attributes. This will allow using any sort of namespace or attribute name that the FUSE filesystem wants to use.

The impact of this, from a porting standpoint, is that the FUSE filesystems will have to know that on FreeBSD, they cannot/should not expect to see the “user.” namespace prefix, but they might see other namespace prefixes.

I took a look at the way LTFS and Gluster work with respect to extended attributes with MacOS, since it seems that is how MacOS works, and it’s less obvious to me what is going on with Gluster. They’ve got this function:

#ifdef GF_DARWIN_HOST_OS
static int
set_xattr_user_namespace_mode (struct posix_private *priv, const char *str)
{
        if (strcmp (str, "none") == 0)
                priv->xattr_user_namespace = XATTR_NONE;
        else if (strcmp (str, "strip") == 0)
                priv->xattr_user_namespace = XATTR_STRIP;
        else if (strcmp (str, "append") == 0)
                priv->xattr_user_namespace = XATTR_APPEND;
        else if (strcmp (str, "both") == 0)
                priv->xattr_user_namespace = XATTR_BOTH;
        else
                return -1;
        return 0;
}
#endif

Although it’s not clear that they do anything with values other than XATTR_STRIP.

With LTFS, since they either assume a “user.” prefix on Linux, or no prefix on Windows and MacOS X, it’s more straightforward.
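To make the porting impact concrete, here is a minimal sketch of how a FUSE filesystem might normalize attribute names so that the same backend code works whether or not the kernel side passes the "user." prefix. It is an illustration only, not code from the patch, LTFS, or Gluster; the macro XATTR_USER_PREFIX_STR and the function xattr_strip_user_prefix() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical macro for illustration; not taken from any of the projects. */
#define XATTR_USER_PREFIX_STR "user."

/*
 * Return a pointer to the attribute name with a leading "user." removed.
 * Names without that prefix (a bare "foo", or another namespace such as
 * "trusted.foo") are returned unchanged. The result aliases the input.
 */
static const char *
xattr_strip_user_prefix(const char *name)
{
	const size_t plen = strlen(XATTR_USER_PREFIX_STR);

	assert(name != NULL);
	if (strncmp(name, XATTR_USER_PREFIX_STR, plen) == 0)
		return (name + plen);
	return (name);
}
```

Called on every get/set/remove request, a helper along these lines would let the filesystem treat "user.testattr1" (as seen on Linux) and "testattr1" (as it would be seen on FreeBSD under the approach described above) as the same attribute, while leaving other prefixes alone.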
Ken

>
> Robert
>
>> On 7 Mar 2016, at 07:16, Julian Elischer wrote:
>>
>> On 5/03/2016 7:06 PM, Rick Macklem wrote:
>>> Ken Merry wrote:
>>>> I have patches for FreeBSD’s FUSE filesystem kernel module to support
>>>> extended attributes:
>> oh showing off your masochistic side eh?
>>
>>>> https://people.freebsd.org/~ken/fuse_extattr.20160229.1.txt
>>>>
>> I spent an hour beating my head against fuse yesterday.
>> then realised that it's an old version on our product. We really have to get off 8.0
>> (hopefully a matter of weeks now to a 10.x switch)
>> Now all I need is to find a FreeBSD filesystem expert (ZFS/NFS/CIFS/GFS) to hire.
>>
>>
>>> The only bit of code I have that might be useful for this patch is:
>>>     case FUSE_GETXATTR:
>>>     case FUSE_LISTXATTR:
>>> !       /*
>>> !        * These can have varying response lengths, and 0 length
>>> !        * isn't necessarily invalid.
>>> !        */
>>> !       err = 0;
>>> *** I came up with this:
>>>         fgin = (struct fuse_getxattr_in *)
>>>             ((char *)ftick->tk_ms_fiov.base +
>>>             sizeof(struct fuse_in_header));
>>>         if (fgin->size == 0)
>>>             err = (blen == sizeof(struct fuse_getxattr_out)) ? 0 :
>>>                 EINVAL;
>>>         else
>>>             err = (blen <= fgin->size) ? 0 : EINVAL;
>>>         break;
>>> I think I got the size check right?
>>>
>>> The big question is...
>>> What to do with the NAMESPACE?
>>> - My code fails for SYSTEM and does USER without prepending "user.".
>>> (That seemed to be what rwatson@ felt was reasonable. I thought our
>>> discussion was on a mailing list, but I can't find it.)
>>> I've cc'd him. Maybe he can comment again.
>> Is there a standard for extended attributes I should know about?
>> It seems to me that it's a bit like the wild west.
>> Extended attributes seem to be "every OS for himself".
>>
>>>
>>> - If you stick with prepending "user." or "system." there needs to be
>>> some way to bypass this so that attributes that don't start in "user."
>>> or "system." can be accessed.
I've seen "trusted." and "glusterfs."
>>> on GlusterFS.
>>> --> Maybe a new namespace called something like "nil" that just bypasses
>>> any USER or SYSTEM checks?
>>>
>>> rick
>>>
>>>> The patch implements the get/set/delete/list extended attribute methods. The
>>>> listing code also converts extended attribute lists from the Linux/FUSE
>>>> format to the FreeBSD format. For example:
>>>>
>>>> # touch foo
>>>> # ls -la foo
>>>> -rwxrwxrwx 1 root wheel 0 Feb 29 21:40 foo
>>>> # lsextattr user foo
>>>> foo
>>>> # setextattr user testattr1 "12345678" foo
>>>> # lsextattr user foo
>>>> foo testattr1
>>>> # getextattr user testattr1 foo
>>>> foo 12345678
>>>> # setextattr user testattr2 "87654321" foo
>>>> # lsextattr user foo
>>>> foo testattr2 testattr1
>>>> # rmextattr user testattr1 foo
>>>> # lsextattr user foo
>>>> foo testattr2
>>>> # getextattr user testattr1 foo
>>>> getextattr: foo: failed: Attribute not found
>>>> # getextattr user testattr2 foo
>>>> foo 87654321
>>>>
>>>>
>>>> Just to be clear on what this does, it only provides extended attribute
>>>> support to FreeBSD applications if the underlying FUSE filesystem implements
>>>> FUSE extended attribute support. Many FUSE filesystems don’t support the
>>>> extended attribute VFS operations.
>>>>
>>>> I have tested this out on IBM’s LTFS implementation, but I have not yet found
>>>> another FUSE filesystem that supports extended attributes. If anyone knows
>>>> of one, please let me know so I can try it out. (I looked through a number
>>>> of the filesystems in sysutils/fusefs* in the ports tree.)
>>>>
>>>> Any feedback is welcome. I’m planning to check this into FreeBSD/head in the
>>>> next week or so.
>>>>
>>>> Obviously, I’ve also ported IBM’s LTFS implementation to FreeBSD.
It works
>>>> in the standard FUSE mode, and you can also link it into an application as a
>>>> library if you don’t want to incur the overhead of running through FUSE. I
>>>> haven’t gotten around to packaging it up to go out for testing / review.
>>>>
>>>> If anyone has IBM LTO-5 or newer tape drives, or IBM TS1140 or newer tape
>>>> drives, and wants to try it out, let me know. I’ll send you the code when
>>>> I’ve got it at least somewhat ready. This is IBM-specific, and won’t work
>>>> on HP tape drives.
>>>>
>>>> Ken
>>>> --
>>>> Ken Merry
>>>> ken@FreeBSD.ORG
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> freebsd-fs@freebsd.org mailing list
>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>>
>>>
>>
>

--
Ken Merry
ken@FreeBSD.ORG

From owner-freebsd-fs@freebsd.org Tue Mar 8 02:39:17 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E4461AC27B7 for ; Tue, 8 Mar 2016 02:39:17 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id C31F49CC for ; Tue, 8 Mar 2016 02:39:17 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: by mailman.ysv.freebsd.org (Postfix) id BA18BAC27B4; Tue, 8 Mar 2016 02:39:17 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by
mailman.ysv.freebsd.org (Postfix) with ESMTP id 9F6ABAC27B2; Tue, 8 Mar 2016 02:39:17 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id E57639C7; Tue, 8 Mar 2016 02:39:16 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) IronPort-PHdr: 9a23:Y3V1BhLIKxFVs8tnhNmcpTZWNBhigK39O0sv0rFitYgVKPTxwZ3uMQTl6Ol3ixeRBMOAu60C27Cd7/2ocFdDyKjCmUhKSIZLWR4BhJdetC0bK+nBN3fGKuX3ZTcxBsVIWQwt1Xi6NU9IBJS2PAWK8TWM5DIfUi/yKRBybrysXNWC0ILnjavuptX6WEZhunmUWftKNhK4rAHc5IE9oLBJDeIP8CbPuWZCYO9MxGlldhq5lhf44dqsrtY4q3wD89pozcNLUL37cqIkVvQYSW1+ayFmrPDtrgTJGAuT+mMHACJRlhtTHxOD4gv3U53qvm39rOU63SCbOcj/S/cwWC++7qFlT1jmkioKPSU1tW/M2fB32YFWplqEqgZl0saAY4yTHPRkc67XZt9cQnBOCJV/TStEV7m9ZIhHKuMKPuJVqsGpvV4Hphi6CAyEGeTg1zJMnn+w1qRsgLdpKh3PwAF1R4FGi3/Tttigcf5KCe0= X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A2DQAQCeOt5W/61jaINcFoN2bQa6QgENgWkXCoUkSgKBbBQBAQEBAQEBAWMngi2CFAEBAQMBAQEBIAQnHQMLEAIBCBgCAg0ZAgInAQkYAQ0CBAgHBAEcAgKHewgOrxePOAEBAQEBBQEBAQEBARp7hRyBd4FJfYQBGgEBG4MCgToFh1iFWHQ9iEmFY4JwgjKETEuDeYMlhS6OUwIeAQFChAIeLgEBAQSIRjR+AQEB X-IronPort-AV: E=Sophos;i="5.22,554,1449550800"; d="scan'208";a="269546410" Received: from nipigon.cs.uoguelph.ca (HELO zcs1.mail.uoguelph.ca) ([131.104.99.173]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 07 Mar 2016 21:39:08 -0500 Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id D1A2515F56D; Mon, 7 Mar 2016 21:39:08 -0500 (EST) Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id N14dMD25e0Sp; Mon, 7 Mar 2016 21:39:07 -0500 (EST) Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 5E0F315F571; Mon, 7 Mar 2016 21:39:07 -0500 (EST) X-Virus-Scanned: amavisd-new at zcs1.mail.uoguelph.ca Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost 
(zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id 9Lr9r1Ia1djW; Mon, 7 Mar 2016 21:39:07 -0500 (EST) Received: from zcs1.mail.uoguelph.ca (zcs1.mail.uoguelph.ca [172.17.95.18]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 3C7BD15F56D; Mon, 7 Mar 2016 21:39:07 -0500 (EST) Date: Mon, 7 Mar 2016 21:39:07 -0500 (EST) From: Rick Macklem To: Ken Merry Cc: Robert Watson , Julian Elischer , fs@freebsd.org, scsi@freebsd.org Message-ID: <436595384.8930140.1457404747058.JavaMail.zimbra@uoguelph.ca> In-Reply-To: References: <800018199.6694281.1457233600357.JavaMail.zimbra@uoguelph.ca> <56DD2AB6.1030407@freebsd.org> <6AF0FC23-CC34-43EA-A008-9FB82FB21558@FreeBSD.org> Subject: Re: FUSE extended attribute patches available MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Originating-IP: [172.17.95.10] X-Mailer: Zimbra 8.0.9_GA_6191 (ZimbraWebClient - FF44 (Win)/8.0.9_GA_6191) Thread-Topic: FUSE extended attribute patches available Thread-Index: 8umvwGT6Jzla+wnY64CVfII3kq5mdA== X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Mar 2016 02:39:18 -0000

Ken Merry wrote:
>
>
> > On Mar 7, 2016, at 2:59 AM, Robert Watson wrote:
> >
> > FreeBSD and Linux’s extended-attribute models were inherited from IRIX, as
> > they were introduced to solve the same problems: a place to store metadata such
> > as ACLs, MAC labels, capability masks, etc. IRIX had three namespaces: one
> > each for “user”, “root”, and “secure”, reflecting whether or not they were
> > managed by the file owner (or permissions), the privileged root user, or
> > part of the TCB protection mechanism (e.g., for integrity labels).
> >
> > These extended attributes should not be confused with the filesystem
> > feature of the same name in NFSv4, which is sometimes known by the name
> > “file fork” or “data streams”. EAs in IRIX/FreeBSD/Linux/HPFS/etc are
> > tuple pairs of names and values intended to be written atomically or
> > updated in place specifically for (shortish) metadata such as ACLs, rather
> > than being complete separate data spaces for I/O (e.g., that could be
> > memory mapped).
>
> It would be nice to have NFSv4 / Solaris style alternate data streams. ZFS
> handles them already, but I suppose it would take more work to support them
> in UFS.
>
When this was discussed previously, Jordan Hubbard pointed out that most of the work
is making sure the userland utilities (like backup utilities...) know about them and
what to do with them.

I am not familiar with the userland issue, but if that was resolved, personally, I don't
think a lack of support in UFS would be a showstopper. (Assuming that it would be an
addition and not a replacement for extended attributes.)

I do recall that someone at CERN is adamant about this. (I think they create file forks
with Gbytes of data and can't live without them.)

> > In FreeBSD’s design, we incorporated the disjoint namespace model,
> > providing USER and SYSTEM, the former being managed by the file owner (and
> > those given suitable permission), and the latter being used for TCB
> > mechanisms such as the implementations of MAC labels, ACLs, etc.
> >
> > In Linux, they adopted a more free-form mechanism based on a single
> > combined namespace with a prefix - e.g., user.FOO, and system.BAR. Over
> > time it looks like that namespace has been expanded in various
> > filesystem-specific ways. We also have room to expand our namespace, but
> > from the description below, it’s not clear quite what the right mechanism
> > is.
> >
> > One path would be to introduce a new namespace for filesystem-specific
> > attributes - e.g., EXTATTR_NAMESPACE_FS?
> >
> > But I think the key question here is whether the existing namespaces can
> > provide the semantics you need. If not, then we likely need a new
> > namespace. But then we get the question as to who controls use of the
> > namespace. Certainly “the filesystem” is one option, but then you will get
> > inconsistency in approaches between filesystems and applications - across
> > various dimensions including protection (who can read/modify them?),
> > allocation (who decides what names should be used for what?), and
> > semantics (what applications can use them, and who backs them up?).
> >
> > For example: who should be responsible for backing up those attributes? For
> > ‘system’ attributes in FreeBSD, it is assumed that backup tools will be
> > aware of the services layered over the attributes - e.g., that they will
> > back up ACLs using the ACL API, rather than backing up the binary EAs
> > holding the ACLs. For ‘user’ attributes, it is assumed that backup tools
> > (e.g., tar) must explicitly preserve them, since they are user-defined and
> > user-managed. For filesystem-specific attributes, some other choice will
> > need to be made - perhaps filesystem-specific backup tools need to know
> > about them?
> >
> > Note that in the Linux EA model, ACLs are actually accessed via the EA
> > system calls, whereas in FreeBSD, ACLs are a first-class citizen in the
> > system-call API/ABI, and so user applications don’t treat them as EAs. We
> > made that choice as filesystems may choose themselves not to represent
> > ACLs as EAs, and they have real semantics visible to the VFS layer. In
> > Linux, I believe they chose to pass them via EAs to narrow the system-call
> > interface for filesystem metadata.
Both are legitimate choices, but this
> > could also trigger discussions about whether new attributes are best
> > accessed via the EA interface, or new system calls. For
> > filesystem-specific attributes, EAs are likely the better way to go.
>
> It may be that for at least the purposes of FUSE, we can adequately live
> under the USER namespace. That would allow for arbitrary namespaces that
> Linux-centric filesystems create without significant churn in FreeBSD to
> support it.
>
> And of course this is only for the front/top end of a FUSE filesystem. What
> the filesystem actually does with the extended attributes that the user sets
> on top is another question altogether. In the case of IBM’s LTFS, it stores
> extended attributes (without the “user.” prefix) in the LTFS index, which is
> an XML file that resides on tape. For other filesystems, the answer could
> also vary significantly. A few that I examined in sysutils/fusefs* used
> extended attributes on the backend (usually on a backing filesystem) under
> Linux only, but not on the front (user facing) end.
>
> In order to make arbitrary namespaces in FUSE work in FreeBSD under the user
> namespace, we’ll have to do what Rick was talking about and just not include
> the namespace as a prefix when we get/set attributes. This will allow using
> any sort of namespace or attribute name that the FUSE filesystem wants to
> use.
>
> The impact of this, from a porting standpoint, is that the FUSE filesystems
> will have to know that on FreeBSD, they cannot/should not expect to see the
> “user.” namespace prefix, but they might see other namespace prefixes.
>
> I took a look at the way LTFS and Gluster work with respect to extended
> attributes with MacOS, since it seems that is how MacOS works, and it’s less
> obvious to me what is going on with Gluster.
They’ve got this function:
>
> #ifdef GF_DARWIN_HOST_OS
> static int
> set_xattr_user_namespace_mode (struct posix_private *priv, const char *str)
> {
>         if (strcmp (str, "none") == 0)
>                 priv->xattr_user_namespace = XATTR_NONE;
>         else if (strcmp (str, "strip") == 0)
>                 priv->xattr_user_namespace = XATTR_STRIP;
>         else if (strcmp (str, "append") == 0)
>                 priv->xattr_user_namespace = XATTR_APPEND;
>         else if (strcmp (str, "both") == 0)
>                 priv->xattr_user_namespace = XATTR_BOTH;
>         else
>                 return -1;
>         return 0;
> }
> #endif
>
> Although it’s not clear that they do anything with values other than
> XATTR_STRIP.
>
> With LTFS, since they either assume a “user.” prefix on Linux, or no prefix
> on Windows and MacOS X, it’s more straightforward.
>
> Ken
>
>
> >
> > Robert
> >
> >> On 7 Mar 2016, at 07:16, Julian Elischer wrote:
> >>
> >> On 5/03/2016 7:06 PM, Rick Macklem wrote:
> >>> Ken Merry wrote:
> >>>> I have patches for FreeBSD’s FUSE filesystem kernel module to support
> >>>> extended attributes:
> >> oh showing off your masochistic side eh?
> >>
> >>>> https://people.freebsd.org/~ken/fuse_extattr.20160229.1.txt
> >>>>
> >> I spent an hour beating my head against fuse yesterday.
> >> then realised that it's an old version on our product. We really have to
> >> get off 8.0
> >> (hopefully a matter of weeks now to a 10.x switch)
> >> Now all I need is to find a FreeBSD filesystem expert (ZFS/NFS/CIFS/GFS)
> >> to hire.
> >>
> >>
> >>> The only bit of code I have that might be useful for this patch is:
> >>>     case FUSE_GETXATTR:
> >>>     case FUSE_LISTXATTR:
> >>> !       /*
> >>> !        * These can have varying response lengths, and 0 length
> >>> !        * isn't necessarily invalid.
> >>> !        */
> >>> !
        err = 0;
> >>> *** I came up with this:
> >>>         fgin = (struct fuse_getxattr_in *)
> >>>             ((char *)ftick->tk_ms_fiov.base +
> >>>             sizeof(struct fuse_in_header));
> >>>         if (fgin->size == 0)
> >>>             err = (blen == sizeof(struct fuse_getxattr_out)) ? 0 :
> >>>                 EINVAL;
> >>>         else
> >>>             err = (blen <= fgin->size) ? 0 : EINVAL;
> >>>         break;
> >>> I think I got the size check right?
> >>>
> >>> The big question is...
> >>> What to do with the NAMESPACE?
> >>> - My code fails for SYSTEM and does USER without prepending "user.".
> >>> (That seemed to be what rwatson@ felt was reasonable. I thought our
> >>> discussion was on a mailing list, but I can't find it.)
> >>> I've cc'd him. Maybe he can comment again.
> >> Is there a standard for extended attributes I should know about?
> >> It seems to me that it's a bit like the wild west.
> >> Extended attributes seem to be "every OS for himself".
> >>
> >>>
> >>> - If you stick with prepending "user." or "system." there needs to be
> >>> some way to bypass this so that attributes that don't start in "user."
> >>> or "system." can be accessed. I've seen "trusted." and "glusterfs."
> >>> on GlusterFS.
> >>> --> Maybe a new namespace called something like "nil" that just bypasses
> >>> any USER or SYSTEM checks?
> >>>
> >>> rick
> >>>
> >>>> The patch implements the get/set/delete/list extended attribute methods.
> >>>> The
> >>>> listing code also converts extended attribute lists from the Linux/FUSE
> >>>> format to the FreeBSD format.
For example:
> >>>>
> >>>> # touch foo
> >>>> # ls -la foo
> >>>> -rwxrwxrwx 1 root wheel 0 Feb 29 21:40 foo
> >>>> # lsextattr user foo
> >>>> foo
> >>>> # setextattr user testattr1 "12345678" foo
> >>>> # lsextattr user foo
> >>>> foo testattr1
> >>>> # getextattr user testattr1 foo
> >>>> foo 12345678
> >>>> # setextattr user testattr2 "87654321" foo
> >>>> # lsextattr user foo
> >>>> foo testattr2 testattr1
> >>>> # rmextattr user testattr1 foo
> >>>> # lsextattr user foo
> >>>> foo testattr2
> >>>> # getextattr user testattr1 foo
> >>>> getextattr: foo: failed: Attribute not found
> >>>> # getextattr user testattr2 foo
> >>>> foo 87654321
> >>>>
> >>>>
> >>>> Just to be clear on what this does, it only provides extended attribute
> >>>> support to FreeBSD applications if the underlying FUSE filesystem
> >>>> implements
> >>>> FUSE extended attribute support. Many FUSE filesystems don’t support
> >>>> the
> >>>> extended attribute VFS operations.
> >>>>
> >>>> I have tested this out on IBM’s LTFS implementation, but I have not yet
> >>>> found
> >>>> another FUSE filesystem that supports extended attributes. If anyone
> >>>> knows
> >>>> of one, please let me know so I can try it out. (I looked through a
> >>>> number
> >>>> of the filesystems in sysutils/fusefs* in the ports tree.)
> >>>>
> >>>> Any feedback is welcome. I’m planning to check this into FreeBSD/head
> >>>> in the
> >>>> next week or so.
> >>>>
> >>>> Obviously, I’ve also ported IBM’s LTFS implementation to FreeBSD. It
> >>>> works
> >>>> in the standard FUSE mode, and you can also link it into an application
> >>>> as a
> >>>> library if you don’t want to incur the overhead of running through FUSE.
> >>>> I
> >>>> haven’t gotten around to packaging it up to go out for testing / review.
> >>>>
> >>>> If anyone has IBM LTO-5 or newer tape drives, or IBM TS1140 or newer
> >>>> tape drives, and wants to try it out, let me know. I’ll send you the
> >>>> code when I’ve got it at least somewhat ready. This is IBM-specific,
> >>>> and won’t work on HP tape drives.
> >>>>
> >>>> Ken
> >>>> --
> >>>> Ken Merry
> >>>> ken@FreeBSD.ORG
> >>>>
> >>>> _______________________________________________
> >>>> freebsd-fs@freebsd.org mailing list
> >>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> >>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>
> --
> Ken Merry
> ken@FreeBSD.ORG

From owner-freebsd-fs@freebsd.org Tue Mar 8 02:55:58 2016
Date: Mon, 7 Mar 2016 21:55:56 -0500 (EST)
From: Rick Macklem
To: Ronald Klop
Cc: freebsd-arm@freebsd.org, freebsd-fs@freebsd.org
Message-ID: <1482595660.8940439.1457405756110.JavaMail.zimbra@uoguelph.ca>
References: <3DAB3639-8FB8-43D3-9517-94D46EDEC19E@gromit.dlib.vt.edu>
Subject: Re: Unstable NFS on recent CURRENT

Paul Mather (forwarded by Ronald Klop) wrote:
> On Sun, 06 Mar 2016 02:57:03 +0100, Paul Mather wrote:
>
> > On my BeagleBone Black running 11-CURRENT (r296162) lately I have been
> > having trouble with NFS. I have been doing a buildworld and buildkernel
> > with /usr/src and /usr/obj mounted via NFS. Recently, this process has
> > resulted in the buildworld failing at some point, with a variety of
> > errors (Segmentation fault; Permission denied; etc.). Even a "ls -alR"
> > of /usr/src doesn't manage to complete. It errors out thus:
> >
> > =====
> > [[...]]
> > total 0
> > ls: ./.svn/pristine/fe: Permission denied
> >
> > ./.svn/pristine/ff:
> > total 0
> > ls: ./.svn/pristine/ff: Permission denied
> > ls: fts_read: Permission denied
> > =====
> >
> > On the console, I get the following:
> >
> > newnfs: server 'chumby.chumby.lan' error: fileid changed. fsid
> > 94790777:a4385de: expected fileid 0x4, got 0x2. (BROKEN NFS SERVER OR
> > MIDDLEWARE)
> >
> > I am using a FreeBSD/amd64 10.3-PRERELEASE (r296412) as the NFS server.
> > On the BeagleBone Black, I am mounting /usr/src and /usr/obj via
> > /etc/fstab as follows:
> >
> > chumby.chumby.lan:/build/src/head /usr/src nfs rw,nfsv4 0 0
> > chumby.chumby.lan:/build/obj/bbb /usr/obj nfs rw,nfsv4 0 0
> >
> > /build/src/head and /build/obj/bbb are both ZFS file systems.
> >
Is it possible that a ZFS file system has gotten to the point where the
i-node# exceeds 32 bits? ZFS does support more than 32 bits for i-node#s,
but FreeBSD does not (it truncates to the low order 32 bits). I know diddly
about ZFS, so I don't know if you actually have to create more than
4 billion files to get the i-node# to exceed 32 bits or ???

There has been work done on making ino_t 64 bits, but it hasn't made it
into FreeBSD-current and I have no idea when it might.

If you could try a build on newly created file systems (or UFS ones
instead of ZFS), that would tell you if the above might be the problem.

rick

> > Has anyone else encountered this? It has only started happening
> > recently for me, it seems. Prior to this, I have been able to do a
> > buildworld and buildkernel successfully over NFS.
> >
> > Cheers,
> >
> > Paul.
>
> I cc this to freebsd-fs for you.
>
> Ronald.
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>

From owner-freebsd-fs@freebsd.org Tue Mar 8 03:50:05 2016
In-Reply-To: <56DD17CA.90200@freebsd.org>
References: <95563acb-d27b-4d4b-b8f3-afeb87a3d599@me.com>
 <56D87784.4090103@broken.net> <56DD17CA.90200@freebsd.org>
Date: Tue, 8 Mar 2016 11:50:01 +0800
Subject: Re: [zfs] an interesting survey -- the zpool with most disks you have ever built
From: Fred Liu
To: Julian Elischer
Cc: illumos-zfs, Discussion list for OpenIndiana, omnios-discuss, developer,
 "zfs-devel@freebsd.org", illumos-developer, "freebsd-fs@FreeBSD.org",
 "smartos-discuss@lists.smartos.org", "zfs-discuss@list.zfsonlinux.org"

2016-03-07 13:55 GMT+08:00 Julian Elischer:
> On 6/03/2016 9:30 PM, Fred Liu wrote:
>>
>> 2016-03-05 0:01 GMT+08:00 Freddie Cash:
>>
>>> On Mar 4, 2016 2:05 AM, "Fred Liu" wrote:
>>>>
>>>> 2016-03-04 13:47 GMT+08:00 Freddie Cash:
>>>>>
>>>>> Currently, I just use a simple coordinate system. Columns are letters,
>>>>> rows are numbers.
>>>>>
>>>>> Each disk is partitioned using GPT with the first (only) partition
>>>>> starting at 1 MB and covering the whole disk, and labelled with the
>>>>> column/row where it is located (disk-a1, disk-g6, disk-p3, etc).
>>>>
>>>> [Fred]: So you manually pull off all the drives one by one to locate
>>>> them?
>>>
>>> When putting the system together for the first time, I insert each disk
>>> one at a time, wait for it to be detected, partition it, then label it
>>> based on physical location. Then do the next one. It's just part of the
>>> normal server build process, whether it has 2 drives, 20 drives, or 200
>>> drives.
>>>
>>> We build all our own servers from off-the-shelf parts; we don't buy
>>> anything pre-built from any of the large OEMs.
>>>
>> [Fred]: Gotcha!
>>
>>>>> The pool is created using the GPT labels, so the label shows in "zpool
>>>>> list" output.
>>>>
>>>> [Fred]: What will the output look like?
>>>
>>> From our smaller backups server, with just 24 drive bays:
>>>
>>> $ zpool status storage
>>>   pool: storage
>>>  state: ONLINE
>>> status: Some supported features are not enabled on the pool. The pool can
>>>         still be used, but some features are unavailable.
>>> action: Enable all features using 'zpool upgrade'. Once this is done,
>>>         the pool may no longer be accessible by software that does not
>>>         support the features. See zpool-features(7) for details.
>>>   scan: scrub canceled on Wed Feb 17 12:02:20 2016
>>> config:
>>>
>>>   NAME             STATE   READ WRITE CKSUM
>>>   storage          ONLINE     0     0     0
>>>     raidz2-0       ONLINE     0     0     0
>>>       gpt/disk-a1  ONLINE     0     0     0
>>>       gpt/disk-a2  ONLINE     0     0     0
>>>       gpt/disk-a3  ONLINE     0     0     0
>>>       gpt/disk-a4  ONLINE     0     0     0
>>>       gpt/disk-a5  ONLINE     0     0     0
>>>       gpt/disk-a6  ONLINE     0     0     0
>>>     raidz2-1       ONLINE     0     0     0
>>>       gpt/disk-b1  ONLINE     0     0     0
>>>       gpt/disk-b2  ONLINE     0     0     0
>>>       gpt/disk-b3  ONLINE     0     0     0
>>>       gpt/disk-b4  ONLINE     0     0     0
>>>       gpt/disk-b5  ONLINE     0     0     0
>>>       gpt/disk-b6  ONLINE     0     0     0
>>>     raidz2-2       ONLINE     0     0     0
>>>       gpt/disk-c1  ONLINE     0     0     0
>>>       gpt/disk-c2  ONLINE     0     0     0
>>>       gpt/disk-c3  ONLINE     0     0     0
>>>       gpt/disk-c4  ONLINE     0     0     0
>>>       gpt/disk-c5  ONLINE     0     0     0
>>>       gpt/disk-c6  ONLINE     0     0     0
>>>     raidz2-3       ONLINE     0     0     0
>>>       gpt/disk-d1  ONLINE     0     0     0
>>>       gpt/disk-d2  ONLINE     0     0     0
>>>       gpt/disk-d3  ONLINE     0     0     0
>>>       gpt/disk-d4  ONLINE     0     0     0
>>>       gpt/disk-d5  ONLINE     0     0     0
>>>       gpt/disk-d6  ONLINE     0     0     0
>>>   cache
>>>     gpt/cache0     ONLINE     0     0     0
>>>     gpt/cache1     ONLINE     0     0     0
>>>
>>> errors: No known data errors
>>>
>>> The 90-bay systems look the same, just that the letters go all the way to
>>> p (so disk-p1 through disk-p6). And there's one vdev that uses 3 drives
>>> from each chassis (7x 6-disk vdev only uses 42 drives of the 45-bay
>>> chassis, so there's lots of spares if using a single chassis; using two
>>> chassis, there's enough drives to add an extra 6-disk vdev).
>>>
>> [Fred]: It looks like the gpt label shown in "zpool status" only works in
>> FreeBSD/FreeNAS. Are you using FreeBSD/FreeNAS?
>> I can't find the similar possibilities in Illumos/Linux.
>
>
> Ah that's a trick.. FreeBSD exports an actual /dev/gpt/{your-label-goes-here}
> for each labeled partition it finds.
> So it's not ZFS doing anything special.. it's what FreeBSD is calling the
> partition.
>

Super cool!

Fred

From owner-freebsd-fs@freebsd.org Tue Mar 8 03:56:23 2016
References: <95563acb-d27b-4d4b-b8f3-afeb87a3d599@me.com>
 <56D87784.4090103@broken.net>
 <5158F354-9636-4031-9536-E99450F312B3@RichardElling.com>
 <6E2B77D1-E0CA-4901-A6BD-6A22C07536B3@gmail.com>
Date: Tue, 8 Mar 2016 11:56:20 +0800
Subject: Re: [zfs] [developer] Re: [smartos-discuss] an interesting survey -- the zpool with most disks you have ever built
From: Fred Liu
To: illumos-zfs
Cc: "smartos-discuss@lists.smartos.org", developer@lists.open-zfs.org,
 developer, illumos-developer, omnios-discuss,
 Discussion list for OpenIndiana, "zfs-discuss@list.zfsonlinux.org",
 "freebsd-fs@FreeBSD.org", "zfs-devel@freebsd.org"

2016-03-08 4:55 GMT+08:00 Liam Slusser:
> I don't have a 2000 drive array (that's amazing!) but I do have two 280
> drive arrays which are in production.
> Here are the generic stats:
>
> server setup:
> OpenIndiana oi_151
> 1 server rack
> Dell r720xd 64g ram with mirrored 250g boot disks
> 5 x LSI 9207-8e dualport SAS pci-e host bus adapters
> Intel 10g fibre ethernet (dual port)
> 2 x SSD for log cache
> 2 x SSD for cache
> 23 x Dell MD1200 with 3T, 4T, or 6T NLSAS disks (a mix of Toshiba, Western
> Digital, and Seagate drives - basically whatever Dell sends)
>
> zpool setup:
> 23 x 12-disk raidz2 glued together. 276 total disks. Basically each new
> 12 disk MD1200 is a new raidz2 added to the pool.
>
> Total size: ~797T
>
> We have an identical server to which we replicate changes via zfs
> snapshots every few minutes. The whole setup has been up and running for
> a few years now, no issues. As we run low on space we purchase two
> additional MD1200 shelves (one for each system) and add the new raidz2
> into the pool on-the-fly.
>
> The only real issue we've had is that sometimes a disk fails in such a way
> (think Monty Python and the Holy Grail, "I'm not dead yet") where the disk
> hasn't failed but is timing out and slows the whole array to a standstill
> until we can manually find and remove the disk. Another problem is that
> once a disk has been replaced, the resilver process can sometimes take
> an eternity. We have also found the snapshot replication process can
> interfere with the resilver process - resilver gets stuck at 99% and never
> ends - so we end up stopping or only doing one replication a day until the
> resilver process is done.
>
> The last helpful hint I have was lowering all the drive timeouts, see
> http://everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/
> for info.
>
> [Fred]: zpool with 280 drives in production is pretty big! I think 2000
> drives were just in test. It is true that huge pools have lots of operation
> challenges. I have met the similar sluggish issue caused by a
> will-die disk.
> Just curious, what is the cluster software implemented in
> http://everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/ ?

Thanks.

Fred

From owner-freebsd-fs@freebsd.org Tue Mar 8 05:26:44 2016
Subject: Re: FUSE extended attribute patches available
From: Jordan Hubbard
In-Reply-To: <6AF0FC23-CC34-43EA-A008-9FB82FB21558@FreeBSD.org>
Date: Mon, 7 Mar 2016 21:26:42 -0800
Cc: Julian Elischer, Ken Merry, fs@freebsd.org, scsi@freebsd.org
Message-Id: <536B3B67-E8F6-472C-8A2C-8884533B4CB6@ixsystems.com>
References: <800018199.6694281.1457233600357.JavaMail.zimbra@uoguelph.ca>
 <56DD2AB6.1030407@freebsd.org>
 <6AF0FC23-CC34-43EA-A008-9FB82FB21558@FreeBSD.org>
To: Robert Watson

> On Mar 6, 2016, at 11:59 PM, Robert Watson wrote:
>
> For example: who should be responsible for backing up those attributes?
> For ‘system’ attributes in FreeBSD, it is assumed that backup tools will
> be aware of the services layered over the attributes -- e.g., that they
> will back up ACLs using the ACL API, rather than backing up the binary
> EAs holding the ACLs. For ‘user’ attributes, it is assumed that backup
> tools (e.g., tar) must explicitly preserve them, since they are
> user-defined and user-managed. For filesystem-specific attributes, some
> other choice will need to be made -- perhaps filesystem-specific backup
> tools need to know about them?

As Rick observed, the last time this came up, I pointed out that teaching
all possible “archivers” of filesystem information (which includes
serializers, like rsync / scp / etc) how to cope gracefully with EAs and
ACLs, even if you stick ACLs in an EA system namespace and make them
opaque blobs, is “hard” and it’s the edge/legacy cases that really hose
you.

This is why Apple chose to split the problem space between “things that
are capable of dealing with the problem natively” (abstracting that native
understanding behind the copyfile(3) APIs), which is a comparatively small
number of tools, and “everything else” which gets all of its metadata
serialized into an AppleDouble file. Yes, those are the ._* files you see
when you copy a bunch of files off your Mac onto a filesystem that doesn’t
know how to cope with the metadata.
The extra AppleDouble files just sit around passively on that NTFS or
ext2fs filesystem, minding their own business, until they get copied back
to the Mac, at which point something at the VFS layer (I’m guessing now,
I never actually looked) re-unites the ._Foo file with its corresponding
Foo file and the AppleDouble file disappears.

I used to think this was kind of ghetto, then I observed how many problems
it solved in terms of turning data-loss scenarios into “no big deal, it’ll
all work out” scenarios and I changed my mind.

TL;DR summary: This can be handled in such a way that “no one need be
responsible” for backing up those attributes if you’re willing to pay the
freight.

- Jordan

From owner-freebsd-fs@freebsd.org Tue Mar 8 06:38:04 2016
Subject: Re: FUSE extended attribute patches available
From: "Robert N. M. Watson"
Date: Tue, 8 Mar 2016 06:38:00 +0000
Cc: Robert Watson, Julian Elischer, Rick Macklem, fs@freebsd.org,
 scsi@freebsd.org
References: <800018199.6694281.1457233600357.JavaMail.zimbra@uoguelph.ca>
 <56DD2AB6.1030407@freebsd.org>
 <6AF0FC23-CC34-43EA-A008-9FB82FB21558@FreeBSD.org>
To: Ken Merry

Just a quick observation: to avoid application change, you could actually
leave the 'user.' on the front of the strings? It's not harmful, it just
doesn't serve the same function. This might keep documentation more in
sync, etc.

Sent from my iPhone

> On 7 Mar 2016, at 22:28, Ken Merry wrote:
>
>> On Mar 7, 2016, at 2:59 AM, Robert Watson wrote:
>>
>> FreeBSD and Linux’s extended-attribute models were inherited from IRIX,
>> as they were introduced to solve the same problems: a place to store
>> metadata such as ACLs, MAC labels, capability masks, etc. IRIX had three
>> namespaces: one each for “user”, “root”, and “secure”, reflecting
>> whether or not they were managed by the file owner (or permissions), the
>> privileged root user, or part of the TCB protection mechanism (e.g., for
>> integrity labels).
>>
>> These extended attributes should not be confused with the filesystem
>> feature of the same name in NFSv4, which is sometimes known by the name
>> “file fork” or “data streams”.
>> EAs in IRIX/FreeBSD/Linux/HPFS/etc are tuple pairs of names and values
>> intended to be written atomically or updated in place specifically for
>> (shortish) metadata such as ACLs, rather than being complete separate
>> data spaces for I/O (e.g., that could be memory mapped).
>
> It would be nice to have NFSv4 / Solaris style alternate data streams.
> ZFS handles them already, but I suppose it would take more work to
> support them in UFS.
>
>> In FreeBSD’s design, we incorporated the disjoint namespace model,
>> providing USER and SYSTEM, the former being managed by the file owner
>> (and those given suitable permission), and the latter being used for TCB
>> mechanisms such as the implementations of MAC labels, ACLs, etc.
>>
>> In Linux, they adopted a more free-form mechanism based on a single
>> combined namespace with a prefix -- e.g., user.FOO, and system.BAR. Over
>> time it looks like that namespace has been expanded in various
>> filesystem-specific ways. We also have room to expand our namespace, but
>> from the description below, it’s not clear quite what the right
>> mechanism is.
>>
>> One path would be to introduce a new namespace for filesystem-specific
>> attributes -- e.g., EXTATTR_NAMESPACE_FS?
>>
>> But I think the key question here is whether the existing namespaces can
>> provide the semantics you need. If not, then we likely need a new
>> namespace. But then we get the question as to who controls use of the
>> namespace. Certainly “the filesystem” is one option, but then you will
>> get inconsistency in approaches between filesystems and applications --
>> across various dimensions including protection (who can read/modify
>> them?), allocation (who decides what names should be used for what?),
>> and semantics (what applications can use them, and who backs them up?).
>>
>> For example: who should be responsible for backing up those attributes?
>> For ‘system’ attributes in FreeBSD, it is assumed that backup tools will
>> be aware of the services layered over the attributes -- e.g., that they
>> will back up ACLs using the ACL API, rather than backing up the binary
>> EAs holding the ACLs. For ‘user’ attributes, it is assumed that backup
>> tools (e.g., tar) must explicitly preserve them, since they are
>> user-defined and user-managed. For filesystem-specific attributes, some
>> other choice will need to be made -- perhaps filesystem-specific backup
>> tools need to know about them?
>>
>> Note that in the Linux EA model, ACLs are actually accessed via the EA
>> system calls, whereas in FreeBSD, ACLs are a first-class citizen in the
>> system-call API/ABI, and so user applications don’t treat them as EAs.
>> We made that choice as filesystems may choose themselves not to
>> represent ACLs as EAs, and they have real semantics visible to the VFS
>> layer. In Linux, I believe they chose to pass them via EAs to narrow the
>> system-call interface for filesystem metadata. Both are legitimate
>> choices, but this could also trigger discussions about whether new
>> attributes are best accessed via the EA interface, or new system calls.
>> For filesystem-specific attributes, EAs are likely the better way to go.
>
> It may be that for at least the purposes of FUSE, we can adequately live
> under the USER namespace. That would allow for arbitrary namespaces that
> Linux-centric filesystems create without significant churn in FreeBSD to
> support it.
>
> And of course this is only for the front/top end of a FUSE filesystem.
> What the filesystem actually does with the extended attributes that the
> user sets on top is another question altogether. In the case of IBM’s
> LTFS, it stores extended attributes (without the “user.” prefix) in the
> LTFS index, which is an XML file that resides on tape.
For other fi= lesystems, the answer could also vary significantly. A few that I examined i= n sysutils/fusefs* used extended attributes on the backend (usually on a bac= king filesystem) under Linux only, but not on the front (user facing) end. >=20 > In order to make arbitrary namespaces in FUSE work in FreeBSD under the us= er namespace, we=E2=80=99ll have to do what Rick was talking about and just n= ot include the namespace as a prefix when we get/set attributes. This will a= llow using any sort of namespace or attribute name that the FUSE filesystem w= ants to use. >=20 > The impact of this, from a porting standpoint, is that the FUSE filesystem= s will have to know that on FreeBSD, they cannot/should not expect to see th= e =E2=80=9Cuser.=E2=80=9D namespace prefix, but they might see other namespa= ce prefixes. >=20 > I took a look at the way LTFS and Gluster work with respect to extended at= tributes with MacOS, since it seems that is how MacOS works, and it=E2=80=99= s less obvious to me what is going on with Gluster. 
They’ve got this function:
>
> #ifdef GF_DARWIN_HOST_OS
> static int
> set_xattr_user_namespace_mode (struct posix_private *priv, const char *str)
> {
>         if (strcmp (str, "none") == 0)
>                 priv->xattr_user_namespace = XATTR_NONE;
>         else if (strcmp (str, "strip") == 0)
>                 priv->xattr_user_namespace = XATTR_STRIP;
>         else if (strcmp (str, "append") == 0)
>                 priv->xattr_user_namespace = XATTR_APPEND;
>         else if (strcmp (str, "both") == 0)
>                 priv->xattr_user_namespace = XATTR_BOTH;
>         else
>                 return -1;
>         return 0;
> }
> #endif
>
> Although it’s not clear that they do anything with values other than XATTR_STRIP.
>
> With LTFS, since they either assume a “user.” prefix on Linux, or no prefix on Windows and MacOS X, it’s more straightforward.
>
> Ken
>
>
>>
>> Robert
>>
>>> On 7 Mar 2016, at 07:16, Julian Elischer wrote:
>>>
>>> On 5/03/2016 7:06 PM, Rick Macklem wrote:
>>>> Ken Merry wrote:
>>>>> I have patches for FreeBSD’s FUSE filesystem kernel module to support
>>>>> extended attributes:
>>> oh showing off your masochistic side eh?
>>>
>>>>> https://people.freebsd.org/~ken/fuse_extattr.20160229.1.txt
>>> I spent an hour beating my head against fuse yesterday.
>>> then realised that it's an old version on our product. We really have to get off 8.0
>>> (hopefully a matter of weeks now to a 10.x switch)
>>> Now all I need is to find a FreeBSD filesystem expert (ZFS/NFS/CIFS/GFS) to hire.
>>>
>>>
>>>> The only bit of code I have that might be useful for this patch is:
>>>> case FUSE_GETXATTR:
>>>> case FUSE_LISTXATTR:
>>>> ! /*
>>>> ! * These can have varying response lengths, and 0 length
>>>> ! * isn't necessarily invalid.
>>>> ! */
>>>> !
err = 0;
>>>> *** I came up with this:
>>>>     fgin = (struct fuse_getxattr_in *)
>>>>         ((char *)ftick->tk_ms_fiov.base +
>>>>         sizeof(struct fuse_in_header));
>>>>     if (fgin->size == 0)
>>>>         err = (blen == sizeof(struct fuse_getxattr_out)) ? 0 :
>>>>             EINVAL;
>>>>     else
>>>>         err = (blen <= fgin->size) ? 0 : EINVAL;
>>>>     break;
>>>> I think I got the size check right?
>>>>
>>>> The big question is...
>>>> What to do with the NAMESPACE?
>>>> - My code fails for SYSTEM and does USER without prepending "user.".
>>>> (That seemed to be what rwatson@ felt was reasonable. I thought our
>>>> discussion was on a mailing list, but I can't find it.)
>>>> I've cc'd him. Maybe he can comment again.
>>> Is there a standard for extended attributes I should know about?
>>> It seems to me that it's a bit like the wild west.
>>> Extended attributes seem to be "every OS for himself".
>>>
>>>>
>>>> - If you stick with prepending "user." or "system." there needs to be
>>>> some way to bypass this so that attributes that don't start in "user."
>>>> or "system." can be accessed. I've seen "trusted." and "glusterfs."
>>>> on GlusterFS.
>>>> --> Maybe a new namespace called something like "nil" that just bypasses
>>>> any USER or SYSTEM checks?
>>>>
>>>> rick
>>>>
>>>>> The patch implements the get/set/delete/list extended attribute methods. The
>>>>> listing code also converts extended attribute lists from the Linux/FUSE
>>>>> format to the FreeBSD format.
For example: >>>>>=20 >>>>> # touch foo >>>>> # ls -la foo >>>>> -rwxrwxrwx 1 root wheel 0 Feb 29 21:40 foo >>>>> # lsextattr user foo >>>>> foo >>>>> # setextattr user testattr1 "12345678" foo >>>>> # lsextattr user foo >>>>> foo testattr1 >>>>> # getextattr user testattr1 foo >>>>> foo 12345678 >>>>> # setextattr user testattr2 "87654321" foo >>>>> # lsextattr user foo >>>>> foo testattr2 testattr1 >>>>> # rmextattr user testattr1 foo >>>>> # lsextattr user foo >>>>> foo testattr2 >>>>> # getextattr user testattr1 foo >>>>> getextattr: foo: failed: Attribute not found >>>>> # getextattr user testattr2 foo >>>>> foo 87654321 >>>>>=20 >>>>>=20 >>>>> Just to be clear on what this does, it only provides extended attribut= e >>>>> support to FreeBSD applications if the underlying FUSE filesystem impl= ements >>>>> FUSE extended attribute support. Many FUSE filesystems don=E2=80=99t s= upport the >>>>> extended attribute VFS operations. >>>>>=20 >>>>> I have tested this out on IBM=E2=80=99s LTFS implementation, but I hav= e not yet found >>>>> another FUSE filesystem that supports extended attributes. If anyone k= nows >>>>> of one, please let me know so I can try it out. (I looked through a n= umber >>>>> of the filesystems in sysutils/fusefs* in the ports tree.) >>>>>=20 >>>>> Any feedback is welcome. I=E2=80=99m planning to check this into Free= BSD/head in the >>>>> next week or so. >>>>>=20 >>>>> Obviously, I=E2=80=99ve also ported IBM=E2=80=99s LTFS implementation t= o FreeBSD. It works >>>>> in the standard FUSE mode, and you can also link it into an applicatio= n as a >>>>> library if you don=E2=80=99t want to incur the overhead of running thr= ough FUSE. I >>>>> haven=E2=80=99t gotten around to packaging it up to go out for testing= / review. >>>>>=20 >>>>> If anyone has IBM LTO-5 or newer tape drives, or IBM TS1140 or newer t= ape >>>>> drives, and wants to try it out, let me know. 
I=E2=80=99ll send you t= he code when >>>>> I=E2=80=99ve got it at least somewhat ready. This is IBM-specific, an= d won=E2=80=99t work >>>>> on HP tape drives. >>>>>=20 >>>>> Ken >>>>> =E2=80=94 >>>>> Ken Merry >>>>> ken@FreeBSD.ORG >>>>>=20 >>>>>=20 >>>>>=20 >>>>> _______________________________________________ >>>>> freebsd-fs@freebsd.org mailing list >>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >>>> _______________________________________________ >>>> freebsd-fs@freebsd.org mailing list >>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >=20 >=20 >=20 > =E2=80=94=20 > Ken Merry > ken@FreeBSD.ORG >=20 >=20 >=20 From owner-freebsd-fs@freebsd.org Tue Mar 8 13:07:31 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4EDC7AC785F for ; Tue, 8 Mar 2016 13:07:31 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 3F8B1C4F for ; Tue, 8 Mar 2016 13:07:31 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u28D7Uwm045407 for ; Tue, 8 Mar 2016 13:07:31 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 198457] zfs acl lost after zfs send-receive. 
 Kernel panic
Date: Tue, 08 Mar 2016 13:07:30 +0000
X-Bugzilla-Reason: AssignedTo
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 10.1-RELEASE
X-Bugzilla-Keywords:
X-Bugzilla-Severity: Affects Only Me
X-Bugzilla-Who: fxguidet@factorfx.com
X-Bugzilla-Status: New
X-Bugzilla-Resolution:
X-Bugzilla-Priority: ---
X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: cc
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 08 Mar 2016 13:07:31 -0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=198457

fx guidet changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fxguidet@factorfx.com

--- Comment #6 from fx guidet ---
Hi there, is there any update on this bug? We have the same issue on FreeBSD 10.1:
no kernel panic, but some ACLs are lost.
To get access to the file again we need to mv it and then reconfigure the ACL on the
file.
--=20 You are receiving this mail because: You are the assignee for the bug.= From owner-freebsd-fs@freebsd.org Tue Mar 8 14:01:36 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B1436AC7EA9; Tue, 8 Mar 2016 14:01:36 +0000 (UTC) (envelope-from paul@gromit.dlib.vt.edu) Received: from gromit.dlib.vt.edu (gromit.dlib.vt.edu [128.173.126.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "gromit.dlib.vt.edu", Issuer "Chumby Certificate Authority" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 87947224; Tue, 8 Mar 2016 14:01:36 +0000 (UTC) (envelope-from paul@gromit.dlib.vt.edu) Received: from pmather.lib.vt.edu (pmather.lib.vt.edu [128.173.126.193]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by gromit.dlib.vt.edu (Postfix) with ESMTPSA id 45604E07; Tue, 8 Mar 2016 09:01:29 -0500 (EST) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 9.2 \(3112\)) Subject: Re: Unstable NFS on recent CURRENT From: Paul Mather In-Reply-To: <1482595660.8940439.1457405756110.JavaMail.zimbra@uoguelph.ca> Date: Tue, 8 Mar 2016 09:01:29 -0500 Cc: Ronald Klop , freebsd-fs@freebsd.org, freebsd-arm@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: <08710728-3130-49BE-8BD7-AFE85A31C633@gromit.dlib.vt.edu> References: <3DAB3639-8FB8-43D3-9517-94D46EDEC19E@gromit.dlib.vt.edu> <1482595660.8940439.1457405756110.JavaMail.zimbra@uoguelph.ca> To: Rick Macklem X-Mailer: Apple Mail (2.3112) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 08 Mar 2016 14:01:36 -0000 On Mar 7, 2016, at 9:55 PM, Rick Macklem wrote: > Paul Mather (forwarded by Ronald Klop) wrote: >> 
On Sun, 06 Mar 2016 02:57:03 +0100, Paul Mather = >> wrote: >>=20 >>> On my BeagleBone Black running 11-CURRENT (r296162) lately I have = been >>> having trouble with NFS. I have been doing a buildworld and = buildkernel >>> with /usr/src and /usr/obj mounted via NFS. Recently, this process = has >>> resulted in the buildworld failing at some point, with a variety of >>> errors (Segmentation fault; Permission denied; etc.). Even a "ls = -alR" >>> of /usr/src doesn't manage to complete. It errors out thus: >>>=20 >>> =3D=3D=3D=3D=3D >>> [[...]] >>> total 0 >>> ls: ./.svn/pristine/fe: Permission denied >>>=20 >>> ./.svn/pristine/ff: >>> total 0 >>> ls: ./.svn/pristine/ff: Permission denied >>> ls: fts_read: Permission denied >>> =3D=3D=3D=3D=3D >>>=20 >>> On the console, I get the following: >>>=20 >>> newnfs: server 'chumby.chumby.lan' error: fileid changed. fsid >>> 94790777:a4385de: expected fileid 0x4, got 0x2. (BROKEN NFS SERVER = OR >>> MIDDLEWARE) >>>=20 >>>=20 >>> I am using a FreeBSD/amd64 10.3-PRERELEASE (r296412) as the NFS = server. >>> On the BeagleBone Black, I am mounting /usr/src and /usr/obj via >>> /etc/fstab as follows: >>>=20 >>> chumby.chumby.lan:/build/src/head /usr/src nfs rw,nfsv4 0 0 >>> chumby.chumby.lan:/build/obj/bbb /usr/obj nfs rw,nfsv4 0 0 >>>=20 >>>=20 >>> /build/src/head and /build/obj/bbb are both ZFS file systems. >>>=20 > Is it possible that a ZFS file system has gotten to the point where = the > i-node# exceeds 32bits? ZFS does support more than 32bits for = i-node#s, > but FreeBSD does not (it truncates to the low order 32bits). > I know diddly about ZFS, so I don't know if you actually have to = create > more than 4billion files to get the i-node# to exceed 32bits or ??? >=20 > There has been work done on making ino_t 64bits, but it hasn't made it > into FreeBSD-current and I have no idea when it might. 
>=20 > If you could try a build on newly created file systems (or UFS ones > instead of ZFS), that would tell you if the above might be the = problem. I don't think I have that big of a ZFS pool (it's 2 TB). :-) It doesn't seem that there are excessive numbers of inodes, and the = counts match up between the NFS client and server sides. In the information below, chumby is the NFS server and beaglebone the = client: pmather@beaglebone:~ % mount /dev/mmcsd0s2a on / (ufs, local, noatime, soft-updates) devfs on /dev (devfs, local) /dev/mmcsd0s1 on /boot/msdos (msdosfs, local, noatime) tmpfs on /tmp (tmpfs, local) tmpfs on /var/log (tmpfs, local) tmpfs on /var/tmp (tmpfs, local) chumby.chumby.lan:/build/src/head on /usr/src (nfs, nfsv4acls) chumby.chumby.lan:/build/obj/bbb on /usr/obj (nfs, nfsv4acls) pmather@beaglebone:~ % df -i /usr/src /usr/obj Filesystem 1K-blocks Used Avail Capacity = iused ifree %iused Mounted on chumby.chumby.lan:/build/src/head 2097152 1344484 752668 64% = 147835 1505336 9% /usr/src chumby.chumby.lan:/build/obj/bbb 530875884 1949364 528926520 0% = 70814 1057853040 0% /usr/obj paul@chumby:/home/paul> df -i /build/src/head /build/obj/bbb Filesystem 1K-blocks Used Avail Capacity iused = ifree %iused Mounted on zroot/SHARED/build/src/head 2097152 1344484 752668 64% 147835 = 1505336 9% /build/src/head zroot/SHARED/build/obj/bbb 530876268 1949364 528926904 0% 70814 = 1057853808 0% /build/obj/bbb On the NFS client system, these are the only NFS-related settings I have = in /etc/rc.conf: nfsuserd_enable=3D"YES" nfscbd_enable=3D"YES" Would you recommend I try it with nfscbd_enable=3D"NO"? I will try NFS from other clients to see whether it's just this = FreeBSD/arm system that's having problems. Cheers, Paul. >=20 > rick >=20 >>> Has anyone else encountered this? It has only started happening >>> recently for me, it seems. Prior to this, I have been able to do a >>> buildworld and buildkernel successfully over NFS. >>>=20 >>> Cheers, >>>=20 >>> Paul. 
>>=20 >> I cc this to freebsd-fs for you. >>=20 >> Ronald. >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >>=20 > _______________________________________________ > freebsd-arm@freebsd.org mailing list > https://lists.freebsd.org/mailman/listinfo/freebsd-arm > To unsubscribe, send any mail to "freebsd-arm-unsubscribe@freebsd.org" From owner-freebsd-fs@freebsd.org Wed Mar 9 00:21:05 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 09646AC8CBB; Wed, 9 Mar 2016 00:21:05 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 9EE0630E; Wed, 9 Mar 2016 00:21:04 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) IronPort-PHdr: 9a23:PMRkmxKWSE7wICny8tmcpTZWNBhigK39O0sv0rFitYgVKfXxwZ3uMQTl6Ol3ixeRBMOAu60C27ed6fmocFdDyKjCmUhKSIZLWR4BhJdetC0bK+nBN3fGKuX3ZTcxBsVIWQwt1Xi6NU9IBJS2PAWK8TWM5DIfUi/yKRBybrysXNWC0ILniqvootX6WEZhunmUWftKNhK4rAHc5IE9oLBJDeIP8CbPuWZCYO9MxGlldhq5lhf44dqsrtY4q3wD89pozcNLUL37cqIkVvQYSW1+ayFmrPDtrgTJGAuT+mMHACJRlhtTHxOD4gv3U53qvm39rOU63SCbOcj/S/cwWC++7qFlT1jmkioKPSU1tWjNj59Mi/djqQ+l7zl2347ZesnBLPNjeovSZ9QfRHYHUsJQXWpfHsWxY5ZZXMQbOuMNlYj2pBMrpBC9AQSpTLf1zzZDhXv72IUn1Os8HAXe3EorFoRd4zzvsNzpOfJKAqiOx67SwGCGNqsO1A== X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A2DQAQARa99W/61jaINcDoN+bQa6WAENgWkXCoUkSgKBfxQBAQEBAQEBAWMngi2CFAEBAQMBAQEBICsgCwULAgEIGAICDRkCAicBCSYCBAgHBAEcBId7CA6vT48oAQEBAQEBAQMBAQEBAQEBARQEe4UcgXuCR4QbAQEFFoMCgToFh1aGTj2ISYVjgnCCMpFkjlQCHgEBQoIDGYENWR4uAQaIRjR+AQEB X-IronPort-AV: E=Sophos;i="5.22,558,1449550800"; d="scan'208";a="269923465" Received: from nipigon.cs.uoguelph.ca (HELO 
zcs1.mail.uoguelph.ca) ([131.104.99.173]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 08 Mar 2016 19:21:02 -0500 Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 50CD315F56D; Tue, 8 Mar 2016 19:21:02 -0500 (EST) Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id KJZ75w_GFmJ1; Tue, 8 Mar 2016 19:21:01 -0500 (EST) Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 114FA15F571; Tue, 8 Mar 2016 19:21:01 -0500 (EST) X-Virus-Scanned: amavisd-new at zcs1.mail.uoguelph.ca Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id sfvBTfNRv_Xb; Tue, 8 Mar 2016 19:21:00 -0500 (EST) Received: from zcs1.mail.uoguelph.ca (zcs1.mail.uoguelph.ca [172.17.95.18]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id E6BB715F56D; Tue, 8 Mar 2016 19:21:00 -0500 (EST) Date: Tue, 8 Mar 2016 19:21:00 -0500 (EST) From: Rick Macklem To: Paul Mather Cc: Ronald Klop , freebsd-fs@freebsd.org, freebsd-arm@freebsd.org Message-ID: <913974596.10122887.1457482860865.JavaMail.zimbra@uoguelph.ca> In-Reply-To: <08710728-3130-49BE-8BD7-AFE85A31C633@gromit.dlib.vt.edu> References: <3DAB3639-8FB8-43D3-9517-94D46EDEC19E@gromit.dlib.vt.edu> <1482595660.8940439.1457405756110.JavaMail.zimbra@uoguelph.ca> <08710728-3130-49BE-8BD7-AFE85A31C633@gromit.dlib.vt.edu> Subject: Re: Unstable NFS on recent CURRENT MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.95.10] X-Mailer: Zimbra 8.0.9_GA_6191 (ZimbraWebClient - FF44 (Win)/8.0.9_GA_6191) Thread-Topic: Unstable NFS on recent CURRENT Thread-Index: wFhsfjvkf8M3TtVVuwbPJgeedqxBkg== X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Wed, 09 Mar 2016 00:21:05 -0000 Paul Mather wrote: > On Mar 7, 2016, at 9:55 PM, Rick Macklem wrote: > > > Paul Mather (forwarded by Ronald Klop) wrote: > >> On Sun, 06 Mar 2016 02:57:03 +0100, Paul Mather > >> wrote: > >> > >>> On my BeagleBone Black running 11-CURRENT (r296162) lately I have been > >>> having trouble with NFS. I have been doing a buildworld and buildkernel > >>> with /usr/src and /usr/obj mounted via NFS. Recently, this process has > >>> resulted in the buildworld failing at some point, with a variety of > >>> errors (Segmentation fault; Permission denied; etc.). Even a "ls -alR" > >>> of /usr/src doesn't manage to complete. It errors out thus: > >>> > >>> ===== > >>> [[...]] > >>> total 0 > >>> ls: ./.svn/pristine/fe: Permission denied > >>> > >>> ./.svn/pristine/ff: > >>> total 0 > >>> ls: ./.svn/pristine/ff: Permission denied > >>> ls: fts_read: Permission denied > >>> ===== > >>> > >>> On the console, I get the following: > >>> > >>> newnfs: server 'chumby.chumby.lan' error: fileid changed. fsid > >>> 94790777:a4385de: expected fileid 0x4, got 0x2. (BROKEN NFS SERVER OR > >>> MIDDLEWARE) > >>> I have no idea how the fileid (i-node# in old terminology) could change. I haven't heard anyone else reporting this, so I can't explain it (except for the exceeds 32bits case, which you note below isn't likely). > >>> > >>> I am using a FreeBSD/amd64 10.3-PRERELEASE (r296412) as the NFS server. > >>> On the BeagleBone Black, I am mounting /usr/src and /usr/obj via > >>> /etc/fstab as follows: > >>> > >>> chumby.chumby.lan:/build/src/head /usr/src nfs rw,nfsv4 0 0 > >>> chumby.chumby.lan:/build/obj/bbb /usr/obj nfs rw,nfsv4 0 0 > >>> > >>> > >>> /build/src/head and /build/obj/bbb are both ZFS file systems. > >>> > > Is it possible that a ZFS file system has gotten to the point where the > > i-node# exceeds 32bits? 
ZFS does support more than 32bits for i-node#s, > > but FreeBSD does not (it truncates to the low order 32bits). > > I know diddly about ZFS, so I don't know if you actually have to create > > more than 4billion files to get the i-node# to exceed 32bits or ??? > > > > There has been work done on making ino_t 64bits, but it hasn't made it > > into FreeBSD-current and I have no idea when it might. > > > > If you could try a build on newly created file systems (or UFS ones > > instead of ZFS), that would tell you if the above might be the problem. > > > I don't think I have that big of a ZFS pool (it's 2 TB). :-) > > It doesn't seem that there are excessive numbers of inodes, and the counts > match up between the NFS client and server sides. > > In the information below, chumby is the NFS server and beaglebone the client: > > pmather@beaglebone:~ % mount > /dev/mmcsd0s2a on / (ufs, local, noatime, soft-updates) > devfs on /dev (devfs, local) > /dev/mmcsd0s1 on /boot/msdos (msdosfs, local, noatime) > tmpfs on /tmp (tmpfs, local) > tmpfs on /var/log (tmpfs, local) > tmpfs on /var/tmp (tmpfs, local) > chumby.chumby.lan:/build/src/head on /usr/src (nfs, nfsv4acls) > chumby.chumby.lan:/build/obj/bbb on /usr/obj (nfs, nfsv4acls) > pmather@beaglebone:~ % df -i /usr/src /usr/obj > Filesystem 1K-blocks Used Avail Capacity iused > ifree %iused Mounted on > chumby.chumby.lan:/build/src/head 2097152 1344484 752668 64% 147835 > 1505336 9% /usr/src > chumby.chumby.lan:/build/obj/bbb 530875884 1949364 528926520 0% 70814 > 1057853040 0% /usr/obj > > > paul@chumby:/home/paul> df -i /build/src/head /build/obj/bbb > Filesystem 1K-blocks Used Avail Capacity iused > ifree %iused Mounted on > zroot/SHARED/build/src/head 2097152 1344484 752668 64% 147835 > 1505336 9% /build/src/head > zroot/SHARED/build/obj/bbb 530876268 1949364 528926904 0% 70814 > 1057853808 0% /build/obj/bbb > > > On the NFS client system, these are the only NFS-related settings I have in > /etc/rc.conf: > > 
nfsuserd_enable="YES" > nfscbd_enable="YES" > > > Would you recommend I try it with nfscbd_enable="NO"? > Unless you have enabled delegations, nfscbd isn't needed, so you can try this. However, I doubt it will make any difference. (Callbacks are only used for Delegations for NFSv4.0. As such, it doesn't matter if they are working unless delegations are enabled.) > I will try NFS from other clients to see whether it's just this FreeBSD/arm > system that's having problems. > There probably are few people using NFSv4 on arm (you may be the only one), so I wouldn't be surprised if there is some size/alignment bug in the client that arm finds. Trying non-arm clients may certainly be useful. You could also try an NFSv3 mount, since I would think others are using NFSv3 on arm. Good luck with it, rick > Cheers, > > Paul. > > > > > > rick > > > >>> Has anyone else encountered this? It has only started happening > >>> recently for me, it seems. Prior to this, I have been able to do a > >>> buildworld and buildkernel successfully over NFS. > >>> > >>> Cheers, > >>> > >>> Paul. > >> > >> I cc this to freebsd-fs for you. > >> > >> Ronald. 
> >> _______________________________________________ > >> freebsd-fs@freebsd.org mailing list > >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > >> > > _______________________________________________ > > freebsd-arm@freebsd.org mailing list > > https://lists.freebsd.org/mailman/listinfo/freebsd-arm > > To unsubscribe, send any mail to "freebsd-arm-unsubscribe@freebsd.org" > > From owner-freebsd-fs@freebsd.org Wed Mar 9 00:27:07 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id DF0CEAC8FB8 for ; Wed, 9 Mar 2016 00:27:07 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id BCD9F935 for ; Wed, 9 Mar 2016 00:27:07 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: by mailman.ysv.freebsd.org (Postfix) id B42C6AC8FB5; Wed, 9 Mar 2016 00:27:07 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B3900AC8FB3; Wed, 9 Mar 2016 00:27:07 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id E1C0892F; Wed, 9 Mar 2016 00:27:06 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) IronPort-PHdr: 
9a23:bBAKzxzyCsua54jXCy+O+j09IxM/srCxBDY+r6Qd0O4SIJqq85mqBkHD//Il1AaPBtWEraIZwLCP+4nbGkU+or+5+EgYd5JNUxJXwe43pCcHRPC/NEvgMfTxZDY7FskRHHVs/nW8LFQHUJ2mPw6anHS+4HYoFwnlMkItf6KuStGU35n8jbn60qaQSjsLrQL1Wal1IhSyoFeZnegtqqwmFJwMzADUqGBDYeVcyDAgD1uSmxHh+pX4p8Y7oGx48sgs/M9YUKj8Y79wDfkBVGxnYCgI4tb2v0zDUReX/SlbFWEXiQZTRQbf4RzwRZu3tTH18e902S2fNMuxSbEvRTWk4aAsRgXlhS0cO3s36zLrjZk6tqVRrQi97zo5i6uSKL6cKOF5eOmVKckFTHZaWcB5WTZMD4mnY80IFeVXbshCqIyonVoFrlObDAKvAO7qgmtSg3b93qk31sw8Fg7b0Qg4H5QFuSKH/53OKK4OXLXtn+HzxjLZYqYTgG+l5Q== X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A2DQAQAqbd9W/61jaINcFoN2bQa6WAENgWkXCoUkSgKBfxQBAQEBAQEBAWMngi2CFAEBAQMBAQEBIAQnHQMLBQsCAQgYAgINGQICJwEJGAENAgQIBwQBHAICh3sIDq9QjykBAQEBAQEEAQEBAQEBGnuFHIF7gUl+hAEaAQEbgko4E4EnBYdYhVh0PYhJhWOCcIIyhE1Lg3mDJYUujlQCHgEBQoQCHi4BAQEEiEY0fgEBAQ X-IronPort-AV: E=Sophos;i="5.22,558,1449550800"; d="scan'208";a="271613761" Received: from nipigon.cs.uoguelph.ca (HELO zcs1.mail.uoguelph.ca) ([131.104.99.173]) by esa-annu.net.uoguelph.ca with ESMTP; 08 Mar 2016 19:26:59 -0500 Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 12D9015F56D; Tue, 8 Mar 2016 19:26:59 -0500 (EST) Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id jn4MzuIk7hlA; Tue, 8 Mar 2016 19:26:57 -0500 (EST) Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 5E91215F571; Tue, 8 Mar 2016 19:26:57 -0500 (EST) X-Virus-Scanned: amavisd-new at zcs1.mail.uoguelph.ca Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id zZm846OeCGHn; Tue, 8 Mar 2016 19:26:57 -0500 (EST) Received: from zcs1.mail.uoguelph.ca (zcs1.mail.uoguelph.ca [172.17.95.18]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 3976515F56D; Tue, 8 Mar 2016 19:26:57 -0500 (EST) Date: Tue, 8 Mar 2016 19:26:57 -0500 (EST) 
From: Rick Macklem
To: "Robert N. M. Watson"
Cc: Ken Merry , Robert Watson , Julian Elischer , fs@freebsd.org, scsi@freebsd.org
Message-ID: <2091108840.10124858.1457483217137.JavaMail.zimbra@uoguelph.ca>
In-Reply-To:
References: <800018199.6694281.1457233600357.JavaMail.zimbra@uoguelph.ca> <56DD2AB6.1030407@freebsd.org> <6AF0FC23-CC34-43EA-A008-9FB82FB21558@FreeBSD.org>
Subject: Re: FUSE extended attribute patches available
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Originating-IP: [172.17.95.10]
X-Mailer: Zimbra 8.0.9_GA_6191 (ZimbraWebClient - FF44 (Win)/8.0.9_GA_6191)
Thread-Topic: FUSE extended attribute patches available
Thread-Index: 8OfjVNO8yB/8BPMtb/zoCjB/754qNA==
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 09 Mar 2016 00:27:08 -0000

Robert N. M. Watson wrote:
> Just a quick observation: to avoid application change, you could actually
> leave the 'user.' on the front of the strings? It's not harmful, it just
> doesn't serve the same function. This might keep documentation more in sync,
> etc.
>
Btw, this internet draft was just published. There is work in progress w.r.t.
NFS support for Linux style extended attributes and this draft might have something
to say w.r.t. namespace? (I haven't looked at it.)
draft-ietf-nfsv4-xattrs-02.txt
Available via anonymous ftp at ftp.ietf.org (I find it amusing that the engineers
of the internet still use anonymous ftp;-).

rick

> Sent from my iPhone
>
> > On 7 Mar 2016, at 22:28, Ken Merry wrote:
> >
> >
> >
> >> On Mar 7, 2016, at 2:59 AM, Robert Watson wrote:
> >>
> >> FreeBSD and Linux’s extended-attribute models were inherited from IRIX, as
> >> they were introduced to solve the same problems: a place to store metadata such
> >> as ACLs, MAC labels, capability masks, etc.
IRIX had three namespaces: > >> one each for =E2=80=9Cuser=E2=80=9D, =E2=80=9Croot=E2=80=9D, and =E2= =80=9Csecure=E2=80=9D, reflecting whether or not they > >> were managed by the file owner (or permissions), the privileged root > >> user, or part of the TCB protection mechanism (e.g., for integrity > >> labels). > >>=20 > >> These extended attributes should not be confused with the filesystem > >> feature of the same name in NFSv4, which is sometimes known by the nam= e > >> =E2=80=9Cfile fork=E2=80=9D or =E2=80=9Cdata streams=E2=80=9D. EAs in = IRIX/FreeBSD/Linux/HPFS/etc are > >> tuple pairs of names and values intended to be written atomically or > >> updated in place specifically for (shortish) metadata such as ACLs, > >> rather than being complete separate data spaces for I/O (e.g., that co= uld > >> be memory mapped). > >=20 > > It would be nice to have NFSv4 / Solaris style alternate data streams. = ZFS > > handles them already, but I suppose it would take more work to support > > them in UFS. > >=20 > >> In FreeBSD=E2=80=99s design, we incorporated the disjoint namespace mo= del, > >> providing USER and SYSTEM, the former being managed by the file owner > >> (and those given suitable permission), and the latter being used for T= CB > >> mechanisms such as the implementations of MAC labels, ACLs, etc. > >>=20 > >> In Linux, they adopted a more free-form mechanism based on a single > >> combined namespace with a prefix =E2=80=94 e.g., user.FOO, and system.= BAR. Over > >> time it looks like that namespace has been expanded in various > >> filesystem-specific ways. We also have room to expand our namespace, b= ut > >> from the description below, it=E2=80=99s not clear quite what the righ= t mechanism > >> is. > >>=20 > >> One path would be to introduce a new namespace for filesystem-specific > >> attributes =E2=80=94 e.g., EXTATTR_NAMESPACE_FS? 
> >>=20 > >> But I think the key question here is whether the existing namespaces c= an > >> provide the semantics you need. If not, then we likely need a new > >> namespace. But then we get the question as to who controls use of the > >> namespace. Certainly =E2=80=9Cthe filesystem=E2=80=9D is one option, b= ut then you will > >> get inconsistency in approaches between filesystems and applications = =E2=80=94 > >> across various dimensions including protection (who can read/modify > >> them?), allocation (who decides what names should be used for what?), = and > >> semantics (what applications can use them, and who backs them up?). > >>=20 > >> For example: who should be responsible for backing up those attributes= ? > >> For =E2=80=98system=E2=80=99 attributes in FreeBSD, it is assumed that= backup tools will > >> be aware of the services layered over the attributes =E2=80=94 e.g., t= hat they > >> will back up ACLs using the ACL API, rather than backing up the binary > >> EAs holding the ACLs. For =E2=80=98user=E2=80=99 attributes, it is ass= umed that backup > >> tools (e.g., tar) must explicitly preserve them, since they are > >> user-defined and user-managed. For filesystem-specific attributes, som= e > >> other choice will need to be made =E2=80=94 perhaps filesystem-specifi= c backup > >> tools need to know about them? > >>=20 > >> Note that in the Linux EA model, ACLs are actually accessed via the EA > >> system calls, whereas in FreeBSD, ACLs are a first-class citizen in th= e > >> system-call API/ABI, and so user applications don=E2=80=99t treat them= as EAs. We > >> made that choice as filesystems may choose themselves not to represent > >> ACLs as EAs, and they have real semantics visible to the VFS layer. In > >> Linux, I believe they chose to pass them via EAs to narrow the > >> system-call interface for filesystem metadata. 
Both are legitimate
> >> choices, but this could also trigger discussions about whether new
> >> attributes are best accessed via the EA interface, or new system calls.
> >> For filesystem-specific attributes, EAs are likely the better way to go.
> >
> > It may be that for at least the purposes of FUSE, we can adequately live
> > under the USER namespace. That would allow for arbitrary namespaces that
> > Linux-centric filesystems create without significant churn in FreeBSD to
> > support it.
> >
> > And of course this is only for the front/top end of a FUSE filesystem.
> > What the filesystem actually does with the extended attributes that the
> > user sets on top is another question altogether. In the case of IBM’s
> > LTFS, it stores extended attributes (without the “user.” prefix) in the
> > LTFS index, which is an XML file that resides on tape. For other
> > filesystems, the answer could also vary significantly. A few that I
> > examined in sysutils/fusefs* used extended attributes on the backend
> > (usually on a backing filesystem) under Linux only, but not on the front
> > (user facing) end.
> >
> > In order to make arbitrary namespaces in FUSE work in FreeBSD under the
> > user namespace, we’ll have to do what Rick was talking about and just not
> > include the namespace as a prefix when we get/set attributes. This will
> > allow using any sort of namespace or attribute name that the FUSE
> > filesystem wants to use.
> >
> > The impact of this, from a porting standpoint, is that the FUSE filesystems
> > will have to know that on FreeBSD, they cannot/should not expect to see
> > the “user.” namespace prefix, but they might see other namespace prefixes.
> >
> > I took a look at the way LTFS and Gluster work with respect to extended
> > attributes with MacOS, since it seems that is how MacOS works, and it’s
> > less obvious to me what is going on with Gluster.
They’ve got this
> > function:
> >
> > #ifdef GF_DARWIN_HOST_OS
> > static int
> > set_xattr_user_namespace_mode (struct posix_private *priv, const char *str)
> > {
> >         if (strcmp (str, "none") == 0)
> >                 priv->xattr_user_namespace = XATTR_NONE;
> >         else if (strcmp (str, "strip") == 0)
> >                 priv->xattr_user_namespace = XATTR_STRIP;
> >         else if (strcmp (str, "append") == 0)
> >                 priv->xattr_user_namespace = XATTR_APPEND;
> >         else if (strcmp (str, "both") == 0)
> >                 priv->xattr_user_namespace = XATTR_BOTH;
> >         else
> >                 return -1;
> >         return 0;
> > }
> > #endif
> >
> > Although it’s not clear that they do anything with values other than
> > XATTR_STRIP.
> >
> > With LTFS, since they either assume a “user.” prefix on Linux, or no prefix
> > on Windows and MacOS X, it’s more straightforward.
> >
> > Ken
> >
> >
> >>
> >> Robert
> >>
> >>> On 7 Mar 2016, at 07:16, Julian Elischer wrote:
> >>>
> >>> On 5/03/2016 7:06 PM, Rick Macklem wrote:
> >>>> Ken Merry wrote:
> >>>>> I have patches for FreeBSD’s FUSE filesystem kernel module to support
> >>>>> extended attributes:
> >>> oh showing off your masochistic side eh?
> >>>
> >>>>> https://people.freebsd.org/~ken/fuse_extattr.20160229.1.txt
> >>> I spent an hour beating my head against fuse yesterday.
> >>> then realised that it's an old version on our product. We really have to
> >>> get off 8.0
> >>> (hopefully a matter of weeks now to a 10.x switch)
> >>> Now all I need is to find a FreeBSD filesystem expert (ZFS/NFS/CIFS/GFS)
> >>> to hire.
> >>>
> >>>
> >>>> The only bit of code I have that might be useful for this patch is:
> >>>> case FUSE_GETXATTR:
> >>>> case FUSE_LISTXATTR:
> >>>> ! /*
> >>>> !  * These can have varying response lengths, and 0 length
> >>>> !  * isn't necessarily invalid.
> >>>> !  */
> >>>> !
err = 0;
> >>>> *** I came up with this:
> >>>> fgin = (struct fuse_getxattr_in *)
> >>>>     ((char *)ftick->tk_ms_fiov.base +
> >>>>     sizeof(struct fuse_in_header));
> >>>> if (fgin->size == 0)
> >>>>         err = (blen == sizeof(struct fuse_getxattr_out)) ? 0 :
> >>>>             EINVAL;
> >>>> else
> >>>>         err = (blen <= fgin->size) ? 0 : EINVAL;
> >>>> break;
> >>>> I think I got the size check right?
> >>>>
> >>>> The big question is...
> >>>> What to do with the NAMESPACE?
> >>>> - My code fails for SYSTEM and does USER without prepending "user.".
> >>>> (That seemed to be what rwatson@ felt was reasonable. I thought our
> >>>> discussion was on a mailing list, but I can't find it.)
> >>>> I've cc'd him. Maybe he can comment again.
> >>> Is there a standard for extended attributes I should know about?
> >>> It seems to me that it's a bit like the wild west.
> >>> Extended attributes seem to be "every OS for himself".
> >>>
> >>>>
> >>>> - If you stick with prepending "user." or "system." there needs to be
> >>>> some way to bypass this so that attributes that don't start in "user."
> >>>> or "system." can be accessed. I've seen "trusted." and "glusterfs."
> >>>> on GlusterFS.
> >>>> --> Maybe a new namespace called something like "nil" that just bypasses
> >>>> any USER or SYSTEM checks?
> >>>>
> >>>> rick
> >>>>
> >>>>> The patch implements the get/set/delete/list extended attribute
> >>>>> methods. The
> >>>>> listing code also converts extended attribute lists from the Linux/FUSE
> >>>>> format to the FreeBSD format.
For example:
> >>>>>
> >>>>> # touch foo
> >>>>> # ls -la foo
> >>>>> -rwxrwxrwx 1 root wheel 0 Feb 29 21:40 foo
> >>>>> # lsextattr user foo
> >>>>> foo
> >>>>> # setextattr user testattr1 "12345678" foo
> >>>>> # lsextattr user foo
> >>>>> foo testattr1
> >>>>> # getextattr user testattr1 foo
> >>>>> foo 12345678
> >>>>> # setextattr user testattr2 "87654321" foo
> >>>>> # lsextattr user foo
> >>>>> foo testattr2 testattr1
> >>>>> # rmextattr user testattr1 foo
> >>>>> # lsextattr user foo
> >>>>> foo testattr2
> >>>>> # getextattr user testattr1 foo
> >>>>> getextattr: foo: failed: Attribute not found
> >>>>> # getextattr user testattr2 foo
> >>>>> foo 87654321
> >>>>>
> >>>>>
> >>>>> Just to be clear on what this does, it only provides extended attribute
> >>>>> support to FreeBSD applications if the underlying FUSE filesystem
> >>>>> implements
> >>>>> FUSE extended attribute support. Many FUSE filesystems don’t support
> >>>>> the
> >>>>> extended attribute VFS operations.
> >>>>>
> >>>>> I have tested this out on IBM’s LTFS implementation, but I have not yet
> >>>>> found
> >>>>> another FUSE filesystem that supports extended attributes. If anyone
> >>>>> knows
> >>>>> of one, please let me know so I can try it out. (I looked through a
> >>>>> number
> >>>>> of the filesystems in sysutils/fusefs* in the ports tree.)
> >>>>>
> >>>>> Any feedback is welcome. I’m planning to check this into FreeBSD/head
> >>>>> in the
> >>>>> next week or so.
> >>>>>
> >>>>> Obviously, I’ve also ported IBM’s LTFS implementation to FreeBSD. It
> >>>>> works
> >>>>> in the standard FUSE mode, and you can also link it into an application
> >>>>> as a
> >>>>> library if you don’t want to incur the overhead of running through
> >>>>> FUSE. I
> >>>>> haven’t gotten around to packaging it up to go out for testing /
> >>>>> review.
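The list-format conversion mentioned in the quoted patch description can be sketched roughly as follows. This is not the code from the patch; it is a minimal illustration assuming the usual two layouts: Linux/FUSE listxattr returns NUL-terminated names back to back, while FreeBSD's extattr_list_file(2) returns entries made of a one-byte length followed by the unterminated name. The function name ea_list_linux_to_bsd and the choice to keep only names carrying a given prefix (and to drop that prefix) are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Convert "name1\0name2\0" (Linux/FUSE) into FreeBSD-style entries:
 * one length byte followed by the name, with no terminator.
 * Only names starting with `prefix` are kept and the prefix is dropped;
 * pass "" to keep every name unchanged. Empty names are skipped.
 * Returns the number of bytes written to `out`, or -1 if the input is
 * not NUL-terminated, a name exceeds 255 bytes, or `out` is too small.
 */
static long
ea_list_linux_to_bsd(const char *in, size_t inlen, const char *prefix,
    unsigned char *out, size_t outlen)
{
	size_t plen, pos = 0, used = 0;

	assert(prefix != NULL && out != NULL);
	plen = strlen(prefix);
	while (pos < inlen) {
		const char *name = in + pos;
		const char *end = memchr(name, '\0', inlen - pos);
		size_t nlen;

		if (end == NULL)
			return (-1);	/* last name is not terminated */
		nlen = (size_t)(end - name);
		pos += nlen + 1;

		if (nlen < plen || memcmp(name, prefix, plen) != 0)
			continue;	/* belongs to another namespace */
		name += plen;
		nlen -= plen;
		if (nlen == 0)
			continue;	/* nothing left after the prefix */
		if (nlen > 255 || outlen - used < nlen + 1)
			return (-1);
		out[used++] = (unsigned char)nlen;
		memcpy(out + used, name, nlen);
		used += nlen;
	}
	return ((long)used);
}
```

With the prefix set to "user.", a FUSE reply listing user.testattr2 and user.testattr1 would come out as two 10-byte entries, which matches the bare names that lsextattr prints in the transcript above. Whether the real patch filters or strips prefixes this way is not stated in the message.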
> >>>>>
> >>>>> If anyone has IBM LTO-5 or newer tape drives, or IBM TS1140 or newer
> >>>>> tape
> >>>>> drives, and wants to try it out, let me know. I’ll send you the code
> >>>>> when
> >>>>> I’ve got it at least somewhat ready. This is IBM-specific, and won’t
> >>>>> work
> >>>>> on HP tape drives.
> >>>>>
> >>>>> Ken
> >>>>> --
> >>>>> Ken Merry
> >>>>> ken@FreeBSD.ORG
> >>>>>
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> freebsd-fs@freebsd.org mailing list
> >>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> >>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
> >>>> _______________________________________________
> >>>> freebsd-fs@freebsd.org mailing list
> >>>> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> >>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
> >
> >
> >
> > --
> > Ken Merry
> > ken@FreeBSD.ORG
> >
> >
> >
>

From owner-freebsd-fs@freebsd.org Wed Mar 9 00:49:33 2016
Return-Path:
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4A30CAC77E0; Wed, 9 Mar 2016 00:49:33 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id C58306AF; Wed, 9 Mar 2016 00:49:32 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca)
IronPort-PHdr:
9a23:C9qBfB2lXbfTlulqsmDT+DRfVm0co7zxezQtwd8ZsekVI/ad9pjvdHbS+e9qxAeQG96LtLQU1qGM7OjJYi8p39WoiDg6aptCVhsI2409vjcLJ4q7M3D9N+PgdCcgHc5PBxdP9nC/NlVJSo6lPwWB6kO74TNaIBjjLw09fr2zQd6NyZTqnLrts7ToICx2xxOFKYtoKxu3qQiD/uI3uqBFbpgL9x3Sv3FTcP5Xz247bXianhL7+9vitMU7q3cYk7sb+sVBSaT3ebgjBfwdVWx+cjN92cvwqBOWTReT/mBOFSISkwFUGE7L9hz3VIz99Czgua140SieOMTwCrQ1Qiij6alsDxHyhSoNLDJ8+XvS2fB32ZpSvRbpghVjw4POKNWNPed6VqzHetYbWSxNWsdbETJdRI6wct1cIfAGOLNiroL+734Hphi6CAzkUPnqwzRLgnLz9bA93PksFRnGmgcpSYFd+E/Ipcn4Yf9BGdu+y7PFmHCaN6tb X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A2DQAQAjct9W/61jaINcDoN+bQa6WAENgWkXCoUkSgKBfxQBAQEBAQEBAWMngi2CFQEBBAEBASAEJyALEAIBCBgCAg0ZAgInAQkmAgQIBwQBHASIAw6vW48lAQEBAQEBAQECAQEBAQEBAQEUBHuFHIF7gkeEGwEBBRaDAoE6BYdWhVp0PYhJhWOCcIIyhE2HaYUujlQCHgEBQoIDGYENWR4uAQaIRjR+AQEB X-IronPort-AV: E=Sophos;i="5.22,559,1449550800"; d="scan'208";a="269926196" Received: from nipigon.cs.uoguelph.ca (HELO zcs1.mail.uoguelph.ca) ([131.104.99.173]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 08 Mar 2016 19:49:31 -0500 Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 4710715F56D; Tue, 8 Mar 2016 19:49:31 -0500 (EST) Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id 0P5xFr62_XHf; Tue, 8 Mar 2016 19:49:30 -0500 (EST) Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 9418815F571; Tue, 8 Mar 2016 19:49:30 -0500 (EST) X-Virus-Scanned: amavisd-new at zcs1.mail.uoguelph.ca Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id b3VHMlovJOMe; Tue, 8 Mar 2016 19:49:30 -0500 (EST) Received: from zcs1.mail.uoguelph.ca (zcs1.mail.uoguelph.ca [172.17.95.18]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 7366715F56D; Tue, 8 Mar 2016 19:49:30 -0500 (EST) Date: Tue, 8 Mar 2016 19:49:30 -0500 (EST) From: Rick Macklem 
To: Paul Mather Cc: Ronald Klop , freebsd-fs@freebsd.org, freebsd-arm@freebsd.org Message-ID: <1290552239.10146172.1457484570450.JavaMail.zimbra@uoguelph.ca> In-Reply-To: <08710728-3130-49BE-8BD7-AFE85A31C633@gromit.dlib.vt.edu> References: <3DAB3639-8FB8-43D3-9517-94D46EDEC19E@gromit.dlib.vt.edu> <1482595660.8940439.1457405756110.JavaMail.zimbra@uoguelph.ca> <08710728-3130-49BE-8BD7-AFE85A31C633@gromit.dlib.vt.edu> Subject: Re: Unstable NFS on recent CURRENT MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.95.12] X-Mailer: Zimbra 8.0.9_GA_6191 (ZimbraWebClient - FF44 (Win)/8.0.9_GA_6191) Thread-Topic: Unstable NFS on recent CURRENT Thread-Index: k8ThePUeTUqowV1bL9aF3N1wliHkwA== X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 09 Mar 2016 00:49:33 -0000 Paul Mather wrote: > On Mar 7, 2016, at 9:55 PM, Rick Macklem wrote: > > > Paul Mather (forwarded by Ronald Klop) wrote: > >> On Sun, 06 Mar 2016 02:57:03 +0100, Paul Mather > >> wrote: > >> > >>> On my BeagleBone Black running 11-CURRENT (r296162) lately I have been > >>> having trouble with NFS. I have been doing a buildworld and buildkernel > >>> with /usr/src and /usr/obj mounted via NFS. Recently, this process has > >>> resulted in the buildworld failing at some point, with a variety of > >>> errors (Segmentation fault; Permission denied; etc.). Even a "ls -alR" > >>> of /usr/src doesn't manage to complete. It errors out thus: > >>> > >>> ===== > >>> [[...]] > >>> total 0 > >>> ls: ./.svn/pristine/fe: Permission denied > >>> > >>> ./.svn/pristine/ff: > >>> total 0 > >>> ls: ./.svn/pristine/ff: Permission denied > >>> ls: fts_read: Permission denied > >>> ===== > >>> > >>> On the console, I get the following: > >>> > >>> newnfs: server 'chumby.chumby.lan' error: fileid changed. 
fsid > >>> 94790777:a4385de: expected fileid 0x4, got 0x2. (BROKEN NFS SERVER OR > >>> MIDDLEWARE) > >>> Oh, I had forgotten this. Here's the comment related to this error. (about line#445 in sys/fs/nfsclient/nfs_clport.c): 446 * BROKEN NFS SERVER OR MIDDLEWARE 447 * 448 * Certain NFS servers (certain old proprietary filers ca. 449 * 2006) or broken middleboxes (e.g. WAN accelerator products) 450 * will respond to GETATTR requests with results for a 451 * different fileid. 452 * 453 * The WAN accelerator we've observed not only serves stale 454 * cache results for a given file, it also occasionally serves 455 * results for wholly different files. This causes surprising 456 * problems; for example the cached size attribute of a file 457 * may truncate down and then back up, resulting in zero 458 * regions in file contents read by applications. We observed 459 * this reliably with Clang and .c files during parallel build. 460 * A pcap revealed packet fragmentation and GETATTR RPC 461 * responses with wholly wrong fileids. If you can connect the client->server with a simple switch (or just an RJ45 cable), it might be worth testing that way. (I don't recall the name of the middleware product, but I think it was shipped by one of the major switch vendors. I also don't know if the product supports NFSv4?) rick > >>> > >>> I am using a FreeBSD/amd64 10.3-PRERELEASE (r296412) as the NFS server. > >>> On the BeagleBone Black, I am mounting /usr/src and /usr/obj via > >>> /etc/fstab as follows: > >>> > >>> chumby.chumby.lan:/build/src/head /usr/src nfs rw,nfsv4 0 0 > >>> chumby.chumby.lan:/build/obj/bbb /usr/obj nfs rw,nfsv4 0 0 > >>> > >>> > >>> /build/src/head and /build/obj/bbb are both ZFS file systems. > >>> > > Is it possible that a ZFS file system has gotten to the point where the > > i-node# exceeds 32bits? ZFS does support more than 32bits for i-node#s, > > but FreeBSD does not (it truncates to the low order 32bits). 
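The truncation Rick describes can be illustrated with a few lines of C. This only demonstrates the mechanism he is asking about (two distinct 64-bit file ids becoming indistinguishable once only the low 32 bits are kept); it is not a diagnosis of the reported failure, and the function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low-order 32 bits of a 64-bit file id. */
static uint32_t
fileid_truncate32(uint64_t fileid)
{
	return ((uint32_t)(fileid & 0xffffffffu));
}

/* True when two distinct 64-bit ids look identical after truncation. */
static int
fileids_collide32(uint64_t a, uint64_t b)
{
	return (a != b && fileid_truncate32(a) == fileid_truncate32(b));
}

/* Small self-check showing the effect on made-up id values. */
static void
fileid_truncation_demo(void)
{
	uint64_t small = 0x0000000000000004ULL;
	uint64_t large = 0x0000000100000004ULL;	/* differs only above bit 31 */

	assert(fileid_truncate32(large) == 0x4u);
	assert(fileids_collide32(small, large));
	assert(!fileids_collide32(small, 0x2ULL));
}
```

Ids that fit in 32 bits are unaffected, so a collision of this kind would require the server to hand out ids above 0xffffffff in the first place.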
> > I know diddly about ZFS, so I don't know if you actually have to create > > more than 4billion files to get the i-node# to exceed 32bits or ??? > > > > There has been work done on making ino_t 64bits, but it hasn't made it > > into FreeBSD-current and I have no idea when it might. > > > > If you could try a build on newly created file systems (or UFS ones > > instead of ZFS), that would tell you if the above might be the problem. > > > I don't think I have that big of a ZFS pool (it's 2 TB). :-) > > It doesn't seem that there are excessive numbers of inodes, and the counts > match up between the NFS client and server sides. > > In the information below, chumby is the NFS server and beaglebone the client: > > pmather@beaglebone:~ % mount > /dev/mmcsd0s2a on / (ufs, local, noatime, soft-updates) > devfs on /dev (devfs, local) > /dev/mmcsd0s1 on /boot/msdos (msdosfs, local, noatime) > tmpfs on /tmp (tmpfs, local) > tmpfs on /var/log (tmpfs, local) > tmpfs on /var/tmp (tmpfs, local) > chumby.chumby.lan:/build/src/head on /usr/src (nfs, nfsv4acls) > chumby.chumby.lan:/build/obj/bbb on /usr/obj (nfs, nfsv4acls) > pmather@beaglebone:~ % df -i /usr/src /usr/obj > Filesystem 1K-blocks Used Avail Capacity iused > ifree %iused Mounted on > chumby.chumby.lan:/build/src/head 2097152 1344484 752668 64% 147835 > 1505336 9% /usr/src > chumby.chumby.lan:/build/obj/bbb 530875884 1949364 528926520 0% 70814 > 1057853040 0% /usr/obj > > > paul@chumby:/home/paul> df -i /build/src/head /build/obj/bbb > Filesystem 1K-blocks Used Avail Capacity iused > ifree %iused Mounted on > zroot/SHARED/build/src/head 2097152 1344484 752668 64% 147835 > 1505336 9% /build/src/head > zroot/SHARED/build/obj/bbb 530876268 1949364 528926904 0% 70814 > 1057853808 0% /build/obj/bbb > > > On the NFS client system, these are the only NFS-related settings I have in > /etc/rc.conf: > > nfsuserd_enable="YES" > nfscbd_enable="YES" > > > Would you recommend I try it with nfscbd_enable="NO"? 
> > I will try NFS from other clients to see whether it's just this FreeBSD/arm > system that's having problems. > > Cheers, > > Paul. > > > > > > rick > > > >>> Has anyone else encountered this? It has only started happening > >>> recently for me, it seems. Prior to this, I have been able to do a > >>> buildworld and buildkernel successfully over NFS. > >>> > >>> Cheers, > >>> > >>> Paul. > >> > >> I cc this to freebsd-fs for you. > >> > >> Ronald. > >> _______________________________________________ > >> freebsd-fs@freebsd.org mailing list > >> https://lists.freebsd.org/mailman/listinfo/freebsd-fs > >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > >> > > _______________________________________________ > > freebsd-arm@freebsd.org mailing list > > https://lists.freebsd.org/mailman/listinfo/freebsd-arm > > To unsubscribe, send any mail to "freebsd-arm-unsubscribe@freebsd.org" > > From owner-freebsd-fs@freebsd.org Wed Mar 9 00:05:15 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 01192AC8800; Wed, 9 Mar 2016 00:05:15 +0000 (UTC) (envelope-from lslusser@gmail.com) Received: from mail-vk0-x236.google.com (mail-vk0-x236.google.com [IPv6:2607:f8b0:400c:c05::236]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id A8741CD4; Wed, 9 Mar 2016 00:05:14 +0000 (UTC) (envelope-from lslusser@gmail.com) Received: by mail-vk0-x236.google.com with SMTP id e185so37352714vkb.1; Tue, 08 Mar 2016 16:05:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc; bh=WOk5WWP9pQwt2NV8ukBTmjl6VUtNWJPE1ke8x+HyNvo=; 
b=faWODljHmdJjv4BdBuQ9loa+2L2kTRfVKRhPPrR4bs1e7Q9C/aCVL+5b+GVnaABeaX MFX+5mwcZYhvmaJw8lW+N4Ru8ArKOvUxQmVdXa+mHF3UbCGSNqAq5MnH3HmTuYLxYpGY NtBOF3uff+QF0rvD4EV4tiTt799GFGuQJjbCuL11VFHauRYb7VnwJ+YiEnWei5p4DaIx vqVqfihbhiy4oLImVuGIWt9/9kJcmwS24VfQkwOgCixv3zvAwQiN7auoyLhEfwNm2lPq LzIdIdtdJUoxH8VEinzX29dKN+h9n0fyELAjq1UhF4vDyDFU84J6xqwf5a6wtl4YBK8n j6LA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc; bh=WOk5WWP9pQwt2NV8ukBTmjl6VUtNWJPE1ke8x+HyNvo=; b=b4esiJHjETCB0/BFj3DUqQZKFv6MDvCK1cjJOCS4Ycw6Qw9PycGsQNXdzgJ4IK63VY 7BDuXGBcMiQLnyS8Oz1zQ1w76gylUvYUV4ra626jdy/DpNGcBQsz+6XWmLzi9WnnXpZH pu6mAnevOsrGZFEv/bkEnGVJCjQwOuqCHucJSQxETVoLzPf2SA6ETb11HciF2TJo+w2U kBgF6m2p0CM938Xmuu/AM1s9zEaOAywWNYg++Ga0rP+ZTcldBss4lGRnSr7rh7OPibXZ 0ZfdV5Yqb0KEv18RGmGtlZArMqH3CcA6lOZSlCwNwWKaQ5UcmoeFj957IhYXH761EwsC skQw== X-Gm-Message-State: AD7BkJKSHCullOJPaIzi+3vF3EJjJgb84Yd9sJzc1WjvduRsOKEZOmhFJwUHFbzXeTEoGQZzKrWxCpPSt3XtXQ== MIME-Version: 1.0 X-Received: by 10.31.3.80 with SMTP id 77mr23505443vkd.17.1457481913545; Tue, 08 Mar 2016 16:05:13 -0800 (PST) Received: by 10.176.1.166 with HTTP; Tue, 8 Mar 2016 16:05:13 -0800 (PST) In-Reply-To: References: <95563acb-d27b-4d4b-b8f3-afeb87a3d599@me.com> <56D87784.4090103@broken.net> <5158F354-9636-4031-9536-E99450F312B3@RichardElling.com> <6E2B77D1-E0CA-4901-A6BD-6A22C07536B3@gmail.com> Date: Tue, 8 Mar 2016 16:05:13 -0800 Message-ID: Subject: Re: [zfs] [developer] Re: [smartos-discuss] an interesting survey -- the zpool with most disks you have ever built From: Liam Slusser To: zfs@lists.illumos.org Cc: "smartos-discuss@lists.smartos.org" , developer@lists.open-zfs.org, developer , illumos-developer , omnios-discuss , Discussion list for OpenIndiana , "zfs-discuss@list.zfsonlinux.org" , "freebsd-fs@FreeBSD.org" , "zfs-devel@freebsd.org" X-Mailman-Approved-At: Wed, 09 Mar 2016 01:04:37 +0000 Content-Type: text/plain; 
charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.21
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 09 Mar 2016 00:05:15 -0000

Hi Fred -

We don't use any cluster software. Our backup server is just a full copy of our data and nothing more. So in the event of a failure of the master our server clients don't automatically fail over or anything nifty like that. This filer isn't customer facing, so in the event of a failure of the master there is no customer impact. We use a slightly modified zrep to handle the replication between the two.

thanks,
liam

> [Fred]: zpool with 280 drives in production is pretty big! I think 2000
>> drives were just in test. It is true that huge pools have lots of operation
>> challenges. I have met the similar sluggish issue caused by a
>>
> will-die disk. Just curious, what is the cluster software
> implemented in
> http://everycity.co.uk/alasdair/2011/05/adjusting-drive-timeouts-with-mdb-on-solaris-or-openindiana/
> ?
>
> Thanks.
> > Fred > From owner-freebsd-fs@freebsd.org Wed Mar 9 04:00:42 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2C0BCAC8D1B for ; Wed, 9 Mar 2016 04:00:42 +0000 (UTC) (envelope-from kaduk@mit.edu) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 0BC62C0A for ; Wed, 9 Mar 2016 04:00:42 +0000 (UTC) (envelope-from kaduk@mit.edu) Received: by mailman.ysv.freebsd.org (Postfix) id 07C1FAC8D18; Wed, 9 Mar 2016 04:00:42 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 07392AC8D17; Wed, 9 Mar 2016 04:00:42 +0000 (UTC) (envelope-from kaduk@mit.edu) Received: from dmz-mailsec-scanner-6.mit.edu (dmz-mailsec-scanner-6.mit.edu [18.7.68.35]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 95695C08; Wed, 9 Mar 2016 04:00:40 +0000 (UTC) (envelope-from kaduk@mit.edu) X-AuditID: 12074423-b4fff700000048c2-37-56df9fe07bcc Received: from mailhub-auth-1.mit.edu ( [18.9.21.35]) (using TLS with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by (Symantec Messaging Gateway) with SMTP id D4.A8.18626.0EF9FD65; Tue, 8 Mar 2016 23:00:32 -0500 (EST) Received: from outgoing.mit.edu (outgoing-auth-1.mit.edu [18.9.28.11]) by mailhub-auth-1.mit.edu (8.13.8/8.9.2) with ESMTP id u2940WBu016503; Tue, 8 Mar 2016 23:00:32 -0500 Received: from multics.mit.edu (system-low-sipb.mit.edu [18.187.2.37]) (authenticated bits=56) (User authenticated as kaduk@ATHENA.MIT.EDU) by outgoing.mit.edu (8.13.8/8.12.4) with ESMTP id u2940TH0016280 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 
verify=NOT); Tue, 8 Mar 2016 23:00:31 -0500 Received: (from kaduk@localhost) by multics.mit.edu (8.12.9.20060308) id u2940Sv5007758; Tue, 8 Mar 2016 23:00:28 -0500 (EST) Date: Tue, 8 Mar 2016 23:00:28 -0500 (EST) From: Benjamin Kaduk To: Rick Macklem cc: fs@freebsd.org, scsi@freebsd.org Subject: Re: FUSE extended attribute patches available In-Reply-To: <2091108840.10124858.1457483217137.JavaMail.zimbra@uoguelph.ca> Message-ID: References: <800018199.6694281.1457233600357.JavaMail.zimbra@uoguelph.ca> <56DD2AB6.1030407@freebsd.org> <6AF0FC23-CC34-43EA-A008-9FB82FB21558@FreeBSD.org> <2091108840.10124858.1457483217137.JavaMail.zimbra@uoguelph.ca> User-Agent: Alpine 1.10 (GSO 962 2008-03-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFnrPIsWRmVeSWpSXmKPExsUixCmqrPtg/v0wg1ntFhaHn7hYPFx2jcni 1tH5zA7MHjM+zWfx+L15L1MAUxSXTUpqTmZZapG+XQJXxpwpSxgLlrFXnDhm1sD4gbWLkZND QsBEYv+7S0A2F4eQQBuTxNkHD9kgnA2MEnPXbmCGcA4ySex6vZoNpEVIoF7i8oIV7CA2i4CW xLPOe4wgNpuAisTMNxvBakQE1CU2r+5nBrGZgeKH97wFqxcWMJO4Or2TBcTmFPCReP13FVA9 BwevgKPElIfmEOO/MUns/iUOYosK6Eis3j8FrJxXQFDi5MwnLBAjtSSWT9/GMoFRYBaS1Cwk qQWMTKsYZVNyq3RzEzNzilOTdYuTE/PyUot0zfRyM0v0UlNKNzGCwpLdRXkH48s+70OMAhyM Sjy8ES73w4RYE8uKK3MPMUpyMCmJ8t6RAgrxJeWnVGYkFmfEF5XmpBYfYpTgYFYS4V08EyjH m5JYWZValA+TkuZgURLnZWRgYBASSE8sSc1OTS1ILYLJynBwKEnwXp4H1ChYlJqeWpGWmVOC kGbi4AQZzgM0PB6khre4IDG3ODMdIn+KUZdjwY/ba5mEWPLy81KlxHn9QIoEQIoySvPg5oDT yW4m1VeM4kBvCfNyApOLEA8wFcFNegW0hAloyYvWeyBLShIRUlINjL2PQhlvn9n321Nc6eix qTG2nqWVr95umPnoxocA22PnL5+ykHl/+9qUC9OCKq49ZmVn/LKXP/e4y+OXKmx2B9pPLNJ8 OXPv17WOkR9W3WRcMtczxTWvkbW7eBufyM8V0+O35u0Izo3TneTR4BOeIPDyp5/zxRhJP69e 9tDL7cePr2BzS5H+sEaJpTgj0VCLuag4EQAWZ6/pAgMAAA== X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 09 Mar 2016 04:00:42 -0000 On Tue, 8 Mar 2016, Rick Macklem wrote: > Robert N. M. 
Watson wrote:
> > Just a quick observation: to avoid application change, you could actually
> > leave the 'user.' on the front of the strings? It's not harmful, it just
> > doesn't serve the same function. This might keep documentation more in sync,
> > etc.
> >
> Btw, this internet draft was just published. There is work in progress w.r.t.
> NFS support for Linux style extended attributes and this draft might have something
> to say w.r.t. namespace? (I haven't looked at it.)
>
> draft-ietf-nfsv4-xattrs-02.txt
>
> Available via anonymous ftp at ftp.ietf.org (I find it amusing that the engineers
> of the internet still use anonymous ftp;-).

I prefer the version at http://tools.ietf.org/html/draft-ietf-nfsv4-xattrs-02 :)

[obligatory note that the above document is to be considered a work in progress, and is not a final specification.]

-Ben

From owner-freebsd-fs@freebsd.org Wed Mar 9 02:15:39 2016
Return-Path:
Delivered-To: freebsd-fs@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 41014AC8A4C; Wed, 9 Mar 2016 02:15:39 +0000 (UTC) (envelope-from rudd-o@rudd-o.com)
Received: from mail.rudd-o.com (mail.rudd-o.com [54.255.149.57]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mail.rudd-o.com", Issuer "COMODO RSA Domain Validation Secure Server CA" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id EEC373BA; Wed, 9 Mar 2016 02:15:37 +0000 (UTC) (envelope-from rudd-o@rudd-o.com)
Received: from [10.137.9.14] (ip-10-252-104-1.ap-southeast-1.compute.internal [10.252.104.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.rudd-o.com (Postfix) with ESMTPSA id C58166046D; Wed, 9 Mar 2016 02:08:22 +0000 (UTC)
Subject: Re: [zfs] [developer] Re: [smartos-discuss] an interesting survey -- the zpool with most disks you have ever built
To: developer@lists.open-zfs.org,
zfs@lists.illumos.org
References: <95563acb-d27b-4d4b-b8f3-afeb87a3d599@me.com> <56D87784.4090103@broken.net> <5158F354-9636-4031-9536-E99450F312B3@RichardElling.com> <6E2B77D1-E0CA-4901-A6BD-6A22C07536B3@gmail.com>
Cc: "smartos-discuss@lists.smartos.org" , developer , illumos-developer , omnios-discuss , Discussion list for OpenIndiana , "zfs-discuss@list.zfsonlinux.org" , "freebsd-fs@FreeBSD.org" , "zfs-devel@freebsd.org"
From: "Manuel Amador (Rudd-O)"
Message-ID: <56DF8595.4090400@rudd-o.com>
Date: Wed, 9 Mar 2016 02:08:21 +0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To:
Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="F12aOeHdhaeLGxjv5JtNeoWIVE58hV2li"
X-Mailman-Approved-At: Wed, 09 Mar 2016 12:54:00 +0000
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 09 Mar 2016 02:15:39 -0000

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--F12aOeHdhaeLGxjv5JtNeoWIVE58hV2li
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

On 03/09/2016 12:05 AM, Liam Slusser wrote:
>
> We use a slightly modified zrep to handle the replication between the two.

zrep?
--=20 Rudd-O http://rudd-o.com/ --F12aOeHdhaeLGxjv5JtNeoWIVE58hV2li Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBCAAGBQJW34WVAAoJEFmZwbV7vYQ2L7gP/3pFIpy17j3dDEyq+QOInb5+ GNW7ffBGy3spJ+RMh8scQW0yU0DWtlALTy7TL+xIeo76CGtEMcO+bqqvGmzHgUlI W15800gih4Mt/z7buUx3rkUzivkTMFt8z4ElDWnaUUOvvRVHb3igMmg2LG1z3zFo Ntqux6CS77YJPxvCiNxzmP57oGuUvjt8XCptqMQtASrhpE5VA+S1KPdwQl9Rk/bu nr3c2a9Rx15coutNbgInVA6IF6oehVSngQH2twfP3/CzwuI5r3ZhZx+qOzLaUXMz +NPIHRvHyWJnSON1pHUqZYCNpu0g+n1GEQwJI7GX+ciKr1LdIlGcvIZb8Nmi7bWr 8tIE3Y5Hkznc6w0aja5FBkyHNUofb32FQRxdWobtBsEeqIz7p9kTA+j125zCsV6G Q4zKtsKN1j60h56RApW6/DqZtihpjQHNdT0kjLpvvNShGQt6NCgetHtqvNFM7gQo XULlVnlZPxsBUCb8qS8Fzi/Aii6MpeTbR1BKTnIhzwRDUTnHoSn76D6bX/DERGen xCdHjnixWaeve//UPY8hnfwQggGGxGbC8fmxIJfQ3grbRuNQnidS/K0zuLyjbO5w 0dDME3n286yl3eJigN6OX4kizzSOhjrjq0teD/M7i8Z64sksclI/BSYlPcW5i7Jg Gm+GntTYoK8G56VQ51ZK =mYb/ -----END PGP SIGNATURE----- --F12aOeHdhaeLGxjv5JtNeoWIVE58hV2li-- From owner-freebsd-fs@freebsd.org Wed Mar 9 16:12:44 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 869BCAC9755; Wed, 9 Mar 2016 16:12:44 +0000 (UTC) (envelope-from paul@gromit.dlib.vt.edu) Received: from gromit.dlib.vt.edu (gromit.dlib.vt.edu [128.173.126.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "gromit.dlib.vt.edu", Issuer "Chumby Certificate Authority" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 5CDA1C50; Wed, 9 Mar 2016 16:12:44 +0000 (UTC) (envelope-from paul@gromit.dlib.vt.edu) Received: from pmather.lib.vt.edu (pmather.lib.vt.edu [128.173.126.193]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by gromit.dlib.vt.edu (Postfix) 
with ESMTPSA id E5109F21; Wed, 9 Mar 2016 11:12:36 -0500 (EST)
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 9.2 \(3112\))
Subject: Re: Unstable NFS on recent CURRENT
From: Paul Mather
In-Reply-To: <1290552239.10146172.1457484570450.JavaMail.zimbra@uoguelph.ca>
Date: Wed, 9 Mar 2016 11:12:36 -0500
Cc: Ronald Klop , freebsd-fs@freebsd.org, freebsd-arm@freebsd.org
Content-Transfer-Encoding: quoted-printable
Message-Id: <60E8006A-F0A8-4284-839E-882FAD7E6A55@gromit.dlib.vt.edu>
References: <3DAB3639-8FB8-43D3-9517-94D46EDEC19E@gromit.dlib.vt.edu> <1482595660.8940439.1457405756110.JavaMail.zimbra@uoguelph.ca> <08710728-3130-49BE-8BD7-AFE85A31C633@gromit.dlib.vt.edu> <1290552239.10146172.1457484570450.JavaMail.zimbra@uoguelph.ca>
To: Rick Macklem
X-Mailer: Apple Mail (2.3112)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 09 Mar 2016 16:12:44 -0000

On Mar 8, 2016, at 7:49 PM, Rick Macklem wrote:

> Paul Mather wrote:
>> On Mar 7, 2016, at 9:55 PM, Rick Macklem wrote:
>>
>>> Paul Mather (forwarded by Ronald Klop) wrote:
>>>> On Sun, 06 Mar 2016 02:57:03 +0100, Paul Mather
>>>> wrote:
>>>>
>>>>> On my BeagleBone Black running 11-CURRENT (r296162) lately I have been
>>>>> having trouble with NFS. I have been doing a buildworld and buildkernel
>>>>> with /usr/src and /usr/obj mounted via NFS. Recently, this process has
>>>>> resulted in the buildworld failing at some point, with a variety of
>>>>> errors (Segmentation fault; Permission denied; etc.). Even a "ls -alR"
>>>>> of /usr/src doesn't manage to complete.
It errors out thus: >>>>> >>>>> ===== >>>>> [[...]] >>>>> total 0 >>>>> ls: ./.svn/pristine/fe: Permission denied >>>>> >>>>> ./.svn/pristine/ff: >>>>> total 0 >>>>> ls: ./.svn/pristine/ff: Permission denied >>>>> ls: fts_read: Permission denied >>>>> ===== >>>>> >>>>> On the console, I get the following: >>>>> >>>>> newnfs: server 'chumby.chumby.lan' error: fileid changed. fsid >>>>> 94790777:a4385de: expected fileid 0x4, got 0x2. (BROKEN NFS SERVER OR >>>>> MIDDLEWARE) >>>>> > Oh, I had forgotten this. Here's the comment related to this error. > (about line#445 in sys/fs/nfsclient/nfs_clport.c): > 446 * BROKEN NFS SERVER OR MIDDLEWARE > 447 * > 448 * Certain NFS servers (certain old proprietary filers ca. > 449 * 2006) or broken middleboxes (e.g. WAN accelerator products) > 450 * will respond to GETATTR requests with results for a > 451 * different fileid. > 452 * > 453 * The WAN accelerator we've observed not only serves stale > 454 * cache results for a given file, it also occasionally serves > 455 * results for wholly different files. This causes surprising > 456 * problems; for example the cached size attribute of a file > 457 * may truncate down and then back up, resulting in zero > 458 * regions in file contents read by applications. We observed > 459 * this reliably with Clang and .c files during parallel build. > 460 * A pcap revealed packet fragmentation and GETATTR RPC > 461 * responses with wholly wrong fileids. > > If you can connect the client->server with a simple switch (or just an RJ45 cable), it > might be worth testing that way. (I don't recall the name of the middleware product, but > I think it was shipped by one of the major switch vendors. I also don't know if the product > supports NFSv4?) > > rick Currently, the client is connected to the server via a dumb gigabit switch, so it is already fairly direct. As for the above error, it appeared on the console only once.
(Sorry if I made it sound like it appears every time.) I just tried another buildworld attempt via NFS and it failed again. This time, I get this on the BeagleBone Black console: nfs_getpages: error 13 vm_fault: pager read error, pid 5401 (install) The other thing I have noticed is that if I induce heavy load on the NFS server---e.g., by starting a Poudriere bulk build---then that provokes the client to crash much more readily. For example, I started a NFS buildworld on the BeagleBone Black, and it seemed to be chugging along nicely. The moment I kicked off a Poudriere build update of my packages on the NFS server, it crashed the buildworld on the NFS client. I have had problems with swap on FreeBSD/arm before. Swapping to a file does not appear to work for me. As a result, I switched to swapping to a partition on the SD card. Maybe this is unreliable, too? Cheers, Paul. From owner-freebsd-fs@freebsd.org Wed Mar 9 19:49:50 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A4B05ACA896; Wed, 9 Mar 2016 19:49:50 +0000 (UTC) (envelope-from 000.fbsd@quip.cz) Received: from elsa.codelab.cz (elsa.codelab.cz [94.124.105.4]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 63784C5D; Wed, 9 Mar 2016 19:49:49 +0000 (UTC) (envelope-from 000.fbsd@quip.cz) Received: from elsa.codelab.cz (localhost [127.0.0.1]) by elsa.codelab.cz (Postfix) with ESMTP id A5E3128436; Wed, 9 Mar 2016 20:49:40 +0100 (CET) Received: from illbsd.quip.test (ip-86-49-16-209.net.upcbroadband.cz [86.49.16.209]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by elsa.codelab.cz (Postfix) with ESMTPSA id E148528412; Wed, 9 Mar 2016 20:49:37 +0100 (CET) Message-ID: 
<56E07E51.2010107@quip.cz> Date: Wed, 09 Mar 2016 20:49:37 +0100 From: Miroslav Lachman <000.fbsd@quip.cz> User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:35.0) Gecko/20100101 Firefox/35.0 SeaMonkey/2.32 MIME-Version: 1.0 To: "Manuel Amador (Rudd-O)" , developer@lists.open-zfs.org, zfs@lists.illumos.org CC: Discussion list for OpenIndiana , omnios-discuss , developer , "zfs-devel@freebsd.org" , illumos-developer , "freebsd-fs@FreeBSD.org" , "smartos-discuss@lists.smartos.org" , "zfs-discuss@list.zfsonlinux.org" Subject: Re: [zfs] [developer] Re: [smartos-discuss] an interesting survey -- the zpool with most disks you have ever built References: <95563acb-d27b-4d4b-b8f3-afeb87a3d599@me.com> <56D87784.4090103@broken.net> <5158F354-9636-4031-9536-E99450F312B3@RichardElling.com> <6E2B77D1-E0CA-4901-A6BD-6A22C07536B3@gmail.com> <56DF8595.4090400@rudd-o.com> In-Reply-To: <56DF8595.4090400@rudd-o.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Mailman-Approved-At: Wed, 09 Mar 2016 22:34:22 +0000 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 09 Mar 2016 19:49:50 -0000 Manuel Amador (Rudd-O) wrote on 03/09/2016 03:08: > On 03/09/2016 12:05 AM, Liam Slusser wrote: >> >> We use a slightly modified zrep to handle the replication between the two. > > zrep? 
In ports sysutils/zrep - ZFS based replication and failover solution WWW: http://www.bolthole.com/solaris/zrep/ From owner-freebsd-fs@freebsd.org Thu Mar 10 02:00:28 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A05B6ACA793; Thu, 10 Mar 2016 02:00:28 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 3D816F75; Thu, 10 Mar 2016 02:00:27 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) IronPort-PHdr: 9a23:xlj03BGc05JqgNxMttjsUJ1GYnF86YWxBRYc798ds5kLTJ75oMSwAkXT6L1XgUPTWs2DsrQf27WQ4/+rBTRIyK3CmU5BWaQEbwUCh8QSkl5oK+++Imq/EsTXaTcnFt9JTl5v8iLzG0FUHMHjew+a+SXqvnYsExnyfTB4Ov7yUtaLyZ/niKbipNaPO01hv3mUX/BbFF2OtwLft80b08NJC50a7V/3mEZOYPlc3mhyJFiezF7W78a0+4N/oWwL46pyv+YJa6jxfrw5QLpEF3xmdjltvIy4/SXEGDOG+39Ud2wKkhdSS1zd5Qz+dpjrtS77qqxx3CiQe9PqC704RGLxwb1sTUrSiSwEfxsw+2LTh8k42LheqRmioxF665PTb5yYMOJ+OKjUK4BJDVFdV9pcAnQSSri3aJECWq9YZb5V X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A2DOAQC71OBW/61jaINeDoRsBrhGghMBDYFthg8CggQUAQEBAQEBAQFjJ4ItghQBAQEDASMEUgULAgEIGAICDRkCAlcCBBOIHAiwCo8qAQEBAQEBBAEBAQEBAQEZfIUcgXuCR4QiFoMCgToFh1cChVl0PYhVj1eHa4UuhX6IXgIeAQFCggMZgQ1ZHi4BiBcjARl+AQEB X-IronPort-AV: E=Sophos;i="5.24,313,1454994000"; d="scan'208";a="271821200" Received: from nipigon.cs.uoguelph.ca (HELO zcs1.mail.uoguelph.ca) ([131.104.99.173]) by esa-annu.net.uoguelph.ca with ESMTP; 09 Mar 2016 20:59:57 -0500 Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 90D6E15F574; Wed, 9 Mar 2016 20:59:57 -0500 (EST) Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id LTeijNFXb1or; Wed, 9 Mar 2016 20:59:56 -0500 (EST) Received: from localhost (localhost [127.0.0.1]) by 
zcs1.mail.uoguelph.ca (Postfix) with ESMTP id BE86615F577; Wed, 9 Mar 2016 20:59:56 -0500 (EST) X-Virus-Scanned: amavisd-new at zcs1.mail.uoguelph.ca Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id f1mLcALzsdLK; Wed, 9 Mar 2016 20:59:56 -0500 (EST) Received: from zcs1.mail.uoguelph.ca (zcs1.mail.uoguelph.ca [172.17.95.18]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 9B80F15F574; Wed, 9 Mar 2016 20:59:56 -0500 (EST) Date: Wed, 9 Mar 2016 20:59:56 -0500 (EST) From: Rick Macklem To: Paul Mather Cc: Ronald Klop , freebsd-fs@freebsd.org, freebsd-arm@freebsd.org Message-ID: <508973676.11871738.1457575196588.JavaMail.zimbra@uoguelph.ca> In-Reply-To: <60E8006A-F0A8-4284-839E-882FAD7E6A55@gromit.dlib.vt.edu> References: <3DAB3639-8FB8-43D3-9517-94D46EDEC19E@gromit.dlib.vt.edu> <1482595660.8940439.1457405756110.JavaMail.zimbra@uoguelph.ca> <08710728-3130-49BE-8BD7-AFE85A31C633@gromit.dlib.vt.edu> <1290552239.10146172.1457484570450.JavaMail.zimbra@uoguelph.ca> <60E8006A-F0A8-4284-839E-882FAD7E6A55@gromit.dlib.vt.edu> Subject: Re: Unstable NFS on recent CURRENT MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.95.11] X-Mailer: Zimbra 8.0.9_GA_6191 (ZimbraWebClient - FF44 (Win)/8.0.9_GA_6191) Thread-Topic: Unstable NFS on recent CURRENT Thread-Index: RMlqSV8ZIP43xWv9Rxwnw2Yw7ASWew== X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Mar 2016 02:00:28 -0000 Paul Mather wrote: > On Mar 8, 2016, at 7:49 PM, Rick Macklem wrote: > > > Paul Mather wrote: > >> On Mar 7, 2016, at 9:55 PM, Rick Macklem wrote: > >> > >>> Paul Mather (forwarded by Ronald Klop) wrote: > >>>> On Sun, 06 Mar 2016 02:57:03 +0100, Paul Mather > >>>> > >>>> wrote: > >>>> > >>>>> On my BeagleBone Black 
running 11-CURRENT (r296162) lately I have been > >>>>> having trouble with NFS. I have been doing a buildworld and > >>>>> buildkernel > >>>>> with /usr/src and /usr/obj mounted via NFS. Recently, this process has > >>>>> resulted in the buildworld failing at some point, with a variety of > >>>>> errors (Segmentation fault; Permission denied; etc.). Even a "ls -alR" > >>>>> of /usr/src doesn't manage to complete. It errors out thus: > >>>>> > >>>>> ===== > >>>>> [[...]] > >>>>> total 0 > >>>>> ls: ./.svn/pristine/fe: Permission denied > >>>>> > >>>>> ./.svn/pristine/ff: > >>>>> total 0 > >>>>> ls: ./.svn/pristine/ff: Permission denied > >>>>> ls: fts_read: Permission denied > >>>>> ===== > >>>>> > >>>>> On the console, I get the following: > >>>>> > >>>>> newnfs: server 'chumby.chumby.lan' error: fileid changed. fsid > >>>>> 94790777:a4385de: expected fileid 0x4, got 0x2. (BROKEN NFS SERVER OR > >>>>> MIDDLEWARE) > >>>>> > > Oh, I had forgotten this. Here's the comment related to this error. > > (about line#445 in sys/fs/nfsclient/nfs_clport.c): > > 446 * BROKEN NFS SERVER OR MIDDLEWARE > > 447 * > > 448 * Certain NFS servers (certain old proprietary filers > > ca. > > 449 * 2006) or broken middleboxes (e.g. WAN accelerator > > products) > > 450 * will respond to GETATTR requests with results for a > > 451 * different fileid. > > 452 * > > 453 * The WAN accelerator we've observed not only serves > > stale > > 454 * cache results for a given file, it also > > occasionally serves > > 455 * results for wholly different files. This causes > > surprising > > 456 * problems; for example the cached size attribute of > > a file > > 457 * may truncate down and then back up, resulting in > > zero > > 458 * regions in file contents read by applications. We > > observed > > 459 * this reliably with Clang and .c files during > > parallel build. > > 460 * A pcap revealed packet fragmentation and GETATTR > > RPC > > 461 * responses with wholly wrong fileids. 
> > > > If you can connect the client->server with a simple switch (or just an RJ45 > > cable), it > > might be worth testing that way. (I don't recall the name of the middleware > > product, but > > I think it was shipped by one of the major switch vendors. I also don't > > know if the product > > supports NFSv4?) > > > > rick > > > Currently, the client is connected to the server via a dumb gigabit switch, > so it is already fairly direct. > > As for the above error, it appeared on the console only once. (Sorry if I > made it sound like it appears every time.) > > I just tried another buildworld attempt via NFS and it failed again. This > time, I get this on the BeagleBone Black console: > > nfs_getpages: error 13 > vm_fault: pager read error, pid 5401 (install) > 13 is EACCES and could be caused by what I mention below. (Any mount of a file system on the server unless "-S" is specified as a flag for mountd.) > > The other thing I have noticed is that if I induce heavy load on the NFS > server---e.g., by starting a Poudriere bulk build---then that provokes the > client to crash much more readily. For example, I started a NFS buildworld > on the BeagleBone Black, and it seemed to be chugging along nicely. The > moment I kicked off a Poudriere build update of my packages on the NFS > server, it crashed the buildworld on the NFS client. > Try adding "-S" to mountd_flags on the server. Any time file systems are mounted (and Poudriere likes to do that, I am told), mount sends a SIGHUP to mountd to reload /etc/exports. When /etc/exports are being reloaded, there will be access errors for mounts (that are temporarily not exported) unless you specify "-S" (which makes mountd suspend the nfsd threads during the reload of /etc/exports). rick > I have had problems with swap on FreeBSD/arm before. Swapping to a file does > not appear to work for me. As a result, I switched to swapping to a > partition on the SD card. Maybe this is unreliable, too? > > Cheers, > > Paul. 
> > From owner-freebsd-fs@freebsd.org Thu Mar 10 02:07:37 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 74600ACAB81 for ; Thu, 10 Mar 2016 02:07:37 +0000 (UTC) (envelope-from torek@torek.net) Received: from elf.torek.net (mail.torek.net [96.90.199.121]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 499991481 for ; Thu, 10 Mar 2016 02:07:36 +0000 (UTC) (envelope-from torek@torek.net) Received: from elf.torek.net (localhost [127.0.0.1]) by elf.torek.net (8.14.9/8.14.9) with ESMTP id u2A279cC045486 for ; Wed, 9 Mar 2016 18:07:09 -0800 (PST) (envelope-from torek@torek.net) Message-Id: <201603100207.u2A279cC045486@elf.torek.net> From: Chris Torek To: freebsd-fs@freebsd.org Subject: quotactl bug: vfs_busy never unbusy-es MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <45484.1457575629.1@elf.torek.net> Date: Wed, 09 Mar 2016 18:07:09 -0800 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (elf.torek.net [127.0.0.1]); Wed, 09 Mar 2016 18:07:09 -0800 (PST) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Mar 2016 02:07:37 -0000 This bug is fairly small, as it mainly just prevents file system unmount of non-UFS file systems (including devfs and tmpfs as well as real ones like zfs) after issuing a particular quotactl() system call (which must be done as root) on those non-UFS file systems: rc = quotactl(path, QCMD(Q_QUOTAON, 0), 0, NULL); No user-land utility will actually do the evil system call (the main usage of quotactl() is in libutil, which skips over any non-ufs file system; it has literal strcmp()s in it for this). 
Still, it's just kind of sloppy at best. The bug occurs because ufs_quotactl has a special case, which sys_quotactl provides for, but no other file system implements this special case. (I checked them all: nullfs and unionfs pass the op down, but that doesn't help; smbfs has its own private copy of vfs_stdquotactl; and vfs_stdquotactl doesn't do it. This also means that nullfs and unionfs are actually worse than the others, if they happen to have a UFS file system in their pass-through: on unionfs you could get a panic, I think, when UFS attempts the vfs_unbusy() on the lower layer.) Here's the bit in sys_quotactl() that causes problems: error = vfs_busy(mp, 0); vfs_rel(mp); if (error != 0) return (error); error = VFS_QUOTACTL(mp, uap->cmd, uap->uid, uap->arg); /* large comment here about why the next two lines */ if ((uap->cmd >> SUBCMDSHIFT) != Q_QUOTAON) vfs_unbusy(mp); Now, ufs_quotactl() does in fact do the special unbusy dance, but nobody else does. Since quotas only really work on UFS there are a number of paths by which we might fix this. Maybe the best would be to change the system call: for quotaon() ops, perhaps the caller should pass the open file descriptors, for instance. Avoiding the whole busy/unbusy sequence here would be good (file systems that actually implement quotas, i.e., UFS, would do their own locking as needed). Another option is to change the type-signature of VFS_QUOTACTL: pass in a pointer to a flags, and have the callee (UFS) set or clear a flag to say "I did the unbusy". Naive callees, including all the error-returning ones, need do nothing. But this still means that nullfs and unionfs need work, as they don't actually do any vfs_busy/unbusy on their underlying mount points. I'm not sure why quotaon() is designed to do its own path lookup, but if there's no real reason to require that, I'd try the first idea. Meanwhile, I'm open to other suggestions... 
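To make the imbalance concrete, here is a toy user-space model of the busy/unbusy accounting described above. It is only a sketch under loud assumptions: `mount_sim`, `busy_sim()`, `unbusy_sim()`, the two `*_like_quotactl()` methods, `sys_quotactl_sim()` and `leaked_refs_after()` are all invented for illustration and are not kernel interfaces, and the real vfs_busy() does far more than bump a counter.

```c
#include <errno.h>

/* Toy command codes; NOT the values from ufs/ufs/quota.h. */
enum { Q_QUOTAON_SIM = 1, Q_GETQUOTA_SIM = 2 };

/* Hypothetical stand-in for struct mount: only a busy counter. */
struct mount_sim {
	int busy;
};

typedef int (*quotactl_fn)(struct mount_sim *, int);

static void busy_sim(struct mount_sim *mp)   { mp->busy++; }
static void unbusy_sim(struct mount_sim *mp) { mp->busy--; }

/* UFS-like method: on quotaon it drops the caller's busy reference. */
static int
ufs_like_quotactl(struct mount_sim *mp, int cmd)
{
	if (cmd == Q_QUOTAON_SIM)
		unbusy_sim(mp);
	return (0);
}

/* Default-like method: fails and never touches the busy reference. */
static int
std_like_quotactl(struct mount_sim *mp, int cmd)
{
	(void)mp;
	(void)cmd;
	return (EOPNOTSUPP);
}

/* Caller shaped like the sys_quotactl() fragment quoted above. */
static int
sys_quotactl_sim(struct mount_sim *mp, quotactl_fn method, int cmd)
{
	int error;

	busy_sim(mp);
	error = method(mp, cmd);
	/* For quotaon the caller trusts the method to have unbusied. */
	if (cmd != Q_QUOTAON_SIM)
		unbusy_sim(mp);
	return (error);
}

/* Busy references still held after one call on a fresh mount. */
static int
leaked_refs_after(quotactl_fn method, int cmd)
{
	struct mount_sim m = { 0 };

	(void)sys_quotactl_sim(&m, method, cmd);
	return (m.busy);
}
```

In this model, leaked_refs_after(ufs_like_quotactl, Q_QUOTAON_SIM) returns 0, while leaked_refs_after(std_like_quotactl, Q_QUOTAON_SIM) returns 1: the reference that would keep the file system from being unmounted.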
Chris From owner-freebsd-fs@freebsd.org Thu Mar 10 09:14:39 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D4F58AC918A for ; Thu, 10 Mar 2016 09:14:39 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 65E8CA6F for ; Thu, 10 Mar 2016 09:14:39 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u2A9EY39089869 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Thu, 10 Mar 2016 11:14:34 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u2A9EY39089869 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id u2A9EXYR089868; Thu, 10 Mar 2016 11:14:33 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 10 Mar 2016 11:14:33 +0200 From: Konstantin Belousov To: Chris Torek Cc: freebsd-fs@freebsd.org Subject: Re: quotactl bug: vfs_busy never unbusy-es Message-ID: <20160310091433.GS67250@kib.kiev.ua> References: <201603100207.u2A279cC045486@elf.torek.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201603100207.u2A279cC045486@elf.torek.net> User-Agent: Mutt/1.5.24 (2015-08-30) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: 
Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Mar 2016 09:14:39 -0000 On Wed, Mar 09, 2016 at 06:07:09PM -0800, Chris Torek wrote: > This bug is fairly small, as it mainly just prevents file system > unmount of non-UFS file systems (including devfs and tmpfs as well > as real ones like zfs) after issuing a particular quotactl() system > call (which must be done as root) on those non-UFS file systems: > > rc = quotactl(path, QCMD(Q_QUOTAON, 0), 0, NULL); > > No user-land utility will actually do the evil system call (the > main usage of quotactl() is in libutil, which skips over any > non-ufs file system; it has literal strcmp()s in it for this). > Still, it's just kind of sloppy at best. > > The bug occurs because ufs_quotactl has a special case, which > sys_quotactl provides for, but no other file system implements > this special case. (I checked them all: nullfs and unionfs pass > the op down, but that doesn't help; smbfs has its own private copy > of vfs_stdquotactl; and vfs_stdquotactl doesn't do it. This also > means that nullfs and unionfs are actually worse than the others, > if they happen to have a UFS file system in their pass-through: on > unionfs you could get a panic, I think, when UFS attempts the > vfs_unbusy() on the lower layer.) > > Here's the bit in sys_quotactl() that causes problems: > > error = vfs_busy(mp, 0); > vfs_rel(mp); > if (error != 0) > return (error); > error = VFS_QUOTACTL(mp, uap->cmd, uap->uid, uap->arg); > /* large comment here about why the next two lines */ > if ((uap->cmd >> SUBCMDSHIFT) != Q_QUOTAON) > vfs_unbusy(mp); > > Now, ufs_quotactl() does in fact do the special unbusy dance, > but nobody else does. > > Since quotas only really work on UFS there are a number of paths > by which we might fix this. Maybe the best would be to change the > system call: for quotaon() ops, perhaps the caller should pass the > open file descriptors, for instance. 
Avoiding the whole > busy/unbusy sequence here would be good (file systems that > actually implement quotas, i.e., UFS, would do their own locking > as needed). Yes, this is a possible route, although the change would break the VFS KBI. > > Another option is to change the type-signature of VFS_QUOTACTL: > pass in a pointer to a flags, and have the callee (UFS) set or > clear a flag to say "I did the unbusy". Naive callees, including > all the error-returning ones, need do nothing. But this still > means that nullfs and unionfs need work, as they don't actually > do any vfs_busy/unbusy on their underlying mount points. > > I'm not sure why quotaon() is designed to do its own path lookup, > but if there's no real reason to require that, I'd try the first > idea. > > Meanwhile, I'm open to other suggestions... Quotaon operates on a file that may not belong to the mount point where the quotas are enabled. In fact, this setup is easier for UFS, since metadata updates do not trigger data writes. So the lookup in the UFS quotaon is unrelated to the lookup in the syscall. Also, it was thought that other filesystems might not need such a split. The special casing for quotaon() is there because we might be unable to vfs_busy() the mount point if a parallel unmount is in progress. So sys_quotactl() must be ready to handle the case where the VFS method lost the vfs_busy() reference. I tried to fix this and did not think about other filesystems. Why not just do the following? With regard to the ufs/ufs/quota.h pollution of the VFS code, this is ugly, I agree. I have wanted to move the ufs quota code into the vfs layer for a very long time. UFS quota has no UFS-specific code at all, and is immediately reusable for other filesystems. E.g., I see tmpfs could immediately benefit from the shared quota code. 
diff --git a/sys/kern/vfs_default.c b/sys/kern/vfs_default.c
index a7977bf..26e45f4 100644
--- a/sys/kern/vfs_default.c
+++ b/sys/kern/vfs_default.c
@@ -66,6 +66,8 @@ __FBSDID("$FreeBSD$");
 #include
 #include
 
+#include
+
 static int vop_nolookup(struct vop_lookup_args *);
 static int vop_norename(struct vop_rename_args *);
 static int vop_nostrategy(struct vop_strategy_args *);
@@ -1190,6 +1192,8 @@ vfs_stdquotactl (mp, cmds, uid, arg)
 	void *arg;
 {
 
+	if ((cmds >> SUBCMDSHIFT) == Q_QUOTAON)
+		vfs_unbusy(mp);
 	return (EOPNOTSUPP);
 }

From owner-freebsd-fs@freebsd.org Thu Mar 10 11:38:45 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B1A82ACBF7B for ; Thu, 10 Mar 2016 11:38:45 +0000 (UTC) (envelope-from torek@elf.torek.net) Received: from elf.torek.net (mail.torek.net [96.90.199.121]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 9A7B6F58 for ; Thu, 10 Mar 2016 11:38:45 +0000 (UTC) (envelope-from torek@elf.torek.net) Received: from elf.torek.net (localhost [127.0.0.1]) by elf.torek.net (8.14.9/8.14.9) with ESMTP id u2ABcif2048012; Thu, 10 Mar 2016 03:38:44 -0800 (PST) (envelope-from torek@elf.torek.net) Received: (from torek@localhost) by elf.torek.net (8.14.9/8.14.9/Submit) id u2ABcihi048011; Thu, 10 Mar 2016 03:38:44 -0800 (PST) (envelope-from torek) Date: Thu, 10 Mar 2016 03:38:44 -0800 (PST) From: Chris Torek Message-Id: <201603101138.u2ABcihi048011@elf.torek.net> To: kostikbel@gmail.com Subject: Re: quotactl bug: vfs_busy never unbusy-es Cc: freebsd-fs@freebsd.org In-Reply-To: <20160310091433.GS67250@kib.kiev.ua> X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (elf.torek.net [127.0.0.1]); Thu, 10 Mar 2016 03:38:44 -0800 (PST) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.21 Precedence: 
list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Mar 2016 11:38:45 -0000 [A nicer change breaks KPI - yes, alas.] >Why not do just the following ? That fixes the main bug, although pass-through file systems (nullfs and unionfs) are still wrong: they don't vfs_busy their target mount points when passing the op through, nor unbusy their "mp" arguments when the sub-command is Q_QUOTAON. We could perhaps have a little subroutine in the VFS code that does the vfs_busy(mp)-then-call sequence, so that these two need not repeat it. >With regard to the ufs/ufs/quota.h pollution of the VFS code, this >is ugly, I agree. I wanted to move the ufs quota code into vfs layer >for very long time. This would be good (though zfs still has its own special quota code; hooking that up is what started me down this path, and hooking that up still looks difficult...). >diff --git a/sys/kern/vfs_default.c b/sys/kern/vfs_default.c >index a7977bf..26e45f4 100644 >--- a/sys/kern/vfs_default.c >+++ b/sys/kern/vfs_default.c >@@ -66,6 +66,8 @@ __FBSDID("$FreeBSD$"); > #include > #include > >+#include >+ > static int vop_nolookup(struct vop_lookup_args *); > static int vop_norename(struct vop_rename_args *); > static int vop_nostrategy(struct vop_strategy_args *); >@@ -1190,6 +1192,8 @@ vfs_stdquotactl (mp, cmds, uid, arg) > void *arg; > { > >+ if ((cmds >> SUBCMDSHIFT) == Q_QUOTAON) >+ vfs_unbusy(mp); > return (EOPNOTSUPP); > } That, in fact, is what I started with, before I investigated further and found the nullfs and unionfs pattern violation. It's certainly worth doing as a start, though. 
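The remaining pass-through problem can be illustrated with a toy user-space model. This is a sketch under stated assumptions, not kernel code: `mount_sim`, `std_fixed_quotactl()`, `ufs_like_quotactl()`, `passthrough_quotactl()`, `sys_quotactl_sim()` and the two `*_after_quotaon()` helpers are hypothetical names, a plain counter stands in for the vfs_busy() reference, and the pass-through layer is assumed to simply forward the op to its lower mount, as described above for nullfs and unionfs.

```c
#include <errno.h>
#include <stddef.h>

/* Toy command code; NOT the value from ufs/ufs/quota.h. */
enum { Q_QUOTAON_SIM = 1 };

/* Hypothetical stand-in for struct mount. */
struct mount_sim {
	int busy;			/* models the busy reference count */
	struct mount_sim *lower;	/* non-NULL for a pass-through layer */
	int (*quotactl)(struct mount_sim *, int);
};

/* Default method with the proposed change: unbusy on quotaon. */
static int
std_fixed_quotactl(struct mount_sim *mp, int cmd)
{
	if (cmd == Q_QUOTAON_SIM)
		mp->busy--;
	return (EOPNOTSUPP);
}

/* UFS-like method: unbusies the mount it is handed on quotaon. */
static int
ufs_like_quotactl(struct mount_sim *mp, int cmd)
{
	if (cmd == Q_QUOTAON_SIM)
		mp->busy--;
	return (0);
}

/* Pass-through layer: forwards the op, touches neither busy count. */
static int
passthrough_quotactl(struct mount_sim *mp, int cmd)
{
	return (mp->lower->quotactl(mp->lower, cmd));
}

/* Caller shaped like sys_quotactl(): busy, call, conditional unbusy. */
static int
sys_quotactl_sim(struct mount_sim *mp, int cmd)
{
	int error;

	mp->busy++;
	error = mp->quotactl(mp, cmd);
	if (cmd != Q_QUOTAON_SIM)
		mp->busy--;
	return (error);
}

/* Busy count left on a plain default-method mount after quotaon. */
static int
std_balance_after_quotaon(void)
{
	struct mount_sim m = { 0, NULL, std_fixed_quotactl };

	(void)sys_quotactl_sim(&m, Q_QUOTAON_SIM);
	return (m.busy);
}

/* Busy counts on both layers when quotaon goes through a pass-through. */
static void
passthrough_balance_after_quotaon(int *upper_busy, int *lower_busy)
{
	struct mount_sim lower = { 0, NULL, ufs_like_quotactl };
	struct mount_sim upper = { 0, &lower, passthrough_quotactl };

	(void)sys_quotactl_sim(&upper, Q_QUOTAON_SIM);
	*upper_busy = upper.busy;
	*lower_busy = lower.busy;
}
```

In this model the patched default method leaves the count balanced at 0, but a quotaon issued through the pass-through layer leaves the upper mount at 1 (never unbusied) and the lower mount at -1 (unbusied without ever having been busied), which is the pattern violation in question.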
Chris From owner-freebsd-fs@freebsd.org Thu Mar 10 14:29:29 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3FF25ACBFDD; Thu, 10 Mar 2016 14:29:29 +0000 (UTC) (envelope-from paul@gromit.dlib.vt.edu) Received: from gromit.dlib.vt.edu (gromit.dlib.vt.edu [128.173.126.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "gromit.dlib.vt.edu", Issuer "Chumby Certificate Authority" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id EFB3D2ED; Thu, 10 Mar 2016 14:29:28 +0000 (UTC) (envelope-from paul@gromit.dlib.vt.edu) Received: from macbook.chumby.lan (c-71-63-91-41.hsd1.va.comcast.net [71.63.91.41]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by gromit.dlib.vt.edu (Postfix) with ESMTPSA id 0D922185; Thu, 10 Mar 2016 09:29:26 -0500 (EST) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 9.2 \(3112\)) Subject: Re: Unstable NFS on recent CURRENT From: Paul Mather In-Reply-To: <508973676.11871738.1457575196588.JavaMail.zimbra@uoguelph.ca> Date: Thu, 10 Mar 2016 09:29:25 -0500 Cc: Ronald Klop , freebsd-fs@freebsd.org, freebsd-arm@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: References: <3DAB3639-8FB8-43D3-9517-94D46EDEC19E@gromit.dlib.vt.edu> <1482595660.8940439.1457405756110.JavaMail.zimbra@uoguelph.ca> <08710728-3130-49BE-8BD7-AFE85A31C633@gromit.dlib.vt.edu> <1290552239.10146172.1457484570450.JavaMail.zimbra@uoguelph.ca> <60E8006A-F0A8-4284-839E-882FAD7E6A55@gromit.dlib.vt.edu> <508973676.11871738.1457575196588.JavaMail.zimbra@uoguelph.ca> To: Rick Macklem X-Mailer: Apple Mail (2.3112) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: 
Thu, 10 Mar 2016 14:29:29 -0000 On Mar 9, 2016, at 8:59 PM, Rick Macklem wrote: > Paul Mather wrote: >> On Mar 8, 2016, at 7:49 PM, Rick Macklem wrote: >> >>> Paul Mather wrote: >>>> On Mar 7, 2016, at 9:55 PM, Rick Macklem wrote: >>>> >>>>> Paul Mather (forwarded by Ronald Klop) wrote: >>>>>> On Sun, 06 Mar 2016 02:57:03 +0100, Paul Mather >>>>>> >>>>>> wrote: >>>>>> >>>>>>> On my BeagleBone Black running 11-CURRENT (r296162) lately I have been >>>>>>> having trouble with NFS. I have been doing a buildworld and >>>>>>> buildkernel >>>>>>> with /usr/src and /usr/obj mounted via NFS. Recently, this process has >>>>>>> resulted in the buildworld failing at some point, with a variety of >>>>>>> errors (Segmentation fault; Permission denied; etc.). Even a "ls -alR" >>>>>>> of /usr/src doesn't manage to complete. It errors out thus: >>>>>>> >>>>>>> ===== >>>>>>> [[...]] >>>>>>> total 0 >>>>>>> ls: ./.svn/pristine/fe: Permission denied >>>>>>> >>>>>>> ./.svn/pristine/ff: >>>>>>> total 0 >>>>>>> ls: ./.svn/pristine/ff: Permission denied >>>>>>> ls: fts_read: Permission denied >>>>>>> ===== >>>>>>> >>>>>>> On the console, I get the following: >>>>>>> >>>>>>> newnfs: server 'chumby.chumby.lan' error: fileid changed. fsid >>>>>>> 94790777:a4385de: expected fileid 0x4, got 0x2. (BROKEN NFS SERVER OR >>>>>>> MIDDLEWARE) >>>>>>> >>> Oh, I had forgotten this. Here's the comment related to this error. >>> (about line#445 in sys/fs/nfsclient/nfs_clport.c): >>> 446 * BROKEN NFS SERVER OR MIDDLEWARE >>> 447 * >>> 448 * Certain NFS servers (certain old proprietary filers >>> ca. >>> 449 * 2006) or broken middleboxes (e.g. WAN accelerator >>> products) >>> 450 * will respond to GETATTR requests with results for a >>> 451 * different fileid. 
>>> 452 * >>> 453 * The WAN accelerator we've observed not only serves >>> stale >>> 454 * cache results for a given file, it also >>> occasionally serves >>> 455 * results for wholly different files. This causes >>> surprising >>> 456 * problems; for example the cached size attribute of >>> a file >>> 457 * may truncate down and then back up, resulting in >>> zero >>> 458 * regions in file contents read by applications. We >>> observed >>> 459 * this reliably with Clang and .c files during >>> parallel build. >>> 460 * A pcap revealed packet fragmentation and GETATTR >>> RPC >>> 461 * responses with wholly wrong fileids. >>> >>> If you can connect the client->server with a simple switch (or just an RJ45 >>> cable), it >>> might be worth testing that way. (I don't recall the name of the middleware >>> product, but >>> I think it was shipped by one of the major switch vendors. I also don't >>> know if the product >>> supports NFSv4?) >>> >>> rick >> >> >> Currently, the client is connected to the server via a dumb gigabit switch, >> so it is already fairly direct. >> >> As for the above error, it appeared on the console only once. (Sorry if I >> made it sound like it appears every time.) >> >> I just tried another buildworld attempt via NFS and it failed again. This >> time, I get this on the BeagleBone Black console: >> >> nfs_getpages: error 13 >> vm_fault: pager read error, pid 5401 (install) >> > 13 is EACCES and could be caused by what I mention below. (Any mount of a file > system on the server unless "-S" is specified as a flag for mountd.) > >> >> The other thing I have noticed is that if I induce heavy load on the NFS >> server---e.g., by starting a Poudriere bulk build---then that provokes the >> client to crash much more readily. For example, I started a NFS buildworld >> on the BeagleBone Black, and it seemed to be chugging along nicely.
>> The moment I kicked off a Poudriere build update of my packages on the NFS
>> server, it crashed the buildworld on the NFS client.
>>
> Try adding "-S" to mountd_flags on the server. Any time file systems are mounted
> (and Poudriere likes to do that, I am told), mount sends a SIGHUP to mountd to
> reload /etc/exports. When /etc/exports are being reloaded, there will be access
> errors for mounts (that are temporarily not exported) unless you specify "-S"
> (which makes mountd suspend the nfsd threads during the reload of /etc/exports).
>
> rick

Bingo! I think we may have a winner. I added that flag to mountd_flags on the server and the "instability" appears to have gone away.

It may be that all along the NFS problems on the client just coincided with Poudriere runs on the server. I build custom packages for my local machines using Poudriere so I use it quite a lot. Maybe the Poudriere port should come with a warning at install to those using NFS that it may provoke disruption and suggest the addition of "-S"? (Alternatively, maybe "-S" could become a default for mountd_flags? Is there a downside from using it that means making it a default option is unsuitable?)

Anyway, many, many thanks for all the help, Rick. I'll keep monitoring my BeagleBone Black, but it looks for now that this has solved the NFS "instability."

Cheers,

Paul.
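
For reference, the change described in this message comes down to one line in /etc/rc.conf on the NFS server. What follows is only a minimal sketch, not the poster's actual configuration: it assumes the server was still using the stock mountd_flags value of "-r", which is an assumption and not something stated in the thread. If other flags are already set, keep them and append -S.

```shell
# /etc/rc.conf on the NFS server (sketch; "-r" is an assumed existing value).
# -S makes mountd suspend the nfsd threads while /etc/exports is reloaded,
# so clients see a short delay instead of spurious access errors.
mountd_flags="-r -S"
```

mountd has to be restarted before new flags take effect, for example with "service mountd restart"; a plain reload of /etc/exports does not change the flags of the running daemon.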
From owner-freebsd-fs@freebsd.org Thu Mar 10 16:29:22 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id AEC40AC9DEF for ; Thu, 10 Mar 2016 16:29:22 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 5959DA61 for ; Thu, 10 Mar 2016 16:29:22 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id u2AGTHte025862 (version=TLSv1 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Thu, 10 Mar 2016 18:29:17 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua u2AGTHte025862 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id u2AGTGgg025861; Thu, 10 Mar 2016 18:29:16 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 10 Mar 2016 18:29:16 +0200 From: Konstantin Belousov To: Chris Torek Cc: freebsd-fs@freebsd.org Subject: Re: quotactl bug: vfs_busy never unbusy-es Message-ID: <20160310162916.GB1741@kib.kiev.ua> References: <20160310091433.GS67250@kib.kiev.ua> <201603101138.u2ABcihi048011@elf.torek.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <201603101138.u2ABcihi048011@elf.torek.net> User-Agent: Mutt/1.5.24 (2015-08-30) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 
2.1.21 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Mar 2016 16:29:22 -0000 On Thu, Mar 10, 2016 at 03:38:44AM -0800, Chris Torek wrote: > That, in fact, is what I started with, before I investigated > further and found the nullfs and unionfs pattern violation. > > It's certainly worth doing as a start, though. Lets fix nullfs too. The uncomfortable issue with the interface is that it is not always possible to busy mount point after unbusy. I propose to assume that for ENOENT error the busy ref is lost (this is the only possible error from vfs_busy()). I also verified that UFS quota code does not return this error for !quotaon case. diff --git a/sys/fs/nullfs/null_vfsops.c b/sys/fs/nullfs/null_vfsops.c index 49bae28..36e44cf 100644 --- a/sys/fs/nullfs/null_vfsops.c +++ b/sys/fs/nullfs/null_vfsops.c @@ -53,6 +53,7 @@ #include #include +#include static MALLOC_DEFINE(M_NULLFSMNT, "nullfs_mount", "NULLFS mount structure"); @@ -285,13 +286,33 @@ nullfs_root(mp, flags, vpp) } static int -nullfs_quotactl(mp, cmd, uid, arg) - struct mount *mp; - int cmd; - uid_t uid; - void *arg; +nullfs_quotactl(struct mount *mp, int cmd, uid_t uid, void *arg) { - return VFS_QUOTACTL(MOUNTTONULLMOUNT(mp)->nullm_vfs, cmd, uid, arg); + struct mount *lmp; + int error, error1; + + /* + * Nullfs mp is busy, which ensures that the lower mount is + * valid. 
+ */ + lmp = MOUNTTONULLMOUNT(mp)->nullm_vfs; + vfs_ref(mp); + vfs_ref(lmp); + vfs_unbusy(mp); + error = vfs_busy(lmp, 0); + if (error == 0) { + error = VFS_QUOTACTL(lmp, cmd, uid, arg); + if ((cmd >> SUBCMDSHIFT) != Q_QUOTAON && error != ENOENT) { + vfs_unbusy(lmp); + error1 = vfs_busy(mp, 0); + if (error1 != 0) + error = error1; + } + } + vfs_rel(mp); + vfs_rel(lmp); + return (error); + } static int diff --git a/sys/kern/vfs_default.c b/sys/kern/vfs_default.c index a7977bf..26e45f4 100644 --- a/sys/kern/vfs_default.c +++ b/sys/kern/vfs_default.c @@ -66,6 +66,8 @@ __FBSDID("$FreeBSD$"); #include #include +#include + static int vop_nolookup(struct vop_lookup_args *); static int vop_norename(struct vop_rename_args *); static int vop_nostrategy(struct vop_strategy_args *); @@ -1190,6 +1192,8 @@ vfs_stdquotactl (mp, cmds, uid, arg) void *arg; { + if ((cmds >> SUBCMDSHIFT) == Q_QUOTAON) + vfs_unbusy(mp); return (EOPNOTSUPP); } diff --git a/sys/kern/vfs_syscalls.c b/sys/kern/vfs_syscalls.c index 11813fc..1ca9d97 100644 --- a/sys/kern/vfs_syscalls.c +++ b/sys/kern/vfs_syscalls.c @@ -198,7 +198,7 @@ sys_quotactl(td, uap) * Require that Q_QUOTAON handles the vfs_busy() reference on * its own, always returning with ubusied mount point. 
*/ - if ((uap->cmd >> SUBCMDSHIFT) != Q_QUOTAON) + if ((uap->cmd >> SUBCMDSHIFT) != Q_QUOTAON && error != ENOENT) vfs_unbusy(mp); return (error); } From owner-freebsd-fs@freebsd.org Thu Mar 10 16:19:11 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3199BAC96E1; Thu, 10 Mar 2016 16:19:11 +0000 (UTC) (envelope-from jg@internetx.com) Received: from mx1.internetx.com (mx1.internetx.com [62.116.129.39]) by mx1.freebsd.org (Postfix) with ESMTP id E3356103; Thu, 10 Mar 2016 16:19:10 +0000 (UTC) (envelope-from jg@internetx.com) Received: from localhost (localhost [127.0.0.1]) by mx1.internetx.com (Postfix) with ESMTP id 735C145FC0AA; Thu, 10 Mar 2016 17:11:26 +0100 (CET) X-Virus-Scanned: InterNetX GmbH amavisd-new at ix-mailer.internetx.de Received: from mx1.internetx.com ([62.116.129.39]) by localhost (ix-mailer.internetx.de [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id oPuSZKEhZ+2E; Thu, 10 Mar 2016 17:11:24 +0100 (CET) Received: from [192.168.100.26] (pizza.internetx.de [62.116.129.3]) (using TLSv1 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by mx1.internetx.com (Postfix) with ESMTPSA id 33C0A4C4C135; Thu, 10 Mar 2016 17:11:24 +0100 (CET) Subject: Re: [zfs] [developer] Re: [smartos-discuss] an interesting survey -- the zpool with most disks you have ever built References: <95563acb-d27b-4d4b-b8f3-afeb87a3d599@me.com> <56D87784.4090103@broken.net> <5158F354-9636-4031-9536-E99450F312B3@RichardElling.com> <6E2B77D1-E0CA-4901-A6BD-6A22C07536B3@gmail.com> <56DF8595.4090400@rudd-o.com> To: "Manuel Amador (Rudd-O)" , developer@lists.open-zfs.org, zfs@lists.illumos.org Cc: Discussion list for OpenIndiana , omnios-discuss , developer , "zfs-devel@freebsd.org" , illumos-developer , "freebsd-fs@FreeBSD.org" , "smartos-discuss@lists.smartos.org" , "zfs-discuss@list.zfsonlinux.org" 
Reply-To: jg@internetx.com From: InterNetX - Juergen Gotteswinter Message-ID: <56E19CAB.5090200@internetx.com> Date: Thu, 10 Mar 2016 17:11:23 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 In-Reply-To: <56DF8595.4090400@rudd-o.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Mailman-Approved-At: Thu, 10 Mar 2016 16:57:02 +0000 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Mar 2016 16:19:11 -0000 www.bolthole.com/solaris/zrep/ Am 09.03.2016 um 03:08 schrieb Manuel Amador (Rudd-O): > On 03/09/2016 12:05 AM, Liam Slusser wrote: >> >> We use a slightly modified zrep to handle the replication between the two. > > zrep? > From owner-freebsd-fs@freebsd.org Fri Mar 11 01:08:20 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 978FAACA62B; Fri, 11 Mar 2016 01:08:20 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 1D5E2B0C; Fri, 11 Mar 2016 01:08:19 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) IronPort-PHdr: 9a23:fqt3Eh14f65UqSW0smDT+DRfVm0co7zxezQtwd8ZsegeKvad9pjvdHbS+e9qxAeQG96LtLQU26GP7P+ocFdDyKjCmUhKSIZLWR4BhJdetC0bK+nBN3fGKuX3ZTcxBsVIWQwt1Xi6NU9IBJS2PAWK8TWM5DIfUi/yKRBybrysXNWC0ILnh6vrpMKbSj4LrQT+SIs6FA+xowTVu5teqqpZAYF19CH0pGBVcf9d32JiKAHbtR/94sCt4MwrqHwI6LoJvvRNWqTifqk+UacQTHF/azh0t4XXskz7RBaLrl4VTmUbiFIcGwHY6Dn1RJD0sze8uu580m+EIYv7Qa1iChq46KI+ch7ji28iPjU69GzSwphqiatQoxasojRixIHJbYWNNLx1d/WOLpshWWNdU5MJBGR6CYSmYt5KVrJZMA== X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: 
A2DOAQCNGeJW/61jaINeDoRvBrgOghMBDYFthg8CgXgUAQEBAQEBAQFjJ4ItghQBAQEDASMEUgULAgEIGAICDRkCAlcCBBMbiAEIrkmPIQEBAQEBBQEBAQEBG3yFHIF7gkeEIhaDAoE6BYdYAoVadD2IWIhcB4Z8h2yFL4YCiGgCHgEBQoIDGYENWR4uAYkVIwEZfgEBAQ X-IronPort-AV: E=Sophos;i="5.24,318,1454994000"; d="scan'208";a="270314578" Received: from nipigon.cs.uoguelph.ca (HELO zcs1.mail.uoguelph.ca) ([131.104.99.173]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 10 Mar 2016 20:08:12 -0500 Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 6493415F55D; Thu, 10 Mar 2016 20:08:12 -0500 (EST) Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id jwyDUfK0fRdY; Thu, 10 Mar 2016 20:08:11 -0500 (EST) Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 587A615F56D; Thu, 10 Mar 2016 20:08:11 -0500 (EST) X-Virus-Scanned: amavisd-new at zcs1.mail.uoguelph.ca Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id 4PskEHakYHaR; Thu, 10 Mar 2016 20:08:11 -0500 (EST) Received: from zcs1.mail.uoguelph.ca (zcs1.mail.uoguelph.ca [172.17.95.18]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 39A4615F55D; Thu, 10 Mar 2016 20:08:11 -0500 (EST) Date: Thu, 10 Mar 2016 20:08:10 -0500 (EST) From: Rick Macklem To: Paul Mather Cc: Ronald Klop , freebsd-fs@freebsd.org, freebsd-arm@freebsd.org Message-ID: <2136530467.13386220.1457658490896.JavaMail.zimbra@uoguelph.ca> In-Reply-To: References: <3DAB3639-8FB8-43D3-9517-94D46EDEC19E@gromit.dlib.vt.edu> <1482595660.8940439.1457405756110.JavaMail.zimbra@uoguelph.ca> <08710728-3130-49BE-8BD7-AFE85A31C633@gromit.dlib.vt.edu> <1290552239.10146172.1457484570450.JavaMail.zimbra@uoguelph.ca> <60E8006A-F0A8-4284-839E-882FAD7E6A55@gromit.dlib.vt.edu> <508973676.11871738.1457575196588.JavaMail.zimbra@uoguelph.ca> Subject: Re: Unstable NFS on recent CURRENT 
MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.95.12] X-Mailer: Zimbra 8.0.9_GA_6191 (ZimbraWebClient - FF44 (Win)/8.0.9_GA_6191) Thread-Topic: Unstable NFS on recent CURRENT Thread-Index: vkg/rK143TXd6S6nj595PPG46ZJutA== X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Mar 2016 01:08:20 -0000 Paul Mather wrote: > On Mar 9, 2016, at 8:59 PM, Rick Macklem wrote: > > > Paul Mather wrote: > >> On Mar 8, 2016, at 7:49 PM, Rick Macklem wrote: > >> > >>> Paul Mather wrote: > >>>> On Mar 7, 2016, at 9:55 PM, Rick Macklem wrote: > >>>> > >>>>> Paul Mather (forwarded by Ronald Klop) wrote: > >>>>>> On Sun, 06 Mar 2016 02:57:03 +0100, Paul Mather > >>>>>> > >>>>>> wrote: > >>>>>> > >>>>>>> On my BeagleBone Black running 11-CURRENT (r296162) lately I have > >>>>>>> been > >>>>>>> having trouble with NFS. I have been doing a buildworld and > >>>>>>> buildkernel > >>>>>>> with /usr/src and /usr/obj mounted via NFS. Recently, this process > >>>>>>> has > >>>>>>> resulted in the buildworld failing at some point, with a variety of > >>>>>>> errors (Segmentation fault; Permission denied; etc.). Even a "ls > >>>>>>> -alR" > >>>>>>> of /usr/src doesn't manage to complete. It errors out thus: > >>>>>>> > >>>>>>> ===== > >>>>>>> [[...]] > >>>>>>> total 0 > >>>>>>> ls: ./.svn/pristine/fe: Permission denied > >>>>>>> > >>>>>>> ./.svn/pristine/ff: > >>>>>>> total 0 > >>>>>>> ls: ./.svn/pristine/ff: Permission denied > >>>>>>> ls: fts_read: Permission denied > >>>>>>> ===== > >>>>>>> > >>>>>>> On the console, I get the following: > >>>>>>> > >>>>>>> newnfs: server 'chumby.chumby.lan' error: fileid changed. fsid > >>>>>>> 94790777:a4385de: expected fileid 0x4, got 0x2. (BROKEN NFS SERVER OR > >>>>>>> MIDDLEWARE) > >>>>>>> > >>> Oh, I had forgotten this. 
Here's the comment related to this error. > >>> (about line#445 in sys/fs/nfsclient/nfs_clport.c): > >>> 446 * BROKEN NFS SERVER OR MIDDLEWARE > >>> 447 * > >>> 448 * Certain NFS servers (certain old proprietary > >>> filers > >>> ca. > >>> 449 * 2006) or broken middleboxes (e.g. WAN accelerator > >>> products) > >>> 450 * will respond to GETATTR requests with results for > >>> a > >>> 451 * different fileid. > >>> 452 * > >>> 453 * The WAN accelerator we've observed not only > >>> serves > >>> stale > >>> 454 * cache results for a given file, it also > >>> occasionally serves > >>> 455 * results for wholly different files. This causes > >>> surprising > >>> 456 * problems; for example the cached size attribute > >>> of > >>> a file > >>> 457 * may truncate down and then back up, resulting in > >>> zero > >>> 458 * regions in file contents read by applications. > >>> We > >>> observed > >>> 459 * this reliably with Clang and .c files during > >>> parallel build. > >>> 460 * A pcap revealed packet fragmentation and GETATTR > >>> RPC > >>> 461 * responses with wholly wrong fileids. > >>> > >>> If you can connect the client->server with a simple switch (or just an > >>> RJ45 > >>> cable), it > >>> might be worth testing that way. (I don't recall the name of the > >>> middleware > >>> product, but > >>> I think it was shipped by one of the major switch vendors. I also don't > >>> know if the product > >>> supports NFSv4?) > >>> > >>> rick > >> > >> > >> Currently, the client is connected to the server via a dumb gigabit > >> switch, > >> so it is already fairly direct. > >> > >> As for the above error, it appeared on the console only once. (Sorry if I > >> made it sound like it appears every time.) > >> > >> I just tried another buildworld attempt via NFS and it failed again. 
This > >> time, I get this on the BeagleBone Black console: > >> > >> nfs_getpages: error 13 > >> vm_fault: pager read error, pid 5401 (install) > >> > > 13 is EACCES and could be caused by what I mention below. (Any mount of a > > file > > system on the server unless "-S" is specified as a flag for mountd.) > > > >> > >> The other thing I have noticed is that if I induce heavy load on the NFS > >> server---e.g., by starting a Poudriere bulk build---then that provokes the > >> client to crash much more readily. For example, I started a NFS > >> buildworld > >> on the BeagleBone Black, and it seemed to be chugging along nicely. The > >> moment I kicked off a Poudriere build update of my packages on the NFS > >> server, it crashed the buildworld on the NFS client. > >> > > Try adding "-S" to mountd_flags on the server. Any time file systems are > > mounted > > (and Poudriere likes to do that, I am told), mount sends a SIGHUP to mountd > > to > > reload /etc/exports. When /etc/exports are being reloaded, there will be > > access > > errors for mounts (that are temporarily not exported) unless you specify > > "-S" > > (which makes mountd suspend the nfsd threads during the reload of > > /etc/exports). > > > > rick > > > Bingo! I think we may have a winner. I added that flag to mountd_flags on > the server and the "instability" appears to have gone away. > > It may be that all along the NFS problems on the client just coincided with > Poudriere runs on the server. I build custom packages for my local machines > using Poudriere so I use it quite a lot. Maybe the Poudriere port should > come with a warning at install to those using NFS that it may provoke > disruption and suggest the addition of "-S"? (Alternatively, maybe "-S" > could become a default for mountd_flags? Is there a downside from using it > that means making it a default option is unsuitable?) 
>

Well, the first time I proposed "-S" the collective felt it wasn't the appropriate solution to the "export reload" problem. The second time, the "collective" agreed that it was ok as a non-default option. (Part of this story was an alternative to mountd called nfse which did update exports atomically, but it never made it into FreeBSD.) The only downside to making it a default is that it does change behaviour and some might consider that a POLA violation. Others would consider it just a bug fix. There was one report of long delays before exports got updated on a very busy server. (I have a one-line patch that fixes this, but that won't be committed into FreeBSD-current until April.)

Now that "-S" has been in FreeBSD for a couple of years, I am planning on asking the "collective" (I usually post these kinds of things on freebsd-fs@) to make it the default in FreeBSD-current, because this problem seems to crop up fairly frequently. I will probably post w.r.t. this in April when I can again do svn commits.

I only recently found out that Poudriere does mounts and causes this problem. I may also commit a man page update (which can be MFC'd) that mentions that if you are using Poudriere you want this flag. Having the same thing mentioned in the Poudriere port install might be nice, too.

Thanks for testing this, rick

> Anyway, many, many thanks for all the help, Rick. I'll keep monitoring my
> BeagleBone Black, but it looks for now that this has solved the NFS
> "instability."
>
> Cheers,
>
> Paul.
> > From owner-freebsd-fs@freebsd.org Fri Mar 11 01:51:26 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id F29A5ACB6DF for ; Fri, 11 Mar 2016 01:51:25 +0000 (UTC) (envelope-from Marc.Goroff@Quorum.net) Received: from mail.quorumlabs.com (mail.quorum.net [64.74.133.216]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (Client CN "mail.quorumlabs.com", Issuer "Go Daddy Secure Certification Authority" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id E0408176 for ; Fri, 11 Mar 2016 01:51:25 +0000 (UTC) (envelope-from Marc.Goroff@Quorum.net) Received: from Marc-Goroffs-MacBook-Air-4.local (10.20.7.92) by QLEXC01.Quorum.local (10.30.0.22) with Microsoft SMTP Server (TLS) id 14.2.318.1; Thu, 10 Mar 2016 17:47:34 -0800 Subject: Re: Unstable NFS on recent CURRENT To: References: <3DAB3639-8FB8-43D3-9517-94D46EDEC19E@gromit.dlib.vt.edu> <1482595660.8940439.1457405756110.JavaMail.zimbra@uoguelph.ca> <08710728-3130-49BE-8BD7-AFE85A31C633@gromit.dlib.vt.edu> <1290552239.10146172.1457484570450.JavaMail.zimbra@uoguelph.ca> <60E8006A-F0A8-4284-839E-882FAD7E6A55@gromit.dlib.vt.edu> <508973676.11871738.1457575196588.JavaMail.zimbra@uoguelph.ca> <2136530467.13386220.1457658490896.JavaMail.zimbra@uoguelph.ca> From: Marc Goroff CC: Message-ID: <56E22455.3040402@quorum.net> Date: Thu, 10 Mar 2016 17:50:13 -0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 In-Reply-To: <2136530467.13386220.1457658490896.JavaMail.zimbra@uoguelph.ca> Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit X-Originating-IP: [10.20.7.92] X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 
11 Mar 2016 01:51:26 -0000 On 3/10/16 5:08 PM, Rick Macklem wrote: > Paul Mather wrote: >> On Mar 9, 2016, at 8:59 PM, Rick Macklem wrote: >> >>> Paul Mather wrote: >>>> On Mar 8, 2016, at 7:49 PM, Rick Macklem wrote: >>>> >>>>> Paul Mather wrote: >>>>>> On Mar 7, 2016, at 9:55 PM, Rick Macklem wrote: >>>>>> >>>>>>> Paul Mather (forwarded by Ronald Klop) wrote: >>>>>>>> On Sun, 06 Mar 2016 02:57:03 +0100, Paul Mather >>>>>>>> >>>>>>>> wrote: >>>>>>>> >>>>>>>> > Well, the first time I proposed "-S" the collective felt it wasn't the appropriate > solution to the "export reload" problem. The second time, the "collective" agreed > that it was ok as a non-default option. (Part of this story was an alternative to > mountd called nfse which did update exports atomically, but it never made it into > FreeBSD.) The only downside to making it a default is that it does change behaviour > and some might consider that a POLA violation. Others would consider it just a bug fix. > There was one report of long delays before exports got updated on a very busy server. > (I have a one line patch that fixes this, but that won't be committed into FreeBSD-current > until April.) > > Now that "-S" has been in FreeBSD for a couple of years, I am planning on asking > the "collective" (I usually post these kind of things on freebsd-fs@) to make it the > default in FreeBSD-current, because this problem seems to crop up fairly frequently. > I will probably post w.r.t. this in April when I can again to svn commits. > > I only recently found out the Poudriere does mounts and causes this problem. > I may also commit a man page update (which can be MFC'd) that mentions if you > are using Poudriere you want this flag. > Having the same thing mentioned in the Poudriere port install might be nice, too. > > Thanks for testing this, rick > I was amazed when I discovered the "-S" option last year and even more amazed that it wasn't the default. 
The lack of -S caused us enormous problems in our production ZFS environment last year and nearly caused us to abandon FreeBSD altogether. Every time we'd provision a new ZFS file system from the zpool, all our NFS clients would start throwing alerts due to I/O errors! I'm unclear why it would be considered acceptable to have a reload of an exports file cause spurious I/O errors for NFS clients. I'd think such incorrect behavior would be clearly considered a blatant POLA violation. Causing a delay and returning correct data is far superior to incorrectly returning access errors that screw up well behaved applications on NFS clients. NFS is capable of handling network delays. Why choose instead to return invalid I/O errors? IMHO, -S should be the default. I'm unable to think of any good reason for it to be optional and the lack of it as the default behavior has clearly caused lots of needless pain and suffering in the user community. Marc From owner-freebsd-fs@freebsd.org Fri Mar 11 14:53:14 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4B5B1ACCC45 for ; Fri, 11 Mar 2016 14:53:14 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 3BADE91E for ; Fri, 11 Mar 2016 14:53:14 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id u2BErDJD035862 for ; Fri, 11 Mar 2016 14:53:14 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 204622] [zfs] [patch] Improve 'zpool labelclear' command Date: Fri, 11 Mar 2016 
14:53:14 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-CURRENT X-Bugzilla-Keywords: feature, patch X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: ganael.laplanche@corp.ovh.com X-Bugzilla-Status: New X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Mar 2016 14:53:14 -0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204622

--- Comment #2 from Ganael LAPLANCHE ---
Has anyone tested that patch? I would be pleased to get feedback on that :)

--
You are receiving this mail because:
You are the assignee for the bug.

From owner-freebsd-fs@freebsd.org Fri Mar 11 23:16:01 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2229BACC62A for ; Fri, 11 Mar 2016 23:16:01 +0000 (UTC) (envelope-from torek@torek.net) Received: from elf.torek.net (mail.torek.net [96.90.199.121]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 1010A14E4 for ; Fri, 11 Mar 2016 23:16:00 +0000 (UTC) (envelope-from torek@torek.net) Received: from elf.torek.net (localhost [127.0.0.1]) by elf.torek.net (8.14.9/8.14.9) with ESMTP id u2BNFsc0059323; Fri, 11 Mar 2016 15:15:54 -0800 (PST) (envelope-from torek@torek.net) Message-Id: <201603112315.u2BNFsc0059323@elf.torek.net> From: Chris Torek To: Konstantin Belousov cc: freebsd-fs@freebsd.org Subject: Re: quotactl bug: vfs_busy never unbusy-es In-reply-to: Your message of "Thu, 10 Mar 2016 18:29:16 +0200." <20160310162916.GB1741@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <59321.1457738154.1@elf.torek.net> Date: Fri, 11 Mar 2016 15:15:54 -0800 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (elf.torek.net [127.0.0.1]); Fri, 11 Mar 2016 15:15:54 -0800 (PST) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Mar 2016 23:16:01 -0000

>Lets fix nullfs too.

And unionfs? :-)

>The uncomfortable issue with the interface is that
>it is not always possible to busy mount point after unbusy.

Yes.
>I propose to assume that for ENOENT error the busy ref is lost >(this is the only possible error from vfs_busy()). This seems pretty scary. Also: >I also verified that UFS quota code >does not return this error for !quotaon case. OK, but it certainly can for the quotaon case (though we're protected by the fact that quotaon already had to do the unbusy so it doesn't matter now). Anyway, we are getting kind of complicated here. I think we can fix this with a fairly tiny KPI change. Note that the code currently only works for UFS. Suppose we move the vfs_busy to happen later (with potential failure, i.e., quotactl() call may now return ENOENT because it can't busy the file system because it's being unmounted -- this seems pretty harmless as it only occurs in a race between quotactl() and unmount, and the caller could have lost that race anyway). I'm not 100% sure this is correct, but it seems a bit less scary to me :-) But note that it is COMPLETELY untested (not even compiled), I'm just seeing if you think this is a reasonable approach. Basically, we're just moving the vfs_busy/unbusy into UFS itself, and permitting it to fail at that point (for all ops, not just Q_QUOTAON). (Later we can refactor all this to be usable from, e.g., tmpfs.) Chris diff --git a/sys/kern/vfs_syscalls.c b/sys/kern/vfs_syscalls.c index 11813fc..e1ab670 100644 --- a/sys/kern/vfs_syscalls.c +++ b/sys/kern/vfs_syscalls.c @@ -173,6 +173,11 @@ sys_quotactl(td, uap) AUDIT_ARG_UID(uap->uid); if (!prison_allow(td->td_ucred, PR_ALLOW_QUOTAS)) return (EPERM); + /* + * Reference the mount point so that it will remain in core + * across the VFS_QUOTACTL call. We used to vfs_busy() it + * here, but now we leave that to the underlying file system. 
+	 */
 	NDINIT(&nd, LOOKUP, FOLLOW | LOCKLEAF | AUDITVNODE1, UIO_USERSPACE,
 	    uap->path, td);
 	if ((error = namei(&nd)) != 0)
@@ -181,25 +186,8 @@ sys_quotactl(td, uap)
 	mp = nd.ni_vp->v_mount;
 	vfs_ref(mp);
 	vput(nd.ni_vp);
-	error = vfs_busy(mp, 0);
-	vfs_rel(mp);
-	if (error != 0)
-		return (error);
 	error = VFS_QUOTACTL(mp, uap->cmd, uap->uid, uap->arg);
-
-	/*
-	 * Since quota on operation typically needs to open quota
-	 * file, the Q_QUOTAON handler needs to unbusy the mount point
-	 * before calling into namei. Otherwise, unmount might be
-	 * started between two vfs_busy() invocations (first is our,
-	 * second is from mount point cross-walk code in lookup()),
-	 * causing deadlock.
-	 *
-	 * Require that Q_QUOTAON handles the vfs_busy() reference on
-	 * its own, always returning with ubusied mount point.
-	 */
-	if ((uap->cmd >> SUBCMDSHIFT) != Q_QUOTAON)
-		vfs_unbusy(mp);
+	vfs_rel(mp);
 	return (error);
 }

diff --git a/sys/ufs/ufs/ufs_vfsops.c b/sys/ufs/ufs/ufs_vfsops.c
index 5bb73ea..fc5f5bb 100644
--- a/sys/ufs/ufs/ufs_vfsops.c
+++ b/sys/ufs/ufs/ufs_vfsops.c
@@ -92,9 +92,6 @@ ufs_quotactl(mp, cmds, id, arg)
 	void *arg;
 {
 #ifndef QUOTA
-	if ((cmds >> SUBCMDSHIFT) == Q_QUOTAON)
-		vfs_unbusy(mp);
-
 	return (EOPNOTSUPP);
 #else
 	struct thread *td;
@@ -115,21 +112,24 @@ ufs_quotactl(mp, cmds, id, arg)
 			break;

 		default:
-			if (cmd == Q_QUOTAON)
-				vfs_unbusy(mp);
 			return (EINVAL);
 		}
 	}
 	if ((u_int)type >= MAXQUOTAS) {
-		if (cmd == Q_QUOTAON)
-			vfs_unbusy(mp);
 		return (EINVAL);
 	}

+	/*
+	 * Make sure we're not unmounting.
+	 */
+	error = vfs_busy(mp);
+	if (error)
+		return (error);
+
 	switch (cmd) {
 	case Q_QUOTAON:
 		error = quotaon(td, mp, type, arg);
-		break;
+		goto done;	/* quotaon does vfs_unbusy itself */

 	case Q_QUOTAOFF:
 		error = quotaoff(td, mp, type);
@@ -171,6 +171,8 @@ ufs_quotactl(mp, cmds, id, arg)
 		error = EINVAL;
 		break;
 	}
+	vfs_unbusy(mp);
+done:
 	return (error);
 #endif
 }

From owner-freebsd-fs@freebsd.org Sat Mar 12 01:58:39 2016
From: bugzilla-noreply@freebsd.org
To: freebsd-fs@FreeBSD.org
Subject: [Bug 204643] [msdosfs] [panic] Crash while accessing files with large, non-english names
Date: Sat, 12 Mar 2016 01:58:39 +0000
X-Bugzilla-Reason: AssignedTo
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 9.3-RELEASE
X-Bugzilla-Severity: Affects Some People
X-Bugzilla-Who: gordon778@mail.ru
X-Bugzilla-Status: New
X-Bugzilla-Priority: ---
X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204643

--- Comment #1 from Alexey ---
This bug is reproducible on FreeBSD 10.2-RELEASE (and others, I guess):

KDB: stack backtrace:
#0 0xffffffff80984e30 at kdb_backtrace+0x60
#1 0xffffffff809489e6 at vpanic+0x126
#2 0xffffffff809488b3 at panic+0x43
#3 0xffffffff80976462 at __stack_chk_fail+0x12
#4 0xffffffff8083c652 at msdosfs_readdir+0x782
#5 0xffffffff80e731c7 at VOP_READDIR_APV+0xa7
#6 0xffffffff809f72bc at kern_getdirentries+0x21c
#7 0xffffffff809f7078 at sys_getdirentries+0x28
#8 0xffffffff80d4b3a7 at amd64_syscall+0x357

uname:
FreeBSD HP635 10.2-RELEASE-p7 FreeBSD 10.2-RELEASE-p7 #0: Mon Nov 2 14:19:39 UTC 2015 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64

dmesg:
Copyright (c) 1992-2015 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
 The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 10.2-RELEASE-p7 #0: Mon Nov 2 14:19:39 UTC 2015
 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
VT: running with driver "vga".
info: [drm] Initialized drm 1.1.0 20060810
CPU: AMD E-300 APU with Radeon(tm) HD Graphics (1297.27-MHz K8-class CPU)
  Origin="AuthenticAMD" Id=0x500f20 Family=0x14 Model=0x2 Stepping=0
  Features=0x178bfbff
  Features2=0x802209
  AMD Features=0x2e500800
  AMD Features2=0x35ff
  SVM: (disabled in BIOS) NP,NRIP,NAsids=8
  TSC: P-state invariant, performance statistics
real memory = 6442450944 (6144 MB)
avail memory = 5782937600 (5515 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
ioapic0: Changing APIC ID to 4
ioapic0 irqs 0-23 on motherboard
random: initialized
module_register_init: MOD_LOAD (vesa, 0xffffffff80db8e60, 0) error 19
kbd1 at kbdmux0
acpi0: on motherboard
acpi0: Power Button (fixed)
cpu0: on acpi0
cpu1: on acpi0
hpet0: iomem 0xfed00000-0xfed003ff irq 0,8 on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 550
Event timer "HPET1" frequency 14318180 Hz quality 450
atrtc0: port 0x70-0x71 on acpi0
Event timer "RTC" frequency 32768 Hz quality 0
attimer0: port 0x40-0x43 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
acpi_ec0: port 0x62,0x66 on acpi0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
vgapci0: port 0x4000-0x40ff mem 0xe0000000-0xefffffff,0xf0400000-0xf043ffff irq 18 at device 1.0 on pci0
acpi_video0: on vgapci0
drmn0: on vgapci0
info: [drm] RADEON_IS_PCIE
info: [drm] initializing kernel modesetting (PALM 0x1002:0x9802 0x103C:0x3577).
info: [drm] register mmio base: 0xF0400000
info: [drm] register mmio size: 262144
info: [drm] radeon_atrm_get_bios: ===> Try ATRM...
info: [drm] radeon_atrm_get_bios: IGP card detected, skipping this method...
info: [drm] radeon_acpi_vfct_bios: ===> Try VFCT...
info: [drm] radeon_acpi_vfct_bios: Get "VFCT" ACPI table
info: [drm] radeon_acpi_vfct_bios: Failed to get "VFCT" table: AE_NOT_FOUND
info: [drm] igp_read_bios_from_vram: ===> Try IGP's VRAM...
info: [drm] igp_read_bios_from_vram: VRAM base address: 0xe0000000
info: [drm] igp_read_bios_from_vram: Map address: 0xfffff800e0000000 (262144 bytes)
info: [drm] igp_read_bios_from_vram: Incorrect BIOS signature: 0x0000
info: [drm] radeon_read_bios: ===> Try PCI Expansion ROM...
info: [drm] radeon_read_bios: Map address: 0xfffff800000c0000 (131072 bytes)
info: [drm] ATOM BIOS: HP
drmn0: info: VRAM: 384M 0x0000000000000000 - 0x0000000017FFFFFF (384M used)
drmn0: info: GTT: 512M 0x0000000018000000 - 0x0000000037FFFFFF
info: [drm] Detected VRAM RAM=384M, BAR=256M
info: [drm] RAM width 32bits DDR
[TTM] Zone kernel: Available graphics memory: 2924656 kiB
[TTM] Zone dma32: Available graphics memory: 2097152 kiB
[TTM] Initializing pool allocator
info: [drm] radeon: 384M of VRAM memory ready
info: [drm] radeon: 512M of GTT memory ready.
info: [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
info: [drm] Driver supports precise vblank timestamp query.
info: [drm] MSI enabled 1 message(s)
drmn0: info: radeon: using MSI.
info: [drm] radeon: irq initialized.
info: [drm] GART: num cpu pages 131072, num gpu pages 131072
info: [drm] Loading PALM Microcode
info: [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
drmn0: info: WB enabled
drmn0: info: fence driver on ring 0 use gpu addr 0x0000000018000c00 and cpu addr 0x0xfffff80002f61c00
drmn0: info: fence driver on ring 3 use gpu addr 0x0000000018000c0c and cpu addr 0x0xfffff80002f61c0c
info: [drm] ring test on 0 succeeded in 1 usecs
info: [drm] ring test on 3 succeeded in 1 usecs
info: [drm] ib test on ring 0 succeeded in 0 usecs
info: [drm] ib test on ring 3 succeeded in 0 usecs
info: [drm] radeon_device_init: Taking over the fictitious range 0xe0000000-0xf0000000
iicbus0: on iicbb0 addr 0xff
iic0: on iicbus0
iicbus1: on iicbb1 addr 0x0
iic1: on iicbus1
iicbus2: on iicbb2 addr 0x0
iic2: on iicbus2
iicbus3: on iicbb3 addr 0x0
iic3: on iicbus3
iicbus4: on iicbb4 addr 0x0
iic4: on iicbus4
iicbus5: on iicbb5 addr 0x0
iic5: on iicbus5
iicbus6: on iicbb6 addr 0x0
iic6: on iicbus6
iicbus7: on iicbb7 addr 0x0
iic7: on iicbus7
info: [drm] Radeon Display Connectors
info: [drm] Connector 0:
info: [drm] LVDS-1
info: [drm] HPD1
info: [drm] DDC: 0x6430 0x6430 0x6434 0x6434 0x6438 0x6438 0x643c 0x643c
info: [drm] Encoders:
info: [drm] LCD1: INTERNAL_UNIPHY
info: [drm] Connector 1:
info: [drm] HDMI-A-1
info: [drm] HPD2
info: [drm] DDC: 0x6440 0x6440 0x6444 0x6444 0x6448 0x6448 0x644c 0x644c
info: [drm] Encoders:
info: [drm] DFP1: INTERNAL_UNIPHY
info: [drm] Connector 2:
info: [drm] VGA-1
info: [drm] DDC: 0x64d8 0x64d8 0x64dc 0x64dc 0x64e0 0x64e0 0x64e4 0x64e4
info: [drm] Encoders:
info: [drm] CRT1: INTERNAL_KLDSCP_DAC1
info: [drm] Internal thermal controller without fan control
info: [drm] radeon: power management initialized
info: [drm] Connector LVDS-1: get mode from tunables:
info: [drm] - kern.vt.fb.modes.LVDS-1
info: [drm] - kern.vt.fb.default_mode
info: [drm] Connector HDMI-A-1: get mode from tunables:
info: [drm] - kern.vt.fb.modes.HDMI-A-1
info: [drm] - kern.vt.fb.default_mode
info: [drm] Connector VGA-1: get mode from tunables:
info: [drm] - kern.vt.fb.modes.VGA-1
info: [drm] - kern.vt.fb.default_mode
info: [drm] fb mappable at 0xE0142000
info: [drm] vram apper at 0xE0000000
info: [drm] size 4325376
info: [drm] fb depth is 24
info: [drm] pitch is 5632
fbd0 on drmn0
VT: Replacing driver "vga" with new "fb".
error: [drm:pid0:radeon_acpi_init] *ERROR* Cannot find a backlight controller
info: [drm] Initialized radeon 2.29.0 20080528 for drmn0 on minor 0
vgapci0: Boot video device
hdac0: mem 0xf0444000-0xf0447fff irq 19 at device 1.1 on pci0
ahci0: port 0x4118-0x411f,0x4124-0x4127,0x4110-0x4117,0x4120-0x4123,0x4100-0x410f mem 0xf044d000-0xf044d3ff irq 19 at device 17.0 on pci0
ahci0: AHCI v1.20 with 2 6Gbps ports, Port Multiplier supported
ahci0: quirks=0x22000
ahcich0: at channel 0 on ahci0
ahcich1: at channel 1 on ahci0
ohci0: mem 0xf044c000-0xf044cfff irq 18 at device 18.0 on pci0
usbus0 on ohci0
ehci0: mem 0xf044b000-0xf044b0ff irq 17 at device 18.2 on pci0
usbus1: EHCI version 1.0
usbus1 on ehci0
hdac1: mem 0xf0440000-0xf0443fff irq 16 at device 20.2 on pci0
isab0: at device 20.3 on pci0
isa0: on isab0
pcib1: at device 20.4 on pci0
pci1: on pcib1
ohci1: mem 0xf044a000-0xf044afff irq 18 at device 20.5 on pci0
usbus2 on ohci1
pcib2: at device 21.0 on pci0
pci2: on pcib2
pci2: at device 0.0 (no driver attached)
pcib3: at device 21.1 on pci0
pci6: on pcib3
re0: port 0x2000-0x20ff mem 0xf0104000-0xf0104fff,0xf0100000-0xf0103fff irq 21 at device 0.0 on pci6
re0: Using 1 MSI-X message
re0: ASPM disabled
re0: Chip rev. 0x40800000
re0: MAC rev. 0x00200000
miibus0: on re0
rlphy0: PHY 1 on miibus0
rlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto, auto-flow
re0: Using defaults for TSO: 65518/35/2048
re0: Ethernet address: ac:16:2d:53:99:72
pcib4: at device 21.3 on pci0
pci7: on pcib4
ath0: mem 0xf0200000-0xf020ffff irq 23 at device 0.0 on pci7
[ath] AR9285E_20 detected; using XE TX gain tables
[ath] AR9285 Main LNA config: LNA1
[ath] AR9285 Alt LNA config: LNA2
[ath] LNA diversity disabled, Diversity disabled
ath0: [HT] enabling HT modes
ath0: [HT] 1 stream STBC receive enabled
ath0: [HT] 1 RX streams; 1 TX streams
ath0: AR9285 mac 192.2 RF5133 phy 14.0
ath0: 2GHz radio: 0x0000; 5GHz radio: 0x00c0
ohci2: mem 0xf0449000-0xf0449fff irq 18 at device 22.0 on pci0
usbus3 on ohci2
ehci1: mem 0xf0448000-0xf04480ff irq 17 at device 22.2 on pci0
usbus4: EHCI version 1.0
usbus4 on ehci1
acpi_wmi0: on acpi0
acpi_hp0: on acpi_wmi0
acpi_hp0: HP event GUID detected, installing event handler
acpi_acad0: on acpi0
acpi_lid0: on acpi0
acpi_button0: on acpi0
acpi_tz0: on acpi0
atkbdc0: port 0x60,0x64 irq 1 on acpi0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model Generic PS/2 mouse, device ID 0
battery0: on acpi0
amdsbwd0: at iomem 0xfec000f0-0xfec000f3,0xfec000f4-0xfec000f7 on isa0
ppc0: cannot reserve I/O port range
hwpstate0: on cpu0
random: unblocking device.
usbus0: 12Mbps Full Speed USB v1.0
fuse-freebsd: version 0.4.4, FUSE ABI 7.8
Timecounters tick every 1.000 msec
ipfw2 (+ipv6) initialized, divert loadable, nat loadable, default to deny, logging disabled
hdacc0: at cad 0 on hdac0
hdaa0: at nid 1 on hdacc0
pcm0: at nid 3 on hdaa0
hdacc1: at cad 0 on hdac1
hdaa1: at nid 1 on hdacc1
pcm1: at nid 20,33 and 24 on hdaa1
pcm2: at nid 18 on hdaa1
ugen0.1: at usbus0
uhub0: on usbus0
usbus1: 480Mbps High Speed USB v2.0
usbus2: 12Mbps Full Speed USB v1.0
usbus3: 12Mbps Full Speed USB v1.0
usbus4: 480Mbps High Speed USB v2.0
ugen4.1: at usbus4
uhub1: on usbus4
ugen3.1: at usbus3
uhub2: on usbus3
ugen2.1: at usbus2
uhub3: on usbus2
ugen1.1: at usbus1
uhub4: on usbus1
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: ATA8-ACS SATA 2.x device
ada0: Serial Number Y1J9C50MT
ada0: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 305245MB (625142448 512 byte sectors: 16H 18S/T 16383C)
ada0: Previously was known as ad4
cd0 at ahcich1 bus 0 scbus1 target 0 lun 0
cd0: Removable CD-ROM SCSI device
cd0: Serial Number 696212041341
cd0: 150.000MB/s transfers (SATA 1.x, UDMA5, ATAPI 12bytes, PIO 8192bytes)
cd0: Attempt to query device size failed: NOT READY, Medium not present - tray closed
SMP: AP CPU #1 Launched!
Timecounter "TSC" frequency 1297265685 Hz quality 800
uhub3: 2 ports with 2 removable, self powered
uhub2: 4 ports with 4 removable, self powered
uhub0: 5 ports with 5 removable, self powered
GEOM_JOURNAL: Journal 4122145597: ada0s1a contains data.
GEOM_JOURNAL: Journal 4122145597: ada0s1a contains journal.
GEOM_JOURNAL: Journal ada0s1a clean.
GEOM_JOURNAL: Journal 3177482727: ada0s1d contains data.
GEOM_JOURNAL: Journal 3177482727: ada0s1d contains journal.
GEOM_JOURNAL: Journal ada0s1d clean.
GEOM_JOURNAL: Journal 1284059668: ada0s1e contains data.
GEOM_JOURNAL: Journal 1284059668: ada0s1e contains journal.
GEOM_JOURNAL: Journal ada0s1e clean.
GEOM_JOURNAL: Journal 3655574912: ada0s1f contains data.
GEOM_JOURNAL: Journal 3655574912: ada0s1f contains journal.
GEOM_JOURNAL: Journal ada0s1f clean.
Trying to mount root from ufs:/dev/ada0s1a.journal [rw,async]...
ugen2.2: at usbus2
uhub1: 4 ports with 4 removable, self powered
uhub4: 5 ports with 5 removable, self powered
ugen4.2: at usbus4
ugen0.2: at usbus0
wlan0: Ethernet address: 9c:b7:0d:f7:0e:2e
ums0: on usbus0
ums0: 3 buttons and [XYZ] coordinates ID=0
re0: link state changed to DOWN
re0: link state changed to UP
pid 1102 (firefox), uid 1001: exited on signal 10 (core dumped)
ugen1.2: at usbus1
umass0: on usbus1
umass0: SCSI over Bulk-Only; quirks = 0x4100
umass0:2:0:-1: Attached to scbus2
da0 at umass-sim0 bus 0 scbus2 target 0 lun 0
da0: Removable Direct Access SPC-2 SCSI device
da0: Serial Number 425839303336594C5433
da0: 40.000MB/s transfers
da0: Attempt to query device size failed: NOT READY, Medium not present
da0: quirks=0x2
panic: stack overflow detected; backtrace may be corrupted
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff80984e30 at kdb_backtrace+0x60
#1 0xffffffff809489e6 at vpanic+0x126
#2 0xffffffff809488b3 at panic+0x43
#3 0xffffffff80976462 at __stack_chk_fail+0x12
#4 0xffffffff8083c652 at msdosfs_readdir+0x782
#5 0xffffffff80e731c7 at VOP_READDIR_APV+0xa7
#6 0xffffffff809f72bc at kern_getdirentries+0x21c
#7 0xffffffff809f7078 at sys_getdirentries+0x28
#8 0xffffffff80d4b3a7 at amd64_syscall+0x357

-- 
You are receiving this mail because:
You are the assignee for the bug.

From owner-freebsd-fs@freebsd.org Sat Mar 12 02:29:47 2016
From: bugzilla-noreply@freebsd.org
To: freebsd-fs@FreeBSD.org
Subject: [Bug 204643] [msdosfs] [panic] Crash while accessing files with large, non-english names
Date: Sat, 12 Mar 2016 02:29:47 +0000
X-Bugzilla-Reason: AssignedTo
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 9.3-RELEASE
X-Bugzilla-Severity: Affects Some People
X-Bugzilla-Who: kp@freebsd.org
X-Bugzilla-Status: New
X-Bugzilla-Priority: ---
X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org
X-Bugzilla-Changed-Fields: cc
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204643

Kristof Provost changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kp@freebsd.org

--- Comment #2 from Kristof Provost ---
The cause is fairly obvious.
In msdosfs_readdir() we use dos2unixfn() to translate the file name. The
translation can increase the length of the filename; presumably this happened
with the non-English name in this case.
The output is stored in a struct dirent on the stack, whose d_name field has a
maximum length of 255 bytes. dos2unixfn() has no length limit, so it can
overflow d_name.

This triggers the stack corruption protection, which is fortunate: otherwise
this might be an exploitable bug.

Fixing it is a little annoying, because limiting the output length could
conceivably lead to two directory names being translated into the same string.

From owner-freebsd-fs@freebsd.org Sat Mar 12 03:44:16 2016
Date: Sat, 12 Mar 2016 05:44:10 +0200
From: Konstantin Belousov
To: Chris Torek
Cc: freebsd-fs@freebsd.org
Subject: Re: quotactl bug: vfs_busy never unbusy-es
Message-ID: <20160312034410.GF1741@kib.kiev.ua>
References: <20160310162916.GB1741@kib.kiev.ua> <201603112315.u2BNFsc0059323@elf.torek.net>
In-Reply-To: <201603112315.u2BNFsc0059323@elf.torek.net>

On Fri, Mar 11, 2016 at 03:15:54PM -0800, Chris Torek wrote:
> >Lets fix nullfs too.
>
> And unionfs? :-)
>
> >The uncomfortable issue with the interface is that
> >it is not always possible to busy mount point after unbusy.
>
> Yes.
>
> >I propose to assume that for ENOENT error the busy ref is lost
> >(this is the only possible error from vfs_busy()).
>
> This seems pretty scary. Also:
>
> >I also verified that UFS quota code
> >does not return this error for !quotaon case.
>
> OK, but it certainly can for the quotaon case (though we're
> protected by the fact that quotaon already had to do the unbusy
> so it doesn't matter now). Anyway, we are getting kind of
> complicated here.
>
> I think we can fix this with a fairly tiny KPI change. Note that
> the code currently only works for UFS. Suppose we move the vfs_busy
> to happen later (with potential failure, i.e., quotactl() call may
> now return ENOENT because it can't busy the file system because
> it's being unmounted -- this seems pretty harmless as it only
> occurs in a race between quotactl() and unmount, and the caller
> could have lost that race anyway).
>
> I'm not 100% sure this is correct, but it seems a bit less
> scary to me :-) But note that it is COMPLETELY untested (not
> even compiled), I'm just seeing if you think this is a reasonable
> approach. Basically, we're just moving the vfs_busy/unbusy into
> UFS itself, and permitting it to fail at that point (for all
> ops, not just Q_QUOTAON).

Yes, if it works out to pass only an mnt_ref-referenced mount point down
to the VFS_QUOTACTL method, I am fine with it. Also, I do not see why
it would not work, in the sense that the failure modes due to a parallel
unmount are exactly the same for the current code and for what you propose.

Please finish this.

>
> (Later we can refactor all this to be usable from, e.g., tmpfs.)
>
> Chris
>
> diff --git a/sys/kern/vfs_syscalls.c b/sys/kern/vfs_syscalls.c
> index 11813fc..e1ab670 100644
> --- a/sys/kern/vfs_syscalls.c
> +++ b/sys/kern/vfs_syscalls.c
> @@ -173,6 +173,11 @@ sys_quotactl(td, uap)
>  	AUDIT_ARG_UID(uap->uid);
>  	if (!prison_allow(td->td_ucred, PR_ALLOW_QUOTAS))
>  		return (EPERM);
> +	/*
> +	 * Reference the mount point so that it will remain in core
> +	 * across the VFS_QUOTACTL call. We used to vfs_busy() it
> +	 * here, but now we leave that to the underlying file system.
> +	 */
>  	NDINIT(&nd, LOOKUP, FOLLOW | LOCKLEAF | AUDITVNODE1, UIO_USERSPACE,
>  	    uap->path, td);
>  	if ((error = namei(&nd)) != 0)
> @@ -181,25 +186,8 @@ sys_quotactl(td, uap)
>  	mp = nd.ni_vp->v_mount;
>  	vfs_ref(mp);
>  	vput(nd.ni_vp);
> -	error = vfs_busy(mp, 0);
> -	vfs_rel(mp);
> -	if (error != 0)
> -		return (error);
>  	error = VFS_QUOTACTL(mp, uap->cmd, uap->uid, uap->arg);
> -
> -	/*
> -	 * Since quota on operation typically needs to open quota
> -	 * file, the Q_QUOTAON handler needs to unbusy the mount point
> -	 * before calling into namei. Otherwise, unmount might be
> -	 * started between two vfs_busy() invocations (first is our,
> -	 * second is from mount point cross-walk code in lookup()),
> -	 * causing deadlock.
> -	 *
> -	 * Require that Q_QUOTAON handles the vfs_busy() reference on
> -	 * its own, always returning with ubusied mount point.
> -	 */
> -	if ((uap->cmd >> SUBCMDSHIFT) != Q_QUOTAON)
> -		vfs_unbusy(mp);
> +	vfs_rel(mp);
>  	return (error);
>  }
>
> diff --git a/sys/ufs/ufs/ufs_vfsops.c b/sys/ufs/ufs/ufs_vfsops.c
> index 5bb73ea..fc5f5bb 100644
> --- a/sys/ufs/ufs/ufs_vfsops.c
> +++ b/sys/ufs/ufs/ufs_vfsops.c
> @@ -92,9 +92,6 @@ ufs_quotactl(mp, cmds, id, arg)
>  	void *arg;
>  {
>  #ifndef QUOTA
> -	if ((cmds >> SUBCMDSHIFT) == Q_QUOTAON)
> -		vfs_unbusy(mp);
> -
>  	return (EOPNOTSUPP);
>  #else
>  	struct thread *td;
> @@ -115,21 +112,24 @@ ufs_quotactl(mp, cmds, id, arg)
>  			break;
>
>  		default:
> -			if (cmd == Q_QUOTAON)
> -				vfs_unbusy(mp);
>  			return (EINVAL);
>  		}
>  	}
>  	if ((u_int)type >= MAXQUOTAS) {
> -		if (cmd == Q_QUOTAON)
> -			vfs_unbusy(mp);
>  		return (EINVAL);
>  	}
>
> +	/*
> +	 * Make sure we're not unmounting.
> +	 */
> +	error = vfs_busy(mp);
> +	if (error)
> +		return (error);
> +
>  	switch (cmd) {
>  	case Q_QUOTAON:
>  		error = quotaon(td, mp, type, arg);
> -		break;
> +		goto done;	/* quotaon does vfs_unbusy itself */
>
>  	case Q_QUOTAOFF:
>  		error = quotaoff(td, mp, type);
> @@ -171,6 +171,8 @@ ufs_quotactl(mp, cmds, id, arg)
>  		error = EINVAL;
>  		break;
>  	}
> +	vfs_unbusy(mp);
> +done:
>  	return (error);
>  #endif
>  }

From owner-freebsd-fs@freebsd.org Sat Mar 12 05:06:10 2016
Subject: Re: quotactl bug: vfs_busy never unbusy-es
From: Jordan Hubbard
In-Reply-To: <20160312034410.GF1741@kib.kiev.ua>
Date: Fri, 11 Mar 2016 21:06:03 -0800
Cc: Chris Torek, freebsd-fs@freebsd.org
Message-Id: <5E893C41-BEF1-4A9B-8E32-6D8E12356252@ixsystems.com>
References: <20160310162916.GB1741@kib.kiev.ua> <201603112315.u2BNFsc0059323@elf.torek.net> <20160312034410.GF1741@kib.kiev.ua>
To: Konstantin Belousov

> On Mar 11, 2016, at 7:44 PM, Konstantin Belousov wrote:
>
> Yes, if it work out to pass only mnt_ref-referenced mount point down
> to the VFS_QUOTACTL method, I am fine with it. Also, I do not see why
> would it not work, in the sense that the failure modes due to parallel
> unmount are exactly the same for current code and what you propose.
>
> Please finish this.

He will. :-)

iXsystems is happy to sponsor this work as it directly benefits us as
well, and I’ve been very happy to see you engaged with the review
process. With any luck, Chris will be able to finish this and then
unwind his stack back to the point where he can finish wiring up NFS
quotas to ZFS, which was the original feature request that started us
down this path.
- Jordan

From owner-freebsd-fs@freebsd.org Sat Mar 12 18:27:05 2016
To: freebsd-fs@freebsd.org
From: Martin von Gagern
Subject: State of ZFS xattr support in FreeBSD
Message-ID: <56E45F75.2000905@gmx.net>
Date: Sat, 12 Mar 2016 19:27:01 +0100

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi!

I'm trying to work out whether or not, or rather to what extent, xattrs
are supported in FreeBSD using ZFS. I've read some conflicting information.

1. zfs get xattr lists it as on (default) for /, /usr and /var, but as
   off (temporary) for all other datasets, including children of those
   mentioned above.

2. Running zfs set xattr=on zroot/usr/home I get the message “property
   'xattr' not supported on FreeBSD: permission denied.”

3. This agrees with the zfs man page: “The xattr property is currently
   not supported on FreeBSD.”

4. setextattr, getextattr and lsextattr seem to work well enough.

5. I also managed to save and restore a device file node using “rsync
   --fake-super”, and could see its data using lsextattr and getextattr.

6. Wikipedia has some discussion in the xattr talk page. Apparently
   there once was a claim that ZFS supports xattr since FreeBSD 8, but
   that was removed later on, with reference to the manpage (see 3.).

Currently I get the impression that extended attributes on zfs work in
practice, but that the xattr property which would control their use does
not work as it would in other zfs distributions. But I'd like to hear
that confirmed (or corrected) before I trust large amounts of backup
data to an rsync --fake-super running on such a machine. I'd rather not
lose all my metadata due to known xattr problems.

If it matters, this is a very fresh FreeBSD 10.2 install I just set up,
with ZFS set up by the installer.

I posted this question in other places before:
http://unix.stackexchange.com/q/266913/20807 (currently with a bounty)
https://forums.freebsd.org/threads/55418/
Feel free to cross-post your answers to these.
If you don't, I'll probably post a summary myself.

Thank you very much,
 Martin von Gagern
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iEYEARECAAYFAlbkX24ACgkQRhp6o4m9dFtzQACaAkCbCR2Yi1PyicZ61PLM4Ad+
2pUAn0P8YzEQ3rOMfdD8MVnLP1PQ2CLu
=Bazu
-----END PGP SIGNATURE-----
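
For anyone who wants to repeat the check in points 4 and 5 of the message above without the command-line tools, here is a minimal sketch, not a definitive test. It assumes FreeBSD's extattr(2) calls extattr_set_file() and extattr_get_file() in the user namespace; the helper name xattr_roundtrip and the 256-byte buffer are hypothetical choices made only for illustration. It writes one attribute, reads it back, and compares the bytes. On systems other than FreeBSD it compiles to a stub that always fails with ENOTSUP.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#ifdef __FreeBSD__
#include <sys/extattr.h>
#endif

/*
 * Hypothetical helper: set a user-namespace extended attribute on `path`,
 * read it back, and compare. Returns 0 if the value round-trips unchanged,
 * -1 otherwise (with errno set).
 */
int
xattr_roundtrip(const char *path, const char *name, const char *value)
{
#ifdef __FreeBSD__
	char buf[256];			/* arbitrary illustrative limit */
	size_t len = strlen(value);
	ssize_t n;

	if (len > sizeof(buf)) {
		errno = E2BIG;
		return (-1);
	}
	/* Store the attribute in the user namespace. */
	if (extattr_set_file(path, EXTATTR_NAMESPACE_USER, name,
	    value, len) < 0)
		return (-1);
	/* Read it back into a bounded buffer. */
	n = extattr_get_file(path, EXTATTR_NAMESPACE_USER, name,
	    buf, sizeof(buf));
	if (n < 0)
		return (-1);
	/* The stored bytes must match what was written. */
	if ((size_t)n != len || memcmp(buf, value, len) != 0) {
		errno = EIO;
		return (-1);
	}
	return (0);
#else
	/* extattr(2) is FreeBSD-specific; report "not supported" elsewhere. */
	(void)path;
	(void)name;
	(void)value;
	errno = ENOTSUP;
	return (-1);
#endif
}
```

Calling xattr_roundtrip() on a scratch file inside the dataset under test, with any attribute name and a short value, and getting 0 back shows only that one small attribute survives a write and a read. It says nothing about how the xattr dataset property behaves, which is the open question in the message.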