From owner-freebsd-fs@FreeBSD.ORG Sun Dec 7 01:29:18 2014
In-Reply-To: <54825E70.20900@sorbs.net>
References: <54825E70.20900@sorbs.net>
Date: Sat, 6 Dec 2014 18:22:32 -0700
Subject: Re: ZFS weird issue...
From: Will Andrews
To: Michelle Sullivan
Cc: "freebsd-fs@freebsd.org"

On Fri, Dec 5, 2014 at 6:40 PM, Michelle Sullivan wrote:
> Days later new drive to replace the dead drive arrived and was
> inserted. System refused to re-add as there was data in the cache, so
> rebooted and cleared the cache (as per many on web faq's) Reconfigured
> it to match the others. Can't do a zpool replace mfid8 because that's
> already in the pool... (was mfid9) can't use mfid15 because zpool
> reports it's not part of the config... can't use the uniq-id it received
> (can't find vdev) ... HELP!! :)
[...]
> root@colossus:~ # zpool status -v
[...]
> pool: sorbs
> state: DEGRADED
> status: One or more devices could not be opened. Sufficient replicas
> exist for
> the pool to continue functioning in a degraded state.
> action: Attach the missing device and online it using 'zpool online'.
> see: http://illumos.org/msg/ZFS-8000-2Q
> scan: scrub in progress since Fri Dec 5 17:11:29 2014
> 2.51T scanned out of 29.9T at 89.4M/s, 89h7m to go
> 0 repaired, 8.40% done
> config:
>
> NAME              STATE     READ WRITE CKSUM
> sorbs             DEGRADED     0     0     0
>   raidz2-0        DEGRADED     0     0     0
>     mfid0         ONLINE       0     0     0
>     mfid1         ONLINE       0     0     0
>     mfid2         ONLINE       0     0     0
>     mfid3         ONLINE       0     0     0
>     mfid4         ONLINE       0     0     0
>     mfid5         ONLINE       0     0     0
>     mfid6         ONLINE       0     0     0
>     mfid7         ONLINE       0     0     0
>     spare-8       DEGRADED     0     0     0
>       1702922605  UNAVAIL      0     0     0  was /dev/mfid8
>       mfid14      ONLINE       0     0     0
>     mfid8         ONLINE       0     0     0
>     mfid9         ONLINE       0     0     0
>     mfid10        ONLINE       0     0     0
>     mfid11        ONLINE       0     0     0
>     mfid12        ONLINE       0     0     0
>     mfid13        ONLINE       0     0     0
> spares
>   933862663       INUSE     was /dev/mfid14
>
> errors: No known data errors
> root@colossus:~ # uname -a
> FreeBSD colossus.sorbs.net 9.2-RELEASE FreeBSD 9.2-RELEASE #0 r255898:
> Thu Sep 26 22:50:31 UTC 2013
> root@bake.isc.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
[...]
> root@colossus:~ # ls -l /dev/mfi*
> crw-r----- 1 root operator 0x22 Dec 5 17:18 /dev/mfi0
> crw-r----- 1 root operator 0x68 Dec 5 17:18 /dev/mfid0
> crw-r----- 1 root operator 0x69 Dec 5 17:18 /dev/mfid1
> crw-r----- 1 root operator 0x78 Dec 5 17:18 /dev/mfid10
> crw-r----- 1 root operator 0x79 Dec 5 17:18 /dev/mfid11
> crw-r----- 1 root operator 0x7a Dec 5 17:18 /dev/mfid12
> crw-r----- 1 root operator 0x82 Dec 5 17:18 /dev/mfid13
> crw-r----- 1 root operator 0x83 Dec 5 17:18 /dev/mfid14
> crw-r----- 1 root operator 0x84 Dec 5 17:18 /dev/mfid15
> crw-r----- 1 root operator 0x6a Dec 5 17:18 /dev/mfid2
> crw-r----- 1 root operator 0x6b Dec 5 17:18 /dev/mfid3
> crw-r----- 1 root operator 0x6c Dec 5 17:18 /dev/mfid4
> crw-r----- 1 root operator 0x6d Dec 5 17:18 /dev/mfid5
> crw-r----- 1 root operator 0x6e Dec 5 17:18 /dev/mfid6
> crw-r----- 1 root operator 0x75 Dec 5 17:18 /dev/mfid7
> crw-r----- 1 root operator 0x76 Dec 5 17:18 /dev/mfid8
> crw-r----- 1 root operator 0x77 Dec 5 17:18 /dev/mfid9
> root@colossus:~ #

Hi,

From the above it appears your
replacement drive's current name is mfid15, and the spare is now mfid14.

What commands did you run that failed? Can you provide a copy of the
first label from 'zdb -l /dev/mfid0'?

The label will provide you with the full vdev guid that you need to
replace the original drive with a new one.

Another thing you could do is wait for the spare to finish
resilvering, then promote it to replace the original drive, and make
your new one a spare. Considering the time required to resilver this
pool configuration, that may be preferable for you.

--Will.

From owner-freebsd-fs@FreeBSD.ORG Sun Dec 7 10:17:18 2014
Message-ID: <54842919.3030204@wp.pl>
Date: Sun, 07 Dec 2014 11:16:57 +0100
From: Ireneusz Pluta
To: freebsd-fs@freebsd.org
Subject: A way to quick fix of "leaking lots of unreferenced inodes"

Hello,

I need to fix a server running:

9.1-RELEASE FreeBSD 9.1-RELEASE #0 r243825: Tue Dec 4 09:23:10 UTC 2012

which suffers a lot from the bug fixed with this commit:
http://lists.freebsd.org/pipermail/svn-src-releng/2013-September/000114.html

The main and practically only purpose of this machine is running a
PostgreSQL server with several multi-hundred GB databases. And yes, I
currently use the "procedure" of restarting postgres, with
umount -f /mountpoint/of/pgsql/data, every few weeks, so that it does not
accumulate too much leak over longer uptimes.

To avoid the risk of possible trouble with freebsd-update, and thus
having the machine down for maybe more than the expected half to one hour,
I am thinking of making the fix quicker just by applying the

- VOP_UNLOCK(nvp, 0);
+ vput(nvp);

patch to /usr/src/sys/ufs/ufs/ufs_vnops.c and rebuilding the kernel.

But I want to make sure that applying only this patch to 9.1-RELEASE,
and ignoring all other changes within src/ufs/ufs made before this commit,
is enough and safe for the filesystem.
As I browse commit log of my local git clone of freebsd repo between release/9.1.0 and this fix, I can see there was one more earlier change to the ufs_vnops.c itself: $ git log -U0 --oneline release/9.1.0..releng/9.2 sys/ufs/ufs/ufs_vnops.c 0ec41e1 MFS of 255104: MFC of 253998: diff --git a/sys/ufs/ufs/ufs_vnops.c b/sys/ufs/ufs/ufs_vnops.c index 487477c..b70166d 100644 --- a/sys/ufs/ufs/ufs_vnops.c +++ b/sys/ufs/ufs/ufs_vnops.c @@ -1274 +1274 @@ relock: - VOP_UNLOCK(nvp, 0); + vput(nvp); 3d32639 MFC r248422: Remove negative name cache entry pointing to the target name, which could be instantiated while tdvp was unlocked. diff --git a/sys/ufs/ufs/ufs_vnops.c b/sys/ufs/ufs/ufs_vnops.c index 03c8bb0..487477c 100644 --- a/sys/ufs/ufs/ufs_vnops.c +++ b/sys/ufs/ufs/ufs_vnops.c @@ -1564,0 +1565 @@ relock: + cache_purge_negative(tdvp); and even some more to the sys/ufs/ufs path, as listed at the end. So again this is my question: can I just put to my /usr/src/sys/ufs/ufs/ufs_vnops.c what has changed in http://lists.freebsd.org/pipermail/svn-src-releng/2013-September/000114.html, recompile, and not worry about all the other stuff? Thanks Irek. $ git log -U0 --oneline release/9.1.0..releng/9.2 sys/ufs/ufs | tee 0ec41e1 MFS of 255104: MFC of 253998: diff --git a/sys/ufs/ufs/ufs_vnops.c b/sys/ufs/ufs/ufs_vnops.c index 487477c..b70166d 100644 --- a/sys/ufs/ufs/ufs_vnops.c +++ b/sys/ufs/ufs/ufs_vnops.c @@ -1274 +1274 @@ relock: - VOP_UNLOCK(nvp, 0); + vput(nvp); a89175a Merge the second part of the unmapped I/O changes. This enables the infrastructure in the block layer and UFS filesystem as well as a few drivers. The list of MFC revisions is long, so I won't quote changelogs. diff --git a/sys/ufs/ufs/ufs_extern.h b/sys/ufs/ufs/ufs_extern.h index c590748..31a2ba8 100644 --- a/sys/ufs/ufs/ufs_extern.h +++ b/sys/ufs/ufs/ufs_extern.h @@ -123,0 +124 @@ void softdep_revert_rmdir(struct inode *, struct inode *); +#define BA_UNMAPPED 0x00040000 /* Do not mmap resulted buffer. 
*/ 63c193a MFC of 248561: diff --git a/sys/ufs/ufs/ufs_lookup.c b/sys/ufs/ufs/ufs_lookup.c index 35fe8fd..8d11e24 100644 --- a/sys/ufs/ufs/ufs_lookup.c +++ b/sys/ufs/ufs/ufs_lookup.c @@ -1388 +1388,2 @@ static int -ufs_dir_dd_ino(struct vnode *vp, struct ucred *cred, ino_t *dd_ino) +ufs_dir_dd_ino(struct vnode *vp, struct ucred *cred, ino_t *dd_ino, + struct vnode **dd_vp) @@ -1390,0 +1392 @@ ufs_dir_dd_ino(struct vnode *vp, struct ucred *cred, ino_t *dd_ino) + struct vnode *ddvp; @@ -1392,0 +1395 @@ ufs_dir_dd_ino(struct vnode *vp, struct ucred *cred, ino_t *dd_ino) + ASSERT_VOP_LOCKED(vp, "ufs_dir_dd_ino"); @@ -1394,0 +1398,13 @@ ufs_dir_dd_ino(struct vnode *vp, struct ucred *cred, ino_t *dd_ino) + /* + * First check to see if we have it in the name cache. + */ + if ((ddvp = vn_dir_dd_ino(vp)) != NULL) { + KASSERT(ddvp->v_mount == vp->v_mount, + ("ufs_dir_dd_ino: Unexpected mount point crossing")); + *dd_ino = VTOI(ddvp)->i_number; + *dd_vp = ddvp; + return (0); + } + /* + * Have to read the directory. + */ @@ -1411,0 +1428 @@ ufs_dir_dd_ino(struct vnode *vp, struct ucred *cred, ino_t *dd_ino) + *dd_vp = NULL; @@ -1436 +1453 @@ ufs_checkpath(ino_t source_ino, ino_t parent_ino, struct inode *target, struct u - error = ufs_dir_dd_ino(vp, cred, &dd_ino); + error = ufs_dir_dd_ino(vp, cred, &dd_ino, &vp1); @@ -1447,15 +1464,7 @@ ufs_checkpath(ino_t source_ino, ino_t parent_ino, struct inode *target, struct u - error = VFS_VGET(mp, dd_ino, LK_SHARED | LK_NOWAIT, &vp1); - if (error != 0) { - *wait_ino = dd_ino; - break; - } - /* Recheck that ".." still points to vp1 after relock of vp */ - error = ufs_dir_dd_ino(vp, cred, &dd_ino); - if (error != 0) { - vput(vp1); - break; - } - /* Redo the check of ".." 
if directory was reparented */ - if (dd_ino != VTOI(vp1)->i_number) { - vput(vp1); - continue; + if (vp1 == NULL) { + error = VFS_VGET(mp, dd_ino, LK_SHARED | LK_NOWAIT, + &vp1); + if (error != 0) { + *wait_ino = dd_ino; + break; + } @@ -1462,0 +1472,2 @@ ufs_checkpath(ino_t source_ino, ino_t parent_ino, struct inode *target, struct u + KASSERT(dd_ino == VTOI(vp1)->i_number, + ("directory %d reparented\n", VTOI(vp1)->i_number)); @@ -1469,0 +1481,2 @@ ufs_checkpath(ino_t source_ino, ino_t parent_ino, struct inode *target, struct u + if (vp1 != NULL) + vput(vp1); 3d32639 MFC r248422: Remove negative name cache entry pointing to the target name, which could be instantiated while tdvp was unlocked. diff --git a/sys/ufs/ufs/ufs_vnops.c b/sys/ufs/ufs/ufs_vnops.c index 03c8bb0..487477c 100644 --- a/sys/ufs/ufs/ufs_vnops.c +++ b/sys/ufs/ufs/ufs_vnops.c @@ -1564,0 +1565 @@ relock: + cache_purge_negative(tdvp); b89ace2 MFC r247388: Work around the hold of references to the struct dquot by the freeblk workitems for some time at unmount. 
diff --git a/sys/ufs/ufs/ufs_quota.c b/sys/ufs/ufs/ufs_quota.c index c3789c3..88437c9 100644 --- a/sys/ufs/ufs/ufs_quota.c +++ b/sys/ufs/ufs/ufs_quota.c @@ -83 +83 @@ static int dqsync(struct vnode *, struct dquot *); -static void dqflush(struct vnode *); +static int dqflush(struct vnode *); @@ -683,2 +683,6 @@ again: - dqflush(qvp); - /* Clear um_quotas before closing the quota vnode to prevent + error = dqflush(qvp); + if (error != 0) + return (error); + + /* + * Clear um_quotas before closing the quota vnode to prevent @@ -1618 +1622 @@ out: -static void +static int @@ -1622,0 +1627 @@ dqflush(struct vnode *vp) + int error; @@ -1628,0 +1634 @@ dqflush(struct vnode *vp) + error = 0; @@ -1636,3 +1642,5 @@ dqflush(struct vnode *vp) - panic("dqflush: stray dquot"); - LIST_REMOVE(dq, dq_hash); - dq->dq_ump = (struct ufsmount *)0; + error = EBUSY; + else { + LIST_REMOVE(dq, dq_hash); + dq->dq_ump = NULL; + } @@ -1641,0 +1650 @@ dqflush(struct vnode *vp) + return (error); 3436e90 MFC r246562: diff --git a/sys/ufs/ufs/inode.h b/sys/ufs/ufs/inode.h index 51f0197..25142dd 100644 --- a/sys/ufs/ufs/inode.h +++ b/sys/ufs/ufs/inode.h @@ -154,4 +153,0 @@ struct inode { -#define MAXSYMLINKLEN(ip) \ - ((ip)->i_ump->um_fstype == UFS1) ? \ - ((NDADDR + NIADDR) * sizeof(ufs1_daddr_t)) : \ - ((NDADDR + NIADDR) * sizeof(ufs2_daddr_t)) 1572df8 MFC r239359: diff --git a/sys/ufs/ufs/inode.h b/sys/ufs/ufs/inode.h index 2b02000..51f0197 100644 --- a/sys/ufs/ufs/inode.h +++ b/sys/ufs/ufs/inode.h @@ -170 +169,0 @@ struct indir { - int in_exists; /* Flag if the block exists. 
*/ diff --git a/sys/ufs/ufs/ufs_bmap.c b/sys/ufs/ufs/ufs_bmap.c index e0fb307..22887c8 100644 --- a/sys/ufs/ufs/ufs_bmap.c +++ b/sys/ufs/ufs/ufs_bmap.c @@ -215 +214,0 @@ ufs_bmaparray(vp, bn, bnp, nbp, runp, runb) - ap->in_exists = 1; @@ -360 +358,0 @@ ufs_getlbns(vp, bn, ap, nump) - ap->in_exists = 0; @@ -373 +370,0 @@ ufs_getlbns(vp, bn, ap, nump) - ap->in_exists = 0; a53e5a7 MFC r246299; diff --git a/sys/ufs/ufs/ufs_lookup.c b/sys/ufs/ufs/ufs_lookup.c index 56ca058..35fe8fd 100644 --- a/sys/ufs/ufs/ufs_lookup.c +++ b/sys/ufs/ufs/ufs_lookup.c @@ -1435 +1434,0 @@ ufs_checkpath(ino_t source_ino, ino_t parent_ino, struct inode *target, struct u - error = 0; 75f830b MFC r243245: diff --git a/sys/ufs/ufs/ufsmount.h b/sys/ufs/ufs/ufsmount.h index 6447dce..b55d958 100644 --- a/sys/ufs/ufs/ufsmount.h +++ b/sys/ufs/ufs/ufsmount.h @@ -100,0 +101 @@ struct ufsmount { + int um_writesuspended; /* suspension in progress */ bb61831 MFC r242476: The r241025 fixed the case when a binary, executed from nullfs mount, was still possible to open for write from the lower filesystem. There is a symmetric situation where the binary could already has file descriptors opened for write, but it can be executed from the nullfs overlay. diff --git a/sys/ufs/ufs/ufs_extattr.c b/sys/ufs/ufs/ufs_extattr.c index 777f385..51bef86 100644 --- a/sys/ufs/ufs/ufs_extattr.c +++ b/sys/ufs/ufs/ufs_extattr.c @@ -337 +337 @@ ufs_extattr_enable_with_open(struct ufsmount *ump, struct vnode *vp, - vp->v_writecount++; + VOP_ADD_WRITECOUNT(vp, 1); 521315f MFC r244239: Fix a typo, resulting in the NULL pointer dereference. 
diff --git a/sys/ufs/ufs/ufs_quota.c b/sys/ufs/ufs/ufs_quota.c
index d353167..c3789c3 100644
--- a/sys/ufs/ufs/ufs_quota.c
+++ b/sys/ufs/ufs/ufs_quota.c
@@ -1055 +1055 @@ again:
- MNT_VNODE_FOREACH_ALL_ABORT(mp, mvp);
+ MNT_VNODE_FOREACH_ACTIVE_ABORT(mp, mvp);

From owner-freebsd-fs@FreeBSD.ORG Sun Dec 7 10:32:46 2014
Message-id: <54842CC5.2020604@sorbs.net>
Date: Sun, 07 Dec 2014 11:32:37 +0100
From: Michelle Sullivan
To: Will Andrews
Cc: "freebsd-fs@freebsd.org"
Subject: Re: ZFS weird issue...
References: <54825E70.20900@sorbs.net>

Will Andrews wrote:
> On Fri, Dec 5, 2014 at 6:40 PM, Michelle Sullivan wrote:
>
>> Days later new drive to replace the dead drive arrived and was
>> inserted.
System refused to re-add as there was data in the cache, so >> rebooted and cleared the cache (as per many on web faq's) Reconfigured >> it to match the others. Can't do a zpool replace mfid8 because that's >> already in the pool... (was mfid9) can't use mfid15 because zpool >> reports it's not part of the config... can't use the uniq-id it received >> (can't find vdev) ... HELP!! :) >> > [...] > >> root@colossus:~ # zpool status -v >> > [...] > >> pool: sorbs >> state: DEGRADED >> status: One or more devices could not be opened. Sufficient replicas >> exist for >> the pool to continue functioning in a degraded state. >> action: Attach the missing device and online it using 'zpool online'. >> see: http://illumos.org/msg/ZFS-8000-2Q >> scan: scrub in progress since Fri Dec 5 17:11:29 2014 >> 2.51T scanned out of 29.9T at 89.4M/s, 89h7m to go >> 0 repaired, 8.40% done >> config: >> >> NAME STATE READ WRITE CKSUM >> sorbs DEGRADED 0 0 0 >> raidz2-0 DEGRADED 0 0 0 >> mfid0 ONLINE 0 0 0 >> mfid1 ONLINE 0 0 0 >> mfid2 ONLINE 0 0 0 >> mfid3 ONLINE 0 0 0 >> mfid4 ONLINE 0 0 0 >> mfid5 ONLINE 0 0 0 >> mfid6 ONLINE 0 0 0 >> mfid7 ONLINE 0 0 0 >> spare-8 DEGRADED 0 0 0 >> 1702922605 UNAVAIL 0 0 0 was /dev/mfid8 >> mfid14 ONLINE 0 0 0 >> mfid8 ONLINE 0 0 0 >> mfid9 ONLINE 0 0 0 >> mfid10 ONLINE 0 0 0 >> mfid11 ONLINE 0 0 0 >> mfid12 ONLINE 0 0 0 >> mfid13 ONLINE 0 0 0 >> spares >> 933862663 INUSE was /dev/mfid14 >> >> errors: No known data errors >> root@colossus:~ # uname -a >> FreeBSD colossus.sorbs.net 9.2-RELEASE FreeBSD 9.2-RELEASE #0 r255898: >> Thu Sep 26 22:50:31 UTC 2013 >> root@bake.isc.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64 >> > [...] 
> >> root@colossus:~ # ls -l /dev/mfi* >> crw-r----- 1 root operator 0x22 Dec 5 17:18 /dev/mfi0 >> crw-r----- 1 root operator 0x68 Dec 5 17:18 /dev/mfid0 >> crw-r----- 1 root operator 0x69 Dec 5 17:18 /dev/mfid1 >> crw-r----- 1 root operator 0x78 Dec 5 17:18 /dev/mfid10 >> crw-r----- 1 root operator 0x79 Dec 5 17:18 /dev/mfid11 >> crw-r----- 1 root operator 0x7a Dec 5 17:18 /dev/mfid12 >> crw-r----- 1 root operator 0x82 Dec 5 17:18 /dev/mfid13 >> crw-r----- 1 root operator 0x83 Dec 5 17:18 /dev/mfid14 >> crw-r----- 1 root operator 0x84 Dec 5 17:18 /dev/mfid15 >> crw-r----- 1 root operator 0x6a Dec 5 17:18 /dev/mfid2 >> crw-r----- 1 root operator 0x6b Dec 5 17:18 /dev/mfid3 >> crw-r----- 1 root operator 0x6c Dec 5 17:18 /dev/mfid4 >> crw-r----- 1 root operator 0x6d Dec 5 17:18 /dev/mfid5 >> crw-r----- 1 root operator 0x6e Dec 5 17:18 /dev/mfid6 >> crw-r----- 1 root operator 0x75 Dec 5 17:18 /dev/mfid7 >> crw-r----- 1 root operator 0x76 Dec 5 17:18 /dev/mfid8 >> crw-r----- 1 root operator 0x77 Dec 5 17:18 /dev/mfid9 >> root@colossus:~ # >> > > Hi, > > From the above it appears your replacement drive's current name is > mfid15, and the spare is now mfid14. > No, I think LD8 was re-created but nothing was re-numbered... the following seems to confirm that (if I'm reading it right.) > What commands did you run that failed? Can you provide a copy of the > first label from 'zdb -l /dev/mfid0'? 
> root@colossus:~ # zdb -l /dev/mfid0 -------------------------------------------- LABEL 0 -------------------------------------------- version: 5000 name: 'sorbs' state: 0 txg: 979499 pool_guid: 1038563320 hostid: 339509314 hostname: 'colossus.sorbs.net' top_guid: 386636424 guid: 2060345993 vdev_children: 1 vdev_tree: type: 'raidz' id: 0 guid: 386636424 nparity: 2 metaslab_array: 33 metaslab_shift: 38 ashift: 9 asize: 45000449064960 is_log: 0 create_txg: 4 children[0]: type: 'disk' id: 0 guid: 2060345993 path: '/dev/mfid0' phys_path: '/dev/mfid0' whole_disk: 1 DTL: 154 create_txg: 4 children[1]: type: 'disk' id: 1 guid: 61296476 path: '/dev/mfid1' phys_path: '/dev/mfid1' whole_disk: 1 DTL: 153 create_txg: 4 children[2]: type: 'disk' id: 2 guid: 1565205219 path: '/dev/mfid2' phys_path: '/dev/mfid2' whole_disk: 1 DTL: 152 create_txg: 4 children[3]: type: 'disk' id: 3 guid: 1876923630 path: '/dev/mfid3' phys_path: '/dev/mfid3' whole_disk: 1 DTL: 151 create_txg: 4 children[4]: type: 'disk' id: 4 guid: 1068158627 path: '/dev/mfid4' phys_path: '/dev/mfid4' whole_disk: 1 DTL: 150 create_txg: 4 children[5]: type: 'disk' id: 5 guid: 1726238716 path: '/dev/mfid5' phys_path: '/dev/mfid5' whole_disk: 1 DTL: 149 create_txg: 4 children[6]: type: 'disk' id: 6 guid: 390028842 path: '/dev/mfid6' phys_path: '/dev/mfid6' whole_disk: 1 DTL: 148 create_txg: 4 children[7]: type: 'disk' id: 7 guid: 1094656850 path: '/dev/mfid7' phys_path: '/dev/mfid7' whole_disk: 1 DTL: 147 create_txg: 4 children[8]: type: 'spare' id: 8 guid: 1773868765 whole_disk: 0 create_txg: 4 children[0]: type: 'disk' id: 0 guid: 1702922605 path: '/dev/mfid8' phys_path: '/dev/mfid8' whole_disk: 1 DTL: 166 create_txg: 4 children[1]: type: 'disk' id: 1 guid: 933862663 path: '/dev/mfid14' phys_path: '/dev/mfid14' whole_disk: 1 is_spare: 1 DTL: 146 create_txg: 4 resilvering: 1 children[9]: type: 'disk' id: 9 guid: 1771170870 path: '/dev/mfid8' phys_path: '/dev/mfid8' whole_disk: 1 DTL: 145 create_txg: 4 children[10]: 
type: 'disk' id: 10 guid: 1797981023 path: '/dev/mfid9' phys_path: '/dev/mfid9' whole_disk: 1 DTL: 144 create_txg: 4 children[11]: type: 'disk' id: 11 guid: 1424656624 path: '/dev/mfid10' phys_path: '/dev/mfid10' whole_disk: 1 DTL: 143 create_txg: 4 children[12]: type: 'disk' id: 12 guid: 1908699165 path: '/dev/mfid11' phys_path: '/dev/mfid11' whole_disk: 1 DTL: 142 create_txg: 4 children[13]: type: 'disk' id: 13 guid: 396147269 path: '/dev/mfid12' phys_path: '/dev/mfid12' whole_disk: 1 DTL: 141 create_txg: 4 children[14]: type: 'disk' id: 14 guid: 847844383 path: '/dev/mfid13' phys_path: '/dev/mfid13' whole_disk: 1 DTL: 140 create_txg: 4 features_for_read: > The label will provide you with the full vdev guid that you need to > replace the original drive with a new one. > > Another thing you could do is wait for the spare to finish > resilvering, then promote it to replace the original drive, and make > your new one a spare. Considering the time required to resilver this > pool configuration, that may be preferable for you. > > --Will. > 2 physical paths of mfid8 ... that can't be good... can't seem to use guids. 
Michelle

--
Michelle Sullivan
http://www.mhix.org/

From owner-freebsd-fs@FreeBSD.ORG Sun Dec 7 18:52:17 2014
In-Reply-To: <54842CC5.2020604@sorbs.net>
References: <54825E70.20900@sorbs.net> <54842CC5.2020604@sorbs.net>
Date: Sun, 7 Dec 2014 11:46:34 -0700
Subject: Re: ZFS weird issue...
From: Will Andrews
To: Michelle Sullivan
Cc: "freebsd-fs@freebsd.org"

On Sunday, December 7, 2014, Michelle Sullivan wrote:
>
> 2 physical paths of mfid8 ... that can't be good... can't seem to use
> guids.
>

Can you paste the commands you tried and the result? It's hard to guess
what might be causing the problem otherwise.

--Will.

From owner-freebsd-fs@FreeBSD.ORG Sun Dec 7 19:32:26 2014
Message-ID: <5484AAC8.7020209@multiplay.co.uk>
Date: Sun, 07 Dec 2014 19:30:16 +0000
From: Steven Hartland
To: freebsd-fs@freebsd.org
Subject: Re: ZFS weird issue...
References: <54825E70.20900@sorbs.net> <54842CC5.2020604@sorbs.net>

Output from zpool history might be useful too.

On 07/12/2014 18:46, Will Andrews wrote:
> On Sunday, December 7, 2014, Michelle Sullivan wrote:
>> 2 physical paths of mfid8 ... that can't be good... can't seem to use
>> guids.
>>
> Can you paste the commands you tried and the result? It's hard to guess
> what might be causing the problem otherwise.
>
> --Will.
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

From owner-freebsd-fs@FreeBSD.ORG Sun Dec 7 21:00:08 2014
Message-Id: <201412072100.sB7L08GD029196@kenobi.freebsd.org>
From: bugzilla-noreply@FreeBSD.org
To: freebsd-fs@FreeBSD.org
Subject: Problem reports for freebsd-fs@FreeBSD.org that need special attention
Date: Sun, 07 Dec 2014 21:00:08 +0000

To view an individual PR, use:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=(Bug Id).

The following is a listing of current problems submitted by FreeBSD users,
which need special attention. These represent problem reports covering
all versions including experimental development code and obsolete releases.
Status      |    Bug Id | Description
------------+-----------+---------------------------------------------------
Open        |    136470 | [nfs] Cannot mount / in read-only, over NFS
Open        |    139651 | [nfs] mount(8): read-only remount of NFS volume d
Open        |    144447 | [zfs] sharenfs fsunshare() & fsshare_main() non f

3 problems total for which you should take action.

From owner-freebsd-fs@FreeBSD.ORG Sun Dec 7 22:22:02 2014
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 7520BD5B for ; Sun, 7 Dec 2014 22:22:02 +0000 (UTC)
Received: from hades.sorbs.net (hades.sorbs.net [67.231.146.201]) by mx1.freebsd.org (Postfix) with ESMTP id 60EEFBE4 for ; Sun, 7 Dec 2014 22:22:01 +0000 (UTC)
MIME-version: 1.0
Content-transfer-encoding: 7BIT
Content-type: text/plain; CHARSET=US-ASCII
Received: from isux.com (firewall.isux.com [213.165.190.213]) by hades.sorbs.net (Oracle Communications Messaging Server 7.0.5.29.0 64bit (built Jul 9 2013)) with ESMTPSA id <0NG800I4KH06FI00@hades.sorbs.net> for freebsd-fs@freebsd.org; Sun, 07 Dec 2014 14:26:31 -0800 (PST)
Message-id: <5484D307.7070707@sorbs.net>
Date: Sun, 07 Dec 2014 23:21:59 +0100
From: Michelle Sullivan
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.24) Gecko/20100301 SeaMonkey/1.1.19
To: Will Andrews
Subject: Re: ZFS weird issue...
References: <54825E70.20900@sorbs.net> <54842CC5.2020604@sorbs.net>
In-reply-to:
Cc: "freebsd-fs@freebsd.org"
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 07 Dec 2014 22:22:02 -0000

Will Andrews wrote:
> On Sunday, December 7, 2014, Michelle Sullivan
> wrote:
>
> 2 physical paths of mfid8 ... that can't be good...
can't seem to use
> guids.
>
>
> Can you paste the commands you tried and the result? It's hard to
> guess what might be causing the problem otherwise.
>
> --Will.

I think this is all of them:

root@colossus:~ # zpool replace sorbs spare-8 mfid8
invalid vdev specification
use '-f' to override the following errors:
/dev/mfid8 is part of active pool 'sorbs'
root@colossus:~ # zpool replace sorbs spare-8 mfid15
cannot replace spare-8 with mfid15: no such device in pool
root@colossus:~ # zpool replace sorbs 933862663 1702922605
cannot open '1702922605': no such GEOM provider
must be a full path or shorthand device name
root@colossus:~ # zpool replace sorbs mfid8 mfid8
invalid vdev specification
use '-f' to override the following errors:
/dev/mfid8 is part of active pool 'sorbs'
root@colossus:~ # zpool replace sorbs mfid15 mfid15
cannot replace mfid15 with mfid15: no such device in pool
root@colossus:~ # zpool replace sorbs spare-8 1702922605
cannot open '1702922605': no such GEOM provider
must be a full path or shorthand device name
root@colossus:~ # zpool replace sorbs 1702922605 spare-8
cannot open 'spare-8': no such GEOM provider
must be a full path or shorthand device name
root@colossus:~ # zpool replace sorbs 1702922605 mfid8
invalid vdev specification
use '-f' to override the following errors:
/dev/mfid8 is part of active pool 'sorbs'

The problem seems to be the guid -> device name translation is working
and failing because there is already a mfid8... and the re-number didn't
happen when the device was replaced...
-- Michelle Sullivan http://www.mhix.org/ From owner-freebsd-fs@FreeBSD.ORG Sun Dec 7 22:23:31 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id E3F2FDD7 for ; Sun, 7 Dec 2014 22:23:31 +0000 (UTC) Received: from hades.sorbs.net (hades.sorbs.net [67.231.146.201]) by mx1.freebsd.org (Postfix) with ESMTP id CF66FBF2 for ; Sun, 7 Dec 2014 22:23:31 +0000 (UTC) MIME-version: 1.0 Content-transfer-encoding: 7BIT Content-type: text/plain; CHARSET=US-ASCII Received: from isux.com (firewall.isux.com [213.165.190.213]) by hades.sorbs.net (Oracle Communications Messaging Server 7.0.5.29.0 64bit (built Jul 9 2013)) with ESMTPSA id <0NG800I4MH2OFI00@hades.sorbs.net> for freebsd-fs@freebsd.org; Sun, 07 Dec 2014 14:28:01 -0800 (PST) Message-id: <5484D361.4050707@sorbs.net> Date: Sun, 07 Dec 2014 23:23:29 +0100 From: Michelle Sullivan User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.24) Gecko/20100301 SeaMonkey/1.1.19 To: Steven Hartland Subject: Re: ZFS weird issue... References: <54825E70.20900@sorbs.net> <54842CC5.2020604@sorbs.net> <5484AAC8.7020209@multiplay.co.uk> In-reply-to: <5484AAC8.7020209@multiplay.co.uk> Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 07 Dec 2014 22:23:32 -0000 Steven Hartland wrote: > Out from zpool history might be useful too. 
root@colossus:~ # zpool history
History for 'VirtualDisks':
2014-12-03.17:20:20 zpool create VirtualDisks /dev/zvol/sorbs/VirtualDisks
2014-12-03.17:20:52 zfs create -V 50G VirtualDisks/FreeBSD8.4-OS
2014-12-03.17:20:59 zfs create -V 50G VirtualDisks/FreeBSD8.4-Build
2014-12-03.17:21:06 zfs create -V 50G VirtualDisks/FreeBSD9.0-OS
2014-12-03.17:21:11 zfs create -V 50G VirtualDisks/FreeBSD9.0-Build
2014-12-03.17:21:15 zfs create -V 50G VirtualDisks/FreeBSD9.1-OS
2014-12-03.17:21:21 zfs create -V 50G VirtualDisks/FreeBSD9.1-Build
2014-12-03.17:21:38 zfs create -V 50G VirtualDisks/FreeBSD9.2amd64-OS
2014-12-03.17:21:43 zfs create -V 50G VirtualDisks/FreeBSD9.2amd64-Build
2014-12-03.17:21:51 zfs create -V 50G VirtualDisks/FreeBSD9.2i386-OS
2014-12-03.17:21:56 zfs create -V 50G VirtualDisks/FreeBSD9.2i386-Build
2014-12-03.17:22:03 zfs create -V 50G VirtualDisks/FreeBSD9.3amd64-OS
2014-12-03.17:22:08 zfs create -V 50G VirtualDisks/FreeBSD9.3amd64-Build
2014-12-03.17:22:18 zfs create -V 50G VirtualDisks/FreeBSD9.3i386-OS
2014-12-03.17:22:23 zfs create -V 50G VirtualDisks/FreeBSD9.3i386-Build
2014-12-03.17:22:53 zfs create -V 50G VirtualDisks/FreeBSD10.0amd64-OS
2014-12-03.17:22:58 zfs create -V 50G VirtualDisks/FreeBSD10.0amd64-Build

History for 'sorbs':
2014-10-07.02:08:59 zpool create sorbs raidz2 mfid0 mfid1 mfid2 mfid3 mfid4 mfid5 mfid6 mfid7 mfid8 mfid9 mfid10 mfid11 mfid12 mfid13 mfid14
2014-10-07.02:09:58 zpool set listsnapshots=on sorbs
2014-10-10.11:14:54 zpool scrub sorbs
2014-10-17.09:54:27 zpool scrub sorbs
2014-10-18.18:25:47 zpool add sorbs spare mfid15
2014-10-18.18:26:25 zpool replace sorbs mfid8 mfid15
2014-10-18.18:33:43 zpool detach sorbs 405274101
2014-10-18.18:35:09 zpool add sorbs spare mfid8
2014-10-18.18:36:04 zpool replace sorbs mfid15 mfid8
2014-10-18.18:38:57 zpool detach sorbs mfid15
2014-10-18.19:12:55 zpool replace sorbs mfid8 mfid14
2014-10-31.15:14:56 zpool scrub sorbs
2014-12-03.17:15:18 zfs create -V 1T sorbs/VirtualDisks
2014-12-05.17:11:02 zpool online sorbs mfid8
2014-12-05.17:11:39 zpool online sorbs 1702922605
2014-12-05.17:16:22 zpool set autoreplace=on sorbs

--
Michelle Sullivan
http://www.mhix.org/

From owner-freebsd-fs@FreeBSD.ORG Mon Dec 8 00:15:41 2014
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C91306E6 for ; Mon, 8 Dec 2014 00:15:41 +0000 (UTC)
Received: from mail-qa0-f53.google.com (mail-qa0-f53.google.com [209.85.216.53]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 8AF9D8CE for ; Mon, 8 Dec 2014 00:15:40 +0000 (UTC)
Received: by mail-qa0-f53.google.com with SMTP id bm13so2695937qab.40 for ; Sun, 07 Dec 2014 16:15:34 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=a3NtsijX/HstRUjzUCYwDFFHgmCG20yHEY8+WjXE43o=; b=B3EchRYzZxUNaZB5U+VXyf6ez4WwZH7r/VPgH+Ha2oj0xvCFwdrJjoUeDL9vy8bx5+ nqiYMex9FkmU9N2xfh3QmCWAQfKNJIawN1C9QNkUYz9vWgG3LsItsyMSruHQCaVnntk6 vi4KdFCGACpwO+Vl8xZTjgUI0n/oD4dWYvGNYtaRRjeqxnrrGirdwop4FBYk/+880zWf 2QF8JYp1x5vfSH+W0n5e5T6qVbxGnsyOUQqy3TvhCdOYb/3CsRc1s3pCMIGO4YoYfLSh jHV0wC7L1bPZAz7OLgbtoBX2phlCNw5KkbplIR9GRRJHclqSxoXrNXDKfVYsnrJu3CZO 5LXg==
X-Gm-Message-State: ALoCoQlYRyor44/kri3Fjh+CZHppayh3eLuHj/JSYtE0Jqd5fmj5E3hGBoE9n3wGERbKHdWNusD2
MIME-Version: 1.0
X-Received: by 10.229.97.73 with SMTP id k9mr47425475qcn.15.1417997734510; Sun, 07 Dec 2014 16:15:34 -0800 (PST)
Received: by 10.140.39.48 with HTTP; Sun, 7 Dec 2014 16:15:34 -0800 (PST)
In-Reply-To: <5484D307.7070707@sorbs.net>
References: <54825E70.20900@sorbs.net> <54842CC5.2020604@sorbs.net> <5484D307.7070707@sorbs.net>
Date: Sun, 7 Dec 2014 17:15:34 -0700
Message-ID:
Subject: Re: ZFS weird issue...
From: Will Andrews
To: Michelle Sullivan
Content-Type: text/plain; charset=UTF-8
Cc: "freebsd-fs@freebsd.org"
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 08 Dec 2014 00:15:41 -0000

On Sun, Dec 7, 2014 at 3:21 PM, Michelle Sullivan wrote:
> root@colossus:~ # zpool replace sorbs spare-8 mfid8
> root@colossus:~ # zpool replace sorbs spare-8 mfid15
> root@colossus:~ # zpool replace sorbs 933862663 1702922605
> root@colossus:~ # zpool replace sorbs mfid8 mfid8
> root@colossus:~ # zpool replace sorbs mfid15 mfid15
> root@colossus:~ # zpool replace sorbs spare-8 1702922605
> root@colossus:~ # zpool replace sorbs 1702922605 spare-8
> root@colossus:~ # zpool replace sorbs 1702922605 mfid8
[...]

I believe you want to replace 1702922605 (the original member that
used to be mfid8) with mfid15, not mfid8. According to your 'zpool
status' output (which I assume is still current?), mfid8 is now a
different member of the raidz2 than it was previously. Of the 16
devices you have, only mfid15 is currently missing, which suggests
that it's the current name of the new drive.

As you said, it's a brand new drive, so "zdb -l /dev/mfid15" should
confirm that it has no ZFS labels, and therefore is the correct drive
to use as the replacement for 1702922605.

> The problem seems to be the guid -> device name translation is working
> and failing because there is already a mfid8... and the re-number didn't
> happen when the device was replaced...

The guid -> device name translation isn't meant to be definitive at all
times. It is just a cached mapping that can become out of date any time
devices disappear and reappear or the system reboots.
This is why ZFS uses the vdev GUIDs to determine which device is which regardless of what its current block device name is. --Will. From owner-freebsd-fs@FreeBSD.ORG Mon Dec 8 06:15:57 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 19395D51 for ; Mon, 8 Dec 2014 06:15:57 +0000 (UTC) Received: from hades.sorbs.net (hades.sorbs.net [67.231.146.201]) by mx1.freebsd.org (Postfix) with ESMTP id 03A0DC9A for ; Mon, 8 Dec 2014 06:15:56 +0000 (UTC) MIME-version: 1.0 Content-transfer-encoding: 7BIT Content-type: text/plain; CHARSET=US-ASCII Received: from isux.com (firewall.isux.com [213.165.190.213]) by hades.sorbs.net (Oracle Communications Messaging Server 7.0.5.29.0 64bit (built Jul 9 2013)) with ESMTPSA id <0NG900IB42Y0FI00@hades.sorbs.net> for freebsd-fs@freebsd.org; Sun, 07 Dec 2014 22:20:26 -0800 (PST) Message-id: <54854219.9040807@sorbs.net> Date: Mon, 08 Dec 2014 07:15:53 +0100 From: Michelle Sullivan User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.24) Gecko/20100301 SeaMonkey/1.1.19 To: Will Andrews Subject: Re: ZFS weird issue... 
References: <54825E70.20900@sorbs.net> <54842CC5.2020604@sorbs.net> <5484D307.7070707@sorbs.net>
In-reply-to:
Cc: "freebsd-fs@freebsd.org"
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 08 Dec 2014 06:15:57 -0000

Will Andrews wrote:
> On Sun, Dec 7, 2014 at 3:21 PM, Michelle Sullivan wrote:
>
>> root@colossus:~ # zpool replace sorbs spare-8 mfid8
>> root@colossus:~ # zpool replace sorbs spare-8 mfid15
>> root@colossus:~ # zpool replace sorbs 933862663 1702922605
>> root@colossus:~ # zpool replace sorbs mfid8 mfid8
>> root@colossus:~ # zpool replace sorbs mfid15 mfid15
>> root@colossus:~ # zpool replace sorbs spare-8 1702922605
>> root@colossus:~ # zpool replace sorbs 1702922605 spare-8
>> root@colossus:~ # zpool replace sorbs 1702922605 mfid8
>>
> [...]
>
> I believe you want to replace 1702922605 (the original member that
> used to be mfid8) with mfid15, not mfid8. According to your 'zpool
> status' output (which I assume is still current?), mfid8 is now a
> different member of the raidz2 than it was previously. Of the 16
> devices you have, only mfid15 is currently missing, which suggests
> that it's the current name of the new drive.
>
> As you said, it's a brand new drive, so "zdb -l /dev/mfid15" should
> confirm that it has no ZFS labels, and therefore is the correct drive
> to use as the replacement for 1702922605.
>
>

Thank you! (that seems to have got it)

root@colossus:/ # zpool replace sorbs 1702922605 mfid15
root@colossus:/ # zpool status -v
  pool: VirtualDisks
 state: ONLINE
  scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        VirtualDisks               ONLINE       0     0     0
          zvol/sorbs/VirtualDisks  ONLINE       0     0     0

errors: No known data errors

  pool: sorbs
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Dec 8 07:13:45 2014
        21.6M scanned out of 29.9T at 1.14M/s, (scan is slow, no estimated time)
        1.37M resilvered, 0.00% done
config:

        NAME                  STATE     READ WRITE CKSUM
        sorbs                 DEGRADED     0     0     0
          raidz2-0            DEGRADED     0     0     0
            mfid0             ONLINE       0     0     0
            mfid1             ONLINE       0     0     0
            mfid2             ONLINE       0     0     0
            mfid3             ONLINE       0     0     0
            mfid4             ONLINE       0     0     0
            mfid5             ONLINE       0     0     0
            mfid6             ONLINE       0     0     0
            mfid7             ONLINE       0     0     0
            spare-8           DEGRADED     0     0     0
              replacing-0     UNAVAIL      0     0     0
                1702922605    FAULTED      0     0     0  was /dev/mfid8
                mfid15        ONLINE       0     0     0  (resilvering)
              mfid14          ONLINE       0     0     0
            mfid8             ONLINE       0     0     0
            mfid9             ONLINE       0     0     0
            mfid10            ONLINE       0     0     0
            mfid11            ONLINE       0     0     0
            mfid12            ONLINE       0     0     0
            mfid13            ONLINE       0     0     0
        spares
          933862663           INUSE     was /dev/mfid14

errors: No known data errors
root@colossus:/ #

--
Michelle Sullivan
http://www.mhix.org/

From owner-freebsd-fs@FreeBSD.ORG Mon Dec 8 08:36:33 2014
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 7EEF5C99 for ; Mon, 8 Dec 2014 08:36:33 +0000 (UTC)
Received: from smtp.unix-experience.fr (195-154-176-227.rev.poneytelecom.eu [195.154.176.227]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 3B3CDD7B for ; Mon, 8 Dec 2014 08:36:32 +0000 (UTC)
Received: from smtp.unix-experience.fr (unknown [192.168.200.21]) by smtp.unix-experience.fr (Postfix) with ESMTP id 16B45220B4; Mon, 8 Dec 2014 08:36:24 +0000 (UTC)
X-Virus-Scanned: scanned by unix-experience.fr
Received: from smtp.unix-experience.fr ([192.168.200.21]) by smtp.unix-experience.fr (smtp.unix-experience.fr [192.168.200.21]) (amavisd-new, port 10024) with ESMTP id ZB2wtjMPoJw0; Mon, 8 Dec 2014 08:36:22 +0000 (UTC)
Received: from mail.unix-experience.fr
(repo.unix-experience.fr [192.168.200.30]) by smtp.unix-experience.fr (Postfix) with ESMTPSA id D8119220A6; Mon, 8 Dec 2014 08:36:21 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=unix-experience.fr; s=uxselect; t=1418027782; bh=0CeMJ0G7XaB0j+b9/ZZGanFCD5urDyZlD6qvnMCJCjE=; h=Date:From:Subject:To:Cc:In-Reply-To:References; b=X+tg4GpFvWY4K54JFzCX5pfdWMAGEzrVTIKEhRS2gAeuAS/n8iTXVIGExmSgzvJfj ARhPjilf7unWee5LCSgt89lSMMLHhDBhQwSfr6NQUvpCPgEkbQ11Vehu4L8dDedcSD iXgPYhRa7QqO/rDejXy+N9IqIu+lRZ3Nu0rGVTVE=
Mime-Version: 1.0
Date: Mon, 08 Dec 2014 08:36:21 +0000
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Message-ID:
X-Mailer: RainLoop/1.6.10.182
From: "=?utf-8?B?TG/Dr2MgQmxvdA==?="
Subject: Re: High Kernel Load with nfsv4
To: "Rick Macklem"
In-Reply-To: <581583623.5730217.1417788866930.JavaMail.root@uoguelph.ca>
References: <581583623.5730217.1417788866930.JavaMail.root@uoguelph.ca>
Cc: freebsd-fs@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 08 Dec 2014 08:36:33 -0000

Hi Rick,
I stopped the jails this week-end and started it this morning, i'll give you some stats this week.

Here is my nfsstat -m output (with your rsize/wsize tweaks)

nfsv4,tcp,resvport,hard,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=32768,wsize=32768,readdirsize=32768,readahead=1,wcommitsize=773136,timeout=120,retrans=2147483647

On server side my disks are on a raid controller which show a 512b volume and write performances are very honest (dd if=/dev/zero of=/jails/test.dd bs=4096 count=100000000 => 450MBps)

Regards,

Loïc Blot,
UNIX Systems, Network and Security Engineer
http://www.unix-experience.fr

On 5 December 2014 15:14, "Rick Macklem" wrote:
> Loic Blot wrote:
>
>> Hi,
>> i'm trying to create a virtualisation environment based on jails.
>> Those jails are stored under a big ZFS pool on a FreeBSD 9.3 which
>> export a NFSv4 volume. This NFSv4 volume was mounted on a big
>> hypervisor (2 Xeon E5v3 + 128GB memory and 8 ports (but only 1 was
>> used at this time).
>>
>> The problem is simple, my hypervisors runs 6 jails (used 1% cpu and
>> 10GB RAM approximatively and less than 1MB bandwidth) and works
>> fine at start but the system slows down and after 2-3 days become
>> unusable. When i look at top command i see 80-100% on system and
>> commands are very very slow. Many process are tagged with nfs_cl*.
>
> To be honest, I would expect the slowness to be because of slow response
> from the NFSv4 server, but if you do:
> # ps axHl
> on a client when it is slow and post that, it would give us some more
> information on where the client side processes are sitting.
> If you also do something like:
> # nfsstat -c -w 1
> and let it run for a while, that should show you how many RPCs are
> being done and which ones.
>
> # nfsstat -m
> will show you what your mount is actually using.
> The only mount option I can suggest trying is "rsize=32768,wsize=32768",
> since some network environments have difficulties with 64K.
>
> There are a few things you can try on the NFSv4 server side, if it appears
> that the clients are generating a large RPC load.
> - disabling the DRC cache for TCP by setting vfs.nfsd.cachetcp=0
> - If the server is seeing a large write RPC load, then "sync=disabled"
> might help, although it does run a risk of data loss when the server
> crashes.
> Then there are a couple of other ZFS related things (I'm not a ZFS guy,
> but these have shown up on the mailing lists).
> - make sure your volumes are 4K aligned and ashift=12 (in case a drive
> that uses 4K sectors is pretending to be 512byte sectored)
> - never run over 70-80% full if write performance is an issue
> - use a zil on an SSD with good write performance
>
> The only NFSv4 thing I can tell you is that it is known that ZFS's
> algorithm for determining sequential vs random I/O fails for NFSv4
> during writing and this can be a performance hit. The only workaround
> is to use NFSv3 mounts, since file handle affinity apparently fixes
> the problem and this is only done for NFSv3.
>
> rick
>
>> I saw that there are TSO issues with igb then i'm trying to disable
>> it with sysctl but the situation wasn't solved.
>>
>> Someone has got ideas ? I can give you more informations if you
>> need.
>>
>> Thanks in advance.
>> Regards,
>>
>> Loïc Blot,
>> UNIX Systems, Network and Security Engineer
>> http://www.unix-experience.fr
>> _______________________________________________
>> freebsd-fs@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

From owner-freebsd-fs@FreeBSD.ORG Mon Dec 8 08:59:30 2014
Return-Path:
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 371EC200 for ; Mon, 8 Dec 2014 08:59:30 +0000 (UTC)
Received: from citadel.icyb.net.ua (citadel.icyb.net.ua [212.40.38.140]) by mx1.freebsd.org (Postfix) with ESMTP id 62318F7C for ; Mon, 8 Dec 2014 08:59:28 +0000 (UTC)
Received: from porto.starpoint.kiev.ua (porto-e.starpoint.kiev.ua [212.40.38.100]) by citadel.icyb.net.ua (8.8.8p3/ICyb-2.3exp) with ESMTP id LAA01050; Mon, 08 Dec 2014 11:01:20 +0200 (EET) (envelope-from avg@FreeBSD.org)
Received: from localhost ([127.0.0.1]) by
porto.starpoint.kiev.ua with esmtp (Exim 4.34 (FreeBSD)) id 1XxuA9-0006FO-TD; Mon, 08 Dec 2014 10:59:25 +0200 Message-ID: <5485681C.7010504@FreeBSD.org> Date: Mon, 08 Dec 2014 10:58:04 +0200 From: Andriy Gapon User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Will Andrews Subject: Re: ZFS weird issue... References: <54825E70.20900@sorbs.net> <54842CC5.2020604@sorbs.net> In-Reply-To: Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Cc: "freebsd-fs@freebsd.org" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 08 Dec 2014 08:59:30 -0000 BTW, Will, don't you have a patch that help the vdev phys path to stay up-to-date? -- Andriy Gapon From owner-freebsd-fs@FreeBSD.ORG Mon Dec 8 09:36:38 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 61598FEA for ; Mon, 8 Dec 2014 09:36:38 +0000 (UTC) Received: from mail-qc0-f170.google.com (mail-qc0-f170.google.com [209.85.216.170]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 1F82D672 for ; Mon, 8 Dec 2014 09:36:37 +0000 (UTC) Received: by mail-qc0-f170.google.com with SMTP id x3so3276327qcv.29 for ; Mon, 08 Dec 2014 01:36:31 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=/Hqhe+z8EhSlB8nG3IH8IZNqLKa2fAE+szq7UWS4jGI=; 
b=NBu4ZJmx/b2p91X+bv+U926S47vWj113Ve2yr58nyj8as20H1ifMbNPNF9/jN3Olej ptK7CC3hljjSeGh5MS1KyV4YHEl1B1+otsJF5RiVTyQBsOZd6i+sxePSiqwgA8O6wgBw yy6CW190rW3ex0so9Dtg494MRPDbDsFOG13eLAJ2ECR6AZ6joc3wkFQm5lPzpP1gtA3V GKtvqRsFWslfehCwgns64VPLhZzHvL8/1GOBgvFdh1fiFOUeaxW2InyBlXYGGs4c5l91 u2BXyR3uveb8yZqthOTIFe+DKhAn7Csim7EU74qkcbYH//WirZGDbaRkPbFb8vGjtm0U yvCA== X-Gm-Message-State: ALoCoQloo4lKzgaYoC5oBsmdVnTAyJgTNhWVTtnLwFfsC3VXRHqKgxnQVe0LNLzg+pJwtS3wTPnT MIME-Version: 1.0 X-Received: by 10.224.14.133 with SMTP id g5mr50403405qaa.81.1418031026889; Mon, 08 Dec 2014 01:30:26 -0800 (PST) Received: by 10.140.39.48 with HTTP; Mon, 8 Dec 2014 01:30:26 -0800 (PST) In-Reply-To: <5485681C.7010504@FreeBSD.org> References: <54825E70.20900@sorbs.net> <54842CC5.2020604@sorbs.net> <5485681C.7010504@FreeBSD.org> Date: Mon, 8 Dec 2014 02:30:26 -0700 Message-ID: Subject: Re: ZFS weird issue... From: Will Andrews To: Andriy Gapon Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 Cc: "freebsd-fs@freebsd.org" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 08 Dec 2014 09:36:38 -0000 On Monday, December 8, 2014, Andriy Gapon wrote: > > BTW, Will, don't you have a patch that help the vdev phys path to stay > up-to-date? > I don't believe we have any patch for a situation where a member disappears altogether. Perhaps the best we could do for a situation like this would be to wipe the physical path in the label for a faulted device, to reduce confusion. But a device that's been either manually offlined or physically removed could come back and be resilvered. There are patches to provide physical paths through SES/SAF-TE/SGPIO which offer paths decidedly more physical than a device name. And to perform automatic replacement by physical path based on these. 
Not sure to what degree these have been integrated in the mainline. But this still requires the system to have an enclosure service of some sort. Fundamentally, documentation should make clear which /dev device names are logical and can be reordered due to configuration changes (including device departure and arrival events) or reboots. As opposed to those based on immutable properties of the device. In the case of ZFS, the ideal physical path identifies a slot, so ZFS can detect when a particular member is being replaced at its physical location by a new device. So using a device's serial number is a little too specific for the job. But a typical logical device name is not specific enough. --Will. From owner-freebsd-fs@FreeBSD.ORG Mon Dec 8 13:47:22 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 335CC19B for ; Mon, 8 Dec 2014 13:47:22 +0000 (UTC) Received: from smtp.unix-experience.fr (195-154-176-227.rev.poneytelecom.eu [195.154.176.227]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id E27F4398 for ; Mon, 8 Dec 2014 13:47:21 +0000 (UTC) Received: from smtp.unix-experience.fr (unknown [192.168.200.21]) by smtp.unix-experience.fr (Postfix) with ESMTP id 3B5DE25A84; Mon, 8 Dec 2014 13:47:18 +0000 (UTC) X-Virus-Scanned: scanned by unix-experience.fr Received: from smtp.unix-experience.fr ([192.168.200.21]) by smtp.unix-experience.fr (smtp.unix-experience.fr [192.168.200.21]) (amavisd-new, port 10024) with ESMTP id hT6XG7sQ9ghv; Mon, 8 Dec 2014 13:47:12 +0000 (UTC) Received: from mail.unix-experience.fr (repo.unix-experience.fr [192.168.200.30]) by smtp.unix-experience.fr (Postfix) with ESMTPSA id A5CE325A79; Mon, 8 Dec 2014 13:47:12 +0000 (UTC) 
DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=unix-experience.fr; s=uxselect; t=1418046432; bh=gbbq7Bw66H7kj0+p71Exbn2D98kgbDIEhhZIVhUxcIo=; h=Date:From:Subject:To:Cc:In-Reply-To:References; b=jbBBF8PVyvm42cI+/B06NnXyVqEK54kQxNdI32SnsKp6/s0nxcLi2sm5EbcoUlH1P zk4kvd69moveF5Nq9ahwAUDlj+WPYBcHaWUAKrN2bmLX1Kk3cqk0HL3rCLyYWMfeuK OEzK3jKFS8gLCJaLYW/0UwpvtsP2dNwtuTwyhjbw=
Mime-Version: 1.0
Date: Mon, 08 Dec 2014 13:47:12 +0000
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Message-ID:
X-Mailer: RainLoop/1.6.10.182
From: "=?utf-8?B?TG/Dr2MgQmxvdA==?="
Subject: Re: High Kernel Load with nfsv4
To: "Rick Macklem"
In-Reply-To:
References: <581583623.5730217.1417788866930.JavaMail.root@uoguelph.ca>
Cc: freebsd-fs@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 08 Dec 2014 13:47:22 -0000

Hi rick,

I waited 3 hours (no lag at jail launch) and now I do: sysrc memcached_flags="-v -m 512"
Command was very very slow...

Here is a dd over NFS:

601062912 bytes transferred in 21.060679 secs (28539579 bytes/sec)

This is quite slow...

You can find some nfsstat below (command isn't finished yet)

nfsstat -c -w 1

 GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
      0      0      0      0      0      0      0      0
      4      0      0      0      0      0     16      0
      2      0      0      0      0      0     17      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      4      0      0      0      0      4      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      4      0      0      0      0      0      3      0
      0      0      0      0      0      0      3      0
     37     10      0      8      0      0     14      1
     18     16      0      4      1      2      4      0
     78     91      0     82      6     12     30      0
     19     18      0      2      2      4      2      0
      0      0      0      0      2      0      0      0
      0      0      0      0      0      0      0      0
 GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      1      0      0      0      0      1      0
      4      6      0      0      6      0      3      0
      2      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      1      0      0      0      0      0      0      0
      0      0      0      0      1      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      6    108      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
 GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
     98     54      0     86     11      0     25      0
     36     24      0     39     25      0     10      1
     67      8      0     63     63      0     41      0
     34      0      0     35     34      0      0      0
     75      0      0     75     77      0      0      0
     34      0      0     35     35      0      0      0
     75      0      0     74     76      0      0      0
     33      0      0     34     33      0      0      0
      0      0      0      0      5      0      0      0
      0      0      0      0      0      0      6      0
     11      0      0      0      0      0     11      0
      0      0      0      0      0      0      0      0
      0     17      0      0      0      0      1      0
 GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
      4      5      0      0      0      0     12      0
      2      0      0      0      0      0     26      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      4      0      0      0      0      4      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      4      0      0      0      0      0      2      0
      2      0      0      0      0      0     24      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
 GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      4      0      0      0      0      0      7      0
      2      1      0      0      0      0      1      0
      0      0      0      0      2      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      6      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      4      6      0      0      0      0      3      0
      0      0      0      0      0      0      0      0
      2      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
 GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      4     71      0      0      0      0      0      0
      0      1      0      0      0      0      0      0
      2     36      0      0      0      0      1      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      1      0      0      0      0      0      1      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
     79      6      0     79     79      0      2      0
     25      0      0     25     26      0      6      0
     43     18      0     39     46      0     23      0
     36      0      0     36     36      0     31      0
     68      1      0     66     68      0      0      0
 GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
     36      0      0     36     36      0      0      0
     48      0      0     48     49      0      0      0
     20      0      0     20     20      0      0      0
      0      0      0      0      0      0      0      0
      3     14      0      1      0      0     11      0
      0      0      0      0      0      0      0      0
      0      0      0      0      0      0      0      0
      0      4      0      0      0      0      4      0
      0      0      0      0      0      0      0      0
      4     22      0      0      0      0     16      0
      2      0      0      0      0      0     23      0

Regards,

Loïc Blot,
UNIX Systems, Network and Security Engineer
http://www.unix-experience.fr

On 8 December 2014 09:36, "Loïc Blot" wrote:
> Hi Rick,
> I stopped the jails this week-end and started it this morning, i'll give you some stats this week.
>
> Here is my nfsstat -m output (with your rsize/wsize tweaks)
>
> nfsv4,tcp,resvport,hard,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=32768,wsize=32768,readdirsize=32768,readahead=1,wcommitsize=773136,timeout=120,retrans=2147483647
>
> On server side my disks are on a raid controller which show a 512b volume and write performances
> are very honest (dd if=/dev/zero of=/jails/test.dd bs=4096 count=100000000 => 450MBps)
>
> Regards,
>
> Loïc Blot,
> UNIX Systems, Network and Security Engineer
> http://www.unix-experience.fr
>
> On 5 December 2014 15:14, "Rick Macklem" wrote:
>
>> Loic Blot wrote:
>>
>>> Hi,
>>> i'm trying to create a virtualisation environment based on jails.
>>> Those jails are stored under a big ZFS pool on a FreeBSD 9.3 which
>>> export a NFSv4 volume. This NFSv4 volume was mounted on a big
>>> hypervisor (2 Xeon E5v3 + 128GB memory and 8 ports (but only 1 was
>>> used at this time).
>>>
>>> The problem is simple, my hypervisors runs 6 jails (used 1% cpu and
>>> 10GB RAM approximatively and less than 1MB bandwidth) and works
>>> fine at start but the system slows down and after 2-3 days become
>>> unusable. When i look at top command i see 80-100% on system and
>>> commands are very very slow. Many process are tagged with nfs_cl*.
>>
>> To be honest, I would expect the slowness to be because of slow response
>> from the NFSv4 server, but if you do:
>> # ps axHl
>> on a client when it is slow and post that, it would give us some more
>> information on where the client side processes are sitting.
>> If you also do something like:
>> # nfsstat -c -w 1
>> and let it run for a while, that should show you how many RPCs are
>> being done and which ones.
>>
>> # nfsstat -m
>> will show you what your mount is actually using.
>> The only mount option I can suggest trying is "rsize=32768,wsize=32768",
>> since some network environments have difficulties with 64K.
>>
>> There are a few things you can try on the NFSv4 server side, if it appears
>> that the clients are generating a large RPC load.
>> - disabling the DRC cache for TCP by setting vfs.nfsd.cachetcp=0
>> - If the server is seeing a large write RPC load, then "sync=disabled"
>> might help, although it does run a risk of data loss when the server
>> crashes.
>> Then there are a couple of other ZFS related things (I'm not a ZFS guy,
>> but these have shown up on the mailing lists).
>> - make sure your volumes are 4K aligned and ashift=12 (in case a drive
>> that uses 4K sectors is pretending to be 512byte sectored)
>> - never run over 70-80% full if write performance is an issue
>> - use a zil on an SSD with good write performance
>>
= =0A>> The only NFSv4 thing I can tell you is that it is known that ZFS's= =0A>> algorithm for determining sequential vs random I/O fails for NFSv4= =0A>> during writing and this can be a performance hit. The only workarou= nd=0A>> is to use NFSv3 mounts, since file handle affinity apparently fix= es=0A>> the problem and this is only done for NFSv3.=0A>> =0A>> rick=0A>>= =0A>>> I saw that there are TSO issues with igb then i'm trying to disab= le=0A>>> it with sysctl but the situation wasn't solved.=0A>>> =0A>>> Som= eone has got ideas ? I can give you more informations if you=0A>>> need.= =0A>>> =0A>>> Thanks in advance.=0A>>> Regards,=0A>>> =0A>>> Lo=C3=AFc Bl= ot,=0A>>> UNIX Systems, Network and Security Engineer=0A>>> http://www.un= ix-experience.fr=0A>>> _______________________________________________=0A= >>> freebsd-fs@freebsd.org mailing list=0A>>> http://lists.freebsd.org/ma= ilman/listinfo/freebsd-fs=0A>>> To unsubscribe, send any mail to "freebsd= -fs-unsubscribe@freebsd.org"=0A> =0A> ___________________________________= ____________=0A> freebsd-fs@freebsd.org mailing list=0A> http://lists.fre= ebsd.org/mailman/listinfo/freebsd-fs=0A> To unsubscribe, send any mail to= "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Mon Dec 8 13:55:14 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 089C2593; Mon, 8 Dec 2014 13:55:14 +0000 (UTC) Received: from mx2.paymentallianceintl.com (mx2.paymentallianceintl.com [216.26.158.171]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client CN "mx2.paymentallianceintl.com", Issuer "Go Daddy Secure Certification Authority" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id AD9EF693; Mon, 8 Dec 2014 13:55:13 +0000 (UTC) Received: from PAIMAIL.pai.local (paimail.pai.local 
[10.10.0.153]) by mx2.paymentallianceintl.com (8.14.5/8.13.8) with ESMTP id sB8Dk22o099676 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=FAIL); Mon, 8 Dec 2014 08:46:02 -0500 (EST) (envelope-from mikej@paymentallianceintl.com) Received: from PAIMAIL.pai.local ([::1]) by PAIMAIL.pai.local ([::1]) with mapi; Mon, 8 Dec 2014 08:46:01 -0500 From: Michael Jung To: Will Andrews , Andriy Gapon Date: Mon, 8 Dec 2014 08:45:56 -0500 Subject: FreeBSD Enclosure Management - WAS: ZFS weird issue... Thread-Topic: FreeBSD Enclosure Management - WAS: ZFS weird issue... Thread-Index: AdAS7Q0yXe6cCl4IQ52pBOM8jKJQAw== Message-ID: <9C91F97841BC4347910F206618BAA3BB04E8583B27@PAIMAIL.pai.local> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: en-US Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Cc: "freebsd-fs@freebsd.org" X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 08 Dec 2014 13:55:14 -0000

>There are patches to provide physical paths through SES/SAF-TE/SGPIO which
>offer paths decidedly more physical than a device name. And to perform
>automatic replacement by physical path based on these. Not sure to what
>degree these have been integrated in the mainline. But this still requires
>the system to have an enclosure service of some sort.

All:

At home, I have used gpart labels to identify drives (14 drives), which obviously doesn't scale well.

Do the patches get one to the point of identifying drives as long as you have controllers and backplanes supporting, say, SGPIO, e.g. LSI/Supermicro?

Can you expound on what the 'service' would provide in addition to the patches?

This link is informative but leaves questions for the reader.

https://people.freebsd.org/~mav/Enclosure_Management_en.pdf

Where are your patches located?

I would very much like to learn more as to how best to handle enclosure management under FreeBSD, or to know that in fact it is not generic enough and that it will always be hardware (backplane/expander/controller) dependent for any implementation requiring programming skills or vendor supplied tools.

Kind regards,

--mikej

GoPai.com | Facebook.com/PaymentAlliance

CONFIDENTIALITY NOTE: This message is intended only for the use of the individual or entity to whom it is addressed and may contain information that is privileged, confidential, and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this transmission in error, please notify us by telephone at (502) 212-4001 or notify us at PAI , Dept. 99, 6060 Dutchmans Lane, Suite 320, Louisville, KY 40205

From owner-freebsd-fs@FreeBSD.ORG Mon Dec 8 14:38:40 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 58064CB for ; Mon, 8 Dec 2014 14:38:40 +0000 (UTC) Received: from kerio.tuxis.nl (alcyone.saas.tuxis.net [31.3.111.19]) (using TLSv1.1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C82E9B24 for ; Mon, 8 Dec 2014 14:38:38 +0000 (UTC) X-Footer: dHV4aXMubmw= Received: from [31.3.104.222] ([31.3.104.222]) by kerio.tuxis.nl (Kerio Connect 8.4.0) for freebsd-fs@freebsd.org; Mon, 8 Dec 2014 15:38:29 +0100 Date: Mon, 8 Dec 2014 15:38:29 +0100 Subject: Mountd, why not use the '-S' flag by default X-Mailer: Kerio Connect 8.4.0/Kerio Connect client X-User-Agent:
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36 Message-ID: <710130010-1872@kerio.tuxis.nl> X-Priority: 3 Importance: Normal MIME-Version: 1.0 From: Mark Schouten To: freebsd-fs@freebsd.org Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg="sha1"; boundary="=-v/092G2AT9LgfiAD1CAy" X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 08 Dec 2014 14:38:40 -0000

Hi,

I'm using a FreeBSD nfs-server as storage for my Linux KVM-based VPS-platform. The images reside on the NFS-server.

I've been noticing errors in my VPS disks when running 'zfs set sharenfs=XYZ', probably because of reloads of mountd. While trying to debug that, I ran across this message in mountd(8):

     -S      Tell mountd to suspend/resume execution of the nfsd threads
             whenever the exports list is being reloaded.  This avoids
             intermittent access errors for clients that do NFS RPCs while
             the exports are being reloaded, but introduces a delay in RPC
             response while the reload is in progress.  If mountd crashes
             while an exports load is in progress, mountd must be restarted
             to get the nfsd threads running again, if this option is used.

I can't think of a reason why you wouldn't want to use -S by default. An '/etc/rc.d/mountd reload' without it causes even my running Bonnie on a normal NFS-share (not via a diskimage) to stop with 'input/output error'. Can someone enlighten me with the drawbacks of using -S?

Kind regards,

-- 
Kerio Operator in the Cloud? https://www.kerioindecloud.nl/
Mark Schouten | Tuxis Internet Engineering
KvK: 61527076 | http://www.tuxis.nl/
T: 0318 200208 | info@tuxis.nl
From owner-freebsd-fs@FreeBSD.ORG Mon Dec 8 15:59:43 2014 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id E834B97C for ; Mon, 8 Dec 2014 15:59:43 +0000 (UTC) Received: from zulu.iotz.org (zulu.iotz.org [192.73.233.125]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C55905FB for ; Mon, 8 Dec 2014 15:59:43 +0000 (UTC) Received: from iozz.us (zulu.iotz.org [192.73.233.125]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by zulu.iotz.org (Postfix) with ESMTPSA id 9B1D01E0128; Mon, 8 Dec 2014 10:53:51 -0500 (EST) Date: Mon, 8 Dec 2014 10:53:31 -0500 From: Brian N To: fs@freebsd.org Subject: ZFS scrub on new disks gives
cksum errors Message-ID: <20141208155230.GA574@iozz.us> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.23 (2014-03-12) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 08 Dec 2014 15:59:44 -0000

Hi.

I have two new 3TB disks that I'm testing prior to installing in a production
server. I created a zpool mirror, dumped a bunch of data to the pool and ran a
scrub. I got back numerous CKSUM errors, the same number on each drive. I'm
guessing this is not a problem with my 3TB drives but a controller or some
other hardware problem. Is this a correct assumption?

The PC where I'm running this test is using a GA-E7AUM-DS2H mainboard and
non-ecc memory.

######

root@server:~ # zpool status
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 3h57m with 23 errors on Mon Dec  8 03:05:28 2014
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0    23
          mirror-0  ONLINE       0     0    46
            ada1    ONLINE       0     0    46
            ada3    ONLINE       0     0    46

errors: 2 data errors, use '-v' for a list

From owner-freebsd-fs@FreeBSD.ORG Mon Dec 8 16:03:02 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 7288886C for ; Mon, 8 Dec 2014 16:03:02 +0000 (UTC) Received: from mail-wg0-f46.google.com (mail-wg0-f46.google.com [74.125.82.46]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 08FD674F for ; Mon, 8 Dec 2014 16:03:01 +0000 (UTC) Received: by mail-wg0-f46.google.com with SMTP id x13so4009271wgg.5 for ; Mon, 08 Dec 2014 08:02:54 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:message-id:date:from:user-agent:mime-version:to :subject:references:in-reply-to:content-type :content-transfer-encoding; bh=ZbNkO+ocQ25GQ/ZLPzwregyg2eDQO7WOszWIZ+qyg5k=; b=OmsYCxXspmKsIGAsJl7Z3SHDH59uOmh5PbsB6QfoK93FyMT0pIphCv5rYVEgr/9uc3 IcMAGQpYkO9sAbsDaL2+2K935aDz6Cd5DCYPzeg5aAn/Z+mfoenYN8WYluXfyIj4SEyr LVcjtsbDlYTT46XG4z5AS2PIm3B7ICxlw5qtKM0M1yupiIjpehudJlZTopWcSkgbjKzZ wD2Sww7vQrBi9RvkldyhokblAU1IeEE1lYed1bIwx0ui/+n1pybXh+/iHGupsF5OrsJr MpbUqRBK0Z4Fw2cOrrJxIpGgMxxg5Ugw7E8RdJ/qdoA4GGTEF6J1XU7DC+96I15mibsy hx+A== X-Gm-Message-State: ALoCoQn2r/bdjQLspSwqmy/2hIP6/+M52xVFtychujQxqdClO5Fz/KfEGbizC+yH6E6FzL9/S+kI X-Received: by 10.180.73.235 with SMTP id o11mr25032746wiv.51.1418054573926; Mon, 08 Dec 2014 08:02:53 -0800 (PST) Received: from [10.10.1.68] (82-69-141-170.dsl.in-addr.zen.co.uk.
[82.69.141.170]) by mx.google.com with ESMTPSA id xt9sm24611141wjc.42.2014.12.08.08.02.52 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 08 Dec 2014 08:02:53 -0800 (PST) Message-ID: <5485CB35.5010608@multiplay.co.uk> Date: Mon, 08 Dec 2014 16:00:53 +0000 From: Steven Hartland User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Thunderbird/31.3.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: ZFS scrub on new disks gives cksum errors References: <20141208155230.GA574@iozz.us> In-Reply-To: <20141208155230.GA574@iozz.us> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 08 Dec 2014 16:03:02 -0000

Bad memory, cabling?

On 08/12/2014 15:53, Brian N wrote:
> Hi.
>
> I have two new 3TB disks that I'm testing prior to installing in a production
> server. I created a zpool mirror, dumped a bunch of data to the pool and ran a
> scrub. I got back numerous CKSUM errors, the same number on each drive. I'm
> guessing this is not a problem with my 3TB drives but a controller or some
> other hardware problem. Is this a correct assumption?
>
> The PC where I'm running this test is using a GA-E7AUM-DS2H mainboard and
> non-ecc memory.
>
> ######
>
> root@server:~ # zpool status
>   pool: tank
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://illumos.org/msg/ZFS-8000-8A
>   scan: scrub repaired 0 in 3h57m with 23 errors on Mon Dec  8 03:05:28 2014
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         tank        ONLINE       0     0    23
>           mirror-0  ONLINE       0     0    46
>             ada1    ONLINE       0     0    46
>             ada3    ONLINE       0     0    46
>
> errors: 2 data errors, use '-v' for a list
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

From owner-freebsd-fs@FreeBSD.ORG Mon Dec 8 17:09:54 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 02FCEA1A for ; Mon, 8 Dec 2014 17:09:54 +0000 (UTC) Received: from mail-qg0-f42.google.com (na3sys010aog105.obsmtp.com [74.125.245.78]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 85FEBF32 for ; Mon, 8 Dec 2014 17:09:53 +0000 (UTC) Received: from mail-qg0-f42.google.com ([209.85.192.42]) (using TLSv1) by na3sys010aob105.postini.com ([74.125.244.12]) with SMTP ID DSNKVIXbYLnQHa+NITh3bgkQ4UgWadTlmEfZ@postini.com; Mon, 08 Dec 2014 09:09:53 PST Received: by mail-qg0-f42.google.com with SMTP id z107so3837494qgd.1 for ; Mon, 08 Dec 2014 09:09:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=groupon.com; s=google; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=PJYoUpUu/yxYVaOEBnl/A9Cf/weA+odE9g64jbKFMps=; b=MONHFvJafhA2Uhhu50tYttG+qdFLvACfaYRi5sSzfUWyeOVikx+Ld3SqshTcdBaC1T YWBzh4UEpLJevUIoTttjmSPtnxlDbu1P1S5kiRr9PYPM+QHQZTYNlLS9udWn2DNQAWu1 6MaiXeSxRJ8WOW/gQzs9YK3u+g4jBs9ERXDlE= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820;
h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=PJYoUpUu/yxYVaOEBnl/A9Cf/weA+odE9g64jbKFMps=; b=G0S7G/fRwDVdQr0Czu4DvXLa80AxRas8tOml//rD5ZvCZluEHssjeVjbiUqyPsb4yF JnigkTnA549XqC+GpINLUQ1b1UDtXhjoQckuao61mL60py6F1CM/MCMLHUOGzdI46cMi eWonGoP1vqR+xLPXvq2JjvP49M4viQvk/4htdltLp9U7XWnLk59YHYT1YdFqs9rHe+TZ cP1urie4hMeT+0QLhPDoA6NwkyyjrPE0XTWAq98RT1CXWu9GR0sLmaqpwrz8GOUem9eq O+34ynQiw+HviRvA9d7TgmPdEWbhD0DvvIctkCsMxQRah3wUxekXzj6llK4Kas/xaLra UYBA== X-Gm-Message-State: ALoCoQkV4eyk0ukqYAkw3KxwZyMYWbum8w2Qv1K7+ZmhW8P/+DoqtTlkwm8z5G/v0mIUYwJcoaXINhIZTLCg+LdxmBennFTWS06oer9gMMmZDcJ5xOcMivnTVhRnm48sGNwwR43L+W1QXTpUBzrwYtjHuhj4klCjcA== X-Received: by 10.224.4.8 with SMTP id 8mr55034719qap.77.1418058591962; Mon, 08 Dec 2014 09:09:51 -0800 (PST) MIME-Version: 1.0 X-Received: by 10.224.4.8 with SMTP id 8mr55034688qap.77.1418058591756; Mon, 08 Dec 2014 09:09:51 -0800 (PST) Received: by 10.96.115.163 with HTTP; Mon, 8 Dec 2014 09:09:51 -0800 (PST) In-Reply-To: <5485CB35.5010608@multiplay.co.uk> References: <20141208155230.GA574@iozz.us> <5485CB35.5010608@multiplay.co.uk> Date: Mon, 8 Dec 2014 09:09:51 -0800 Message-ID: Subject: Re: ZFS scrub on new disks gives cksum errors From: Sean Chittenden To: Steven Hartland Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 08 Dec 2014 17:09:54 -0000 I had this exact problem at home on my FreeNAS host when a box is plugged in to residential power that is "not clean." If you plug this host in to a UPS it should clear things up instantly (find a sine wave UPS, not a digital stepping supply). 
There's something about the fluctuating power or brownouts that ZFS is well suited to detecting (and became terrifying to me because wtf happened in the past pre-ZFS??).

-sc

On Mon, Dec 8, 2014 at 8:00 AM, Steven Hartland wrote:

> Bad memory, cabling?
>
>
> On 08/12/2014 15:53, Brian N wrote:
>
>> Hi.
>>
>> I have two new 3TB disks that I'm testing prior to installing in a production
>> server. I created a zpool mirror, dumped a bunch of data to the pool and ran a
>> scrub. I got back numerous CKSUM errors, the same number on each drive. I'm
>> guessing this is not a problem with my 3TB drives but a controller or some
>> other hardware problem. Is this a correct assumption?
>>
>> The PC where I'm running this test is using a GA-E7AUM-DS2H mainboard and
>> non-ecc memory.
>>
>> ######
>>
>> root@server:~ # zpool status
>>   pool: tank
>>  state: ONLINE
>> status: One or more devices has experienced an error resulting in data
>>         corruption.  Applications may be affected.
>> action: Restore the file in question if possible.  Otherwise restore the
>>         entire pool from backup.
>>    see: http://illumos.org/msg/ZFS-8000-8A
>>   scan: scrub repaired 0 in 3h57m with 23 errors on Mon Dec  8 03:05:28 2014
>> config:
>>
>>         NAME        STATE     READ WRITE CKSUM
>>         tank        ONLINE       0     0    23
>>           mirror-0  ONLINE       0     0    46
>>             ada1    ONLINE       0     0    46
>>             ada3    ONLINE       0     0    46
>>
>> errors: 2 data errors, use '-v' for a list
>>
>> _______________________________________________
>> freebsd-fs@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>

-- 
Sean Chittenden

From owner-freebsd-fs@FreeBSD.ORG Mon Dec 8 18:08:50 2014 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 70369685; Mon, 8 Dec 2014 18:08:50 +0000 (UTC) Received: from elvis.mu.org (elvis.mu.org [IPv6:2001:470:1f05:b76::196]) by mx1.freebsd.org (Postfix) with ESMTP id 5E732AD1; Mon, 8 Dec 2014 18:08:50 +0000 (UTC) Received: from [10.0.1.20] (c-76-21-10-192.hsd1.ca.comcast.net [76.21.10.192]) by elvis.mu.org (Postfix) with ESMTPSA id 346FB341F872; Mon, 8 Dec 2014 10:08:50 -0800 (PST) Subject: Re: backups of bhyve images Mime-Version: 1.0 (Apple Message framework v1283) Content-Type: text/plain; charset=us-ascii From: Alfred Perlstein In-Reply-To: <20141208163358.GA52969@potato.growveg.org> Date: Mon, 8 Dec 2014 10:08:52 -0800 Content-Transfer-Encoding: quoted-printable Message-Id: <33053EB5-91C5-4036-8CC2-34103E33A0FA@mu.org> References: <20141208163358.GA52969@potato.growveg.org> To: John X-Mailer: Apple Mail (2.1283) Cc: fs@freebsd.org, freebsd-virtualization@freebsd.org X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 08 Dec 2014 18:08:50 -0000 On Dec 8, 2014, at 8:33 AM, John wrote: > Hello list, >=20 > I have a few questions about creating backups to be stored offsite. >=20 > If a guest is running, can I compress the image without it becoming=20 > inconsistent? If not, can it be copied without it becoming = inconsistent? =20 > By inconsistent, I mean will I see weird effects and broken files if = the=20 > backup is restored? Previously I've shut the VM down to avoid this,=20 > before archiving. >=20 > I have each image on its own (external to the image) ZFS filesystem. =20= > Internally the image is using ufs if freebsd, ext3fs if linux. Would=20= > using some ZFS method of duplication be better? In this case, would = the=20 > image become inconsistent? >=20 > Basically, what I want to do is to run accurate backups without = shutting=20 > down and restarting the VM. Is this possible? If it isn't, I think the=20= > only alternative is to make a script that shuts the vm down, copies = it,=20 > restarts the vm then runs its compression and backup-over-ssh routine. [[ adding fs@freebsd.org in case I'm wrong ]] If you are using UFS internally to the VMs then you'll need to send a = snapshot that is consistent. If you are just copying the files out from under a running vm you are = going to get spaghettios for a filesystem if you try to recover as you = need a true point in time snapshot. I think a few better options would be: 1) Inside the VM create a UFS snapshot then dump that externally using = tools. 2) Create the UFS snapshot, then make sure that the file/vzol is = snapshotted using zfs. 
3) Just snapshot the underlying zvol you've made the UFS image on and send that (you'll get a dirty FS on restore, but it *should* be recoverable with a simple fsck)
4) Use zfs internally to the vm and send/receive the internal zfs.

option 3 is the least safe imo as you can wind up with filesystem "angry".
in case 1 and 2 you'll have UFS snapshots that should be "OK" to restore from.
in case 4 you are also doing snapshot, but you switch to ZFS.

-Alfred

From owner-freebsd-fs@FreeBSD.ORG Mon Dec 8 18:19:14 2014 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 068D8815 for ; Mon, 8 Dec 2014 18:19:14 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 9E305BE0 for ; Mon, 8 Dec 2014 18:19:13 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id sB8IJ7V7030791 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Mon, 8 Dec 2014 20:19:07 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua sB8IJ7V7030791 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id sB8IJ71N030790 for fs@freebsd.org; Mon, 8 Dec 2014 20:19:07 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Mon, 8 Dec 2014 20:19:06 +0200 From: Konstantin Belousov To: fs@freebsd.org Subject: VFS_SYNC() call in dounmount() Message-ID: <20141208181906.GW97072@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No,
score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 08 Dec 2014 18:19:14 -0000

Right now, dounmount() has the following code:
	if (((mp->mnt_flag & MNT_RDONLY) ||
	    (error = VFS_SYNC(mp, MNT_WAIT)) == 0) || (flags & MNT_FORCE) != 0)
		error = VFS_UNMOUNT(mp, flags);
In other words, if the filesystem is mounted rw, we try VFS_SYNC(). If the unmount request is forced, VFS_UNMOUNT() is called unconditionally; otherwise, VFS_UNMOUNT() is only performed when VFS_SYNC() succeeded.

Apparently, the sync call is problematic, both for UFS and NFS. It was demonstrated by Peter Holm that sufficient fsx load prevents sync from finishing for an unbounded amount of time. ffs_unmount() takes the necessary measures to stop writers and to sync the filesystem to a stable state before destroying the mount structures, so removal of VFS_SYNC() above fixed the test.

Moreover, the NFS client just ignores the VFS_SYNC() call for forced unmount, to work around hung nfs requests.

Andrey Gapon assured me that ZFS handles unmount correctly and does not need help in the form of sync before unmount. The only major writeable filesystem which apparently did not sync correctly on unmount is msdosfs.

Note that relying on VFS_SYNC() before VFS_UNMOUNT() to flush all caches to permanent storage is racy, since VFS does not (and cannot) stop other threads from writing to the fs in the meantime. UFS and TMPFS suspend the filesystem in VFS_UNMOUNT(), handling the race in VFS_UNMOUNT().

I propose to only call VFS_SYNC() before VFS_UNMOUNT() for non-forced unmount. As I explained, the call for forced case is mostly pointless.
For non-forced unmount, this is needed for KBI compatibility for NFS (not important), and to increase the chances of unmount succeeding (again not important). I still prefer to keep the call around for non-forced case for some time.

diff --git a/sys/fs/msdosfs/msdosfs_vfsops.c b/sys/fs/msdosfs/msdosfs_vfsops.c
index 213dd00..d14cdef 100644
--- a/sys/fs/msdosfs/msdosfs_vfsops.c
+++ b/sys/fs/msdosfs/msdosfs_vfsops.c
@@ -797,11 +797,15 @@ msdosfs_unmount(struct mount *mp, int mntflags)
 	int error, flags;
 
 	flags = 0;
-	if (mntflags & MNT_FORCE)
+	error = msdosfs_sync(mp, MNT_WAIT);
+	if ((mntflags & MNT_FORCE) != 0) {
 		flags |= FORCECLOSE;
+	} else if (error != 0) {
+		return (error);
+	}
 	error = vflush(mp, 0, flags, curthread);
-	if (error && error != ENXIO)
-		return error;
+	if (error != 0 && error != ENXIO)
+		return (error);
 	pmp = VFSTOMSDOSFS(mp);
 	if ((pmp->pm_flags & MSDOSFSMNT_RONLY) == 0) {
 		error = markvoldirty(pmp, 0);
diff --git a/sys/kern/vfs_mount.c b/sys/kern/vfs_mount.c
index c407699..b2b4969 100644
--- a/sys/kern/vfs_mount.c
+++ b/sys/kern/vfs_mount.c
@@ -1305,8 +1305,8 @@ dounmount(mp, flags, td)
 		}
 		vput(fsrootvp);
 	}
-	if (((mp->mnt_flag & MNT_RDONLY) ||
-	    (error = VFS_SYNC(mp, MNT_WAIT)) == 0) || (flags & MNT_FORCE) != 0)
+	if ((mp->mnt_flag & MNT_RDONLY) != 0 || (flags & MNT_FORCE) != 0 ||
+	    (error = VFS_SYNC(mp, MNT_WAIT)) == 0)
 		error = VFS_UNMOUNT(mp, flags);
 	vn_finished_write(mp);
 	/*

From owner-freebsd-fs@FreeBSD.ORG Mon Dec 8 18:38:41 2014 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 70978F97 for ; Mon, 8 Dec 2014 18:38:41 +0000 (UTC) Received: from chez.mckusick.com (chez.mckusick.com [IPv6:2001:5a8:4:7e72:4a5b:39ff:fe12:452]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with
ESMTPS id 51420DD9 for ; Mon, 8 Dec 2014 18:38:41 +0000 (UTC) Received: from chez.mckusick.com (localhost [127.0.0.1]) by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id sB8IcPSD012844; Mon, 8 Dec 2014 10:38:25 -0800 (PST) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201412081838.sB8IcPSD012844@chez.mckusick.com> To: Konstantin Belousov Subject: Re: VFS_SYNC() call in dounmount() In-reply-to: <20141208181906.GW97072@kib.kiev.ua> Date: Mon, 08 Dec 2014 10:38:25 -0800 From: Kirk McKusick Cc: fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 08 Dec 2014 18:38:41 -0000 > Date: Mon, 8 Dec 2014 20:19:06 +0200 > From: Konstantin Belousov > To: fs@freebsd.org > Subject: VFS_SYNC() call in dounmount() > > Right now, dounmount() has the following code: > if (((mp->mnt_flag & MNT_RDONLY) || > (error = VFS_SYNC(mp, MNT_WAIT)) == 0) || (flags & MNT_FORCE) != 0) > error = VFS_UNMOUNT(mp, flags); > In other words, if the filesystem is mounted rw, we try VFS_SYNC(). > If the unmount request if forced, VFS_UNMOUNT() is called unconditionally, > otherwise, VFS_UNMOUNT() is only performed when VFS_SYNC() succeeded. > > Apparently, the sync call is problematic, both for UFS and NFS. It > was demonstrated by Peter Holm that sufficient fsx load prevents sync > from finishing for unbounded amount of time. The ffs_unmount() makes > neccessary measures to stop writers and to sync filesystem to the stable > state before destroying the mount structures, so removal of VFS_SYNC() > above fixed the test. > > More, NFS client just ignores the VFS_SYNC() call for forced unmount, > to work around the hung nfs requests. > > Andrey Gapon assured me that ZFS handles unmount correctly and does > not need help in the form of sync before unmount. 
The only major > writeable filesystem which apparently did not correctly synced on > unmount is msdosfs. > > Note that relying on VFS_SYNC() before VFS_UNMOUNT() to flush all caches > to permanent storage is racy, since VFS does not (and cannot) stop > other threads from writing to fs meantime. UFS and TMPFS suspend > filesystem in VFS_UNMOUNT(), handling the race in VFS_UNMOUNT(). > > I propose to only call VFS_SYNC() before VFS_UNMOUNT() for non-forced > unmount. As I explained, the call for forced case is mostly pointless. > For non-forced unmount, this is needed for KBI compatibility for NFS > (not important), and to increase the chances of unmount succeedeing > (again not important). I still prefer to keep the call around for > non-forced case for some time. > > diff --git a/sys/fs/msdosfs/msdosfs_vfsops.c b/sys/fs/msdosfs/msdosfs_vfsops.c > index 213dd00..d14cdef 100644 > --- a/sys/fs/msdosfs/msdosfs_vfsops.c > +++ b/sys/fs/msdosfs/msdosfs_vfsops.c > @@ -797,11 +797,15 @@ msdosfs_unmount(struct mount *mp, int mntflags) > int error, flags; > > flags = 0; > - if (mntflags & MNT_FORCE) > + error = msdosfs_sync(mp, MNT_WAIT); > + if ((mntflags & MNT_FORCE) != 0) { > flags |= FORCECLOSE; > + } else if (error != 0) { > + return (error); > + } > error = vflush(mp, 0, flags, curthread); > - if (error && error != ENXIO) > - return error; > + if (error != 0 && error != ENXIO) > + return (error); > pmp = VFSTOMSDOSFS(mp); > if ((pmp->pm_flags & MSDOSFSMNT_RONLY) == 0) { > error = markvoldirty(pmp, 0); > diff --git a/sys/kern/vfs_mount.c b/sys/kern/vfs_mount.c > index c407699..b2b4969 100644 > --- a/sys/kern/vfs_mount.c > +++ b/sys/kern/vfs_mount.c > @@ -1305,8 +1305,8 @@ dounmount(mp, flags, td) > } > vput(fsrootvp); > } > - if (((mp->mnt_flag & MNT_RDONLY) || > - (error = VFS_SYNC(mp, MNT_WAIT)) == 0) || (flags & MNT_FORCE) != 0) > + if ((mp->mnt_flag & MNT_RDONLY) != 0 || (flags & MNT_FORCE) != 0 || > + (error = VFS_SYNC(mp, MNT_WAIT)) == 0) > error = VFS_UNMOUNT(mp, 
flags); > vn_finished_write(mp); > /* I agree with your analysis and believe that your proposed change is functionally correct for at least UFS. Kirk McKusick From owner-freebsd-fs@FreeBSD.ORG Mon Dec 8 20:29:13 2014 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 05DE4B82 for ; Mon, 8 Dec 2014 20:29:13 +0000 (UTC) Received: from dmz-mailsec-scanner-1.mit.edu (dmz-mailsec-scanner-1.mit.edu [18.9.25.12]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 98D0CBB7 for ; Mon, 8 Dec 2014 20:29:12 +0000 (UTC) X-AuditID: 1209190c-f79e46d000000eb2-9d-54860a11fb4f Received: from mailhub-auth-4.mit.edu ( [18.7.62.39]) (using TLS with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by dmz-mailsec-scanner-1.mit.edu (Symantec Messaging Gateway) with SMTP id F5.E7.03762.11A06845; Mon, 8 Dec 2014 15:29:05 -0500 (EST) Received: from outgoing.mit.edu (outgoing-auth-1.mit.edu [18.9.28.11]) by mailhub-auth-4.mit.edu (8.13.8/8.9.2) with ESMTP id sB8KT4ni008509; Mon, 8 Dec 2014 15:29:04 -0500 Received: from multics.mit.edu (system-low-sipb.mit.edu [18.187.2.37]) (authenticated bits=56) (User authenticated as kaduk@ATHENA.MIT.EDU) by outgoing.mit.edu (8.13.8/8.12.4) with ESMTP id sB8KT2Fc017082 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT); Mon, 8 Dec 2014 15:29:04 -0500 Received: (from kaduk@localhost) by multics.mit.edu (8.12.9.20060308) id sB8KT2fi004161; Mon, 8 Dec 2014 15:29:02 -0500 (EST) Date: Mon, 8 Dec 2014 15:29:02 -0500 (EST) From: Benjamin Kaduk To: Konstantin Belousov Subject: Re: VFS_SYNC() call in dounmount() In-Reply-To: <20141208181906.GW97072@kib.kiev.ua> Message-ID: References: <20141208181906.GW97072@kib.kiev.ua> User-Agent: 
Alpine 1.10 (GSO 962 2008-03-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFnrNIsWRmVeSWpSXmKPExsUixG6nrivI1RZiMHUjt8XhJy4WDdMeszkw ecz4NJ/FY+esu+wBTFFcNimpOZllqUX6dglcGXun72QpuMtf8aXtFXMD4zWeLkZODgkBE4md H+4zQthiEhfurWfrYuTiEBJYzCSx8sN5KGcDo8SNaw+YIJyDTBITJ8xnBmkREqiXaN/8lgXE ZhHQkpjzsRssziagIjHzzUY2EFtEQFfi44I9YHFmASGJg8+/g60TFtCWODf7NVgNp4ChxNmf U9m7GDk4eAUcJe7tsYIYbyDRee83O4gtKqAjsXr/FLBVvAKCEidnPmGBGKklsXz6NpYJjIKz kKRmIUktYGRaxSibklulm5uYmVOcmqxbnJyYl5dapGuol5tZopeaUrqJERSmnJI8OxjfHFQ6 xCjAwajEw7vgQUuIEGtiWXFl7iFGSQ4mJVFedY62ECG+pPyUyozE4oz4otKc1OJDjBIczEoi vMt3toYI8aYkVlalFuXDpKQ5WJTEeTf94AsREkhPLEnNTk0tSC2CycpwcChJ8F4BGSpYlJqe WpGWmVOCkGbi4AQZzgM03JATqIa3uCAxtzgzHSJ/ilFRSpxXFyQhAJLIKM2D64WlkVeM4kCv CPN+B1nBA0xBcN2vgAYzAQ1+kQhydXFJIkJKqoFx3rm+dd12sVd2sq581cV1uem9/L/d+zen M1yr5tLZtypBcPIUmzts942Ujzd8n224R/Jyv/BDt5kid368Le3acHO5lOzLv0zbrobu3nJb mL3gm+fR0F8B5c8SFv5bppBjd8Triv5UmZfHj93nnxsfbefDXprnKX235bT5xqVuVU2Xl2hP LPqwS4mlOCPRUIu5qDgRAHcoPTj+AgAA Cc: fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 08 Dec 2014 20:29:13 -0000 On Mon, 8 Dec 2014, Konstantin Belousov wrote: > Right now, dounmount() has the following code: > if (((mp->mnt_flag & MNT_RDONLY) || > (error = VFS_SYNC(mp, MNT_WAIT)) == 0) || (flags & MNT_FORCE) != 0) > error = VFS_UNMOUNT(mp, flags); > In other words, if the filesystem is mounted rw, we try VFS_SYNC(). > If the unmount request if forced, VFS_UNMOUNT() is called unconditionally, > otherwise, VFS_UNMOUNT() is only performed when VFS_SYNC() succeeded. > > Apparently, the sync call is problematic, both for UFS and NFS. It > was demonstrated by Peter Holm that sufficient fsx load prevents sync > from finishing for unbounded amount of time. 
The ffs_unmount() makes > neccessary measures to stop writers and to sync filesystem to the stable > state before destroying the mount structures, so removal of VFS_SYNC() > above fixed the test. > > More, NFS client just ignores the VFS_SYNC() call for forced unmount, > to work around the hung nfs requests. > > Andrey Gapon assured me that ZFS handles unmount correctly and does > not need help in the form of sync before unmount. The only major > writeable filesystem which apparently did not correctly synced on > unmount is msdosfs. > > Note that relying on VFS_SYNC() before VFS_UNMOUNT() to flush all caches > to permanent storage is racy, since VFS does not (and cannot) stop > other threads from writing to fs meantime. UFS and TMPFS suspend > filesystem in VFS_UNMOUNT(), handling the race in VFS_UNMOUNT(). > > I propose to only call VFS_SYNC() before VFS_UNMOUNT() for non-forced > unmount. As I explained, the call for forced case is mostly pointless. > For non-forced unmount, this is needed for KBI compatibility for NFS > (not important), and to increase the chances of unmount succeedeing > (again not important). I still prefer to keep the call around for > non-forced case for some time. It looks like VFS_SYNC is a no-op for net/openafs, so we should not be affected by this change. 
-Ben From owner-freebsd-fs@FreeBSD.ORG Mon Dec 8 23:31:01 2014 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 5B3E0E00; Mon, 8 Dec 2014 23:31:01 +0000 (UTC) Received: from mail-lb0-x235.google.com (mail-lb0-x235.google.com [IPv6:2a00:1450:4010:c04::235]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id CE1B1115; Mon, 8 Dec 2014 23:31:00 +0000 (UTC) Received: by mail-lb0-f181.google.com with SMTP id l4so4693103lbv.12 for ; Mon, 08 Dec 2014 15:30:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=61nDyCxztZJ2scYFeONQjxtowDbtH/TzmSKAKNuNOJc=; b=laJeXNSUmFukEVCcRUoz7GZqSfKX7lLMMEqFGGn1TqFbC7oLwZvq5DvFVZvNRG4Rn1 H6qVCipPf62qBCuum8FJggyxamU0a5nNZGnNc/atTxonweCH7RhZdixlEdDbsUAW9H9B x06E4F0DbS4ZZeIlwJNMFmHRsprRVox5U27GmQMk/8qXRpNfG3R5owntjU1+uzbPUa0S 5zbw85LRImdh+vrwnewpONoEBZRCy0bVza3dAvltMQ+HJ86JdAAfk0lQPgt0n3cWaHB7 8YaxdsaSJ7SkgQFDEFal2KD6OpsJWgBlzAbh7zwgtQsYMh/t6mimQvLBPs+hXZyVVGD7 Eagg== MIME-Version: 1.0 X-Received: by 10.112.201.226 with SMTP id kd2mr18799233lbc.98.1418081459003; Mon, 08 Dec 2014 15:30:59 -0800 (PST) Sender: crodr001@gmail.com Received: by 10.112.130.168 with HTTP; Mon, 8 Dec 2014 15:30:58 -0800 (PST) In-Reply-To: <33053EB5-91C5-4036-8CC2-34103E33A0FA@mu.org> References: <20141208163358.GA52969@potato.growveg.org> <33053EB5-91C5-4036-8CC2-34103E33A0FA@mu.org> Date: Mon, 8 Dec 2014 15:30:58 -0800 X-Google-Sender-Auth: e59PMgYm6LmsimwZRm9W9n1Ba0k Message-ID: Subject: Re: backups of bhyve images From: Craig Rodrigues To: John Content-Type: text/plain; charset=ISO-8859-1 
X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 Cc: "freebsd-virtualization@freebsd.org" , fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 08 Dec 2014 23:31:01 -0000

On Dec 8, 2014, at 8:33 AM, John wrote:
>
> I have each image on its own (external to the image) ZFS filesystem.
> Internally the image is using ufs if freebsd, ext3fs if linux. Would
> using some ZFS method of duplication be better? In this case, would the
> image become inconsistent?

I recommend that you do the following:

(1) Learn about ZFS zvol: http://zfsonlinux.org/example-zvol.html
(2) Instead of creating a big disk image to hold your bhyve VM, use a ZFS zvol
(3) When you want to backup the VM, do a "zfs snapshot" to take a snapshot of the ZFS zvol.
(4) You can backup the zvol to another host by using "zfs send", and on the receiving host, you do "zfs receive"

The content of your VM can be any file system that you want (UFS, ext4, zfs), but you can backup the ZFS zvol using zfs commands.

I've been doing it, and it works really nicely.
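
Steps (3) and (4) above might be sketched roughly as follows. This is only an illustration under assumptions: the dataset names tank/vm/guest0 and backup/vm/guest0, the host name backuphost and the snapshot tag are made-up placeholders, and the helper backup_cmds is hypothetical (it is not part of zfs). It only prints the commands so they can be reviewed first. Keep in mind that a snapshot of a zvol taken under a running guest is a point-in-time image of the disk, so, as Alfred noted earlier in the thread, a guest filesystem such as UFS may come back dirty and need an fsck after restore.

```shell
#!/bin/sh
# Print (do not run) the zfs commands for one snapshot-based backup of a zvol.
# Every argument is a placeholder chosen for this sketch, not a name from the thread.
backup_cmds() {
    zvol=$1     # source zvol, e.g. tank/vm/guest0
    tag=$2      # snapshot name, e.g. backup-20141208
    host=$3     # receiving host, assumed reachable over ssh
    dest=$4     # dataset to create on the receiver (its parent is assumed to exist)

    # Step (3): take a point-in-time snapshot of the zvol.
    printf 'zfs snapshot %s@%s\n' "$zvol" "$tag"
    # Step (4): stream the snapshot to the receiving host.
    printf 'zfs send %s@%s | ssh %s zfs receive %s\n' "$zvol" "$tag" "$host" "$dest"
}

backup_cmds tank/vm/guest0 backup-20141208 backuphost backup/vm/guest0
```

Once the printed commands look right for your pool layout, the same output can be piped to sh on the VM host to actually run them.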
-- Craig From owner-freebsd-fs@FreeBSD.ORG Tue Dec 9 02:10:51 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 22608999 for ; Tue, 9 Dec 2014 02:10:51 +0000 (UTC) Received: from zulu.iotz.org (zulu.iotz.org [192.73.233.125]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id EF95C640 for ; Tue, 9 Dec 2014 02:10:50 +0000 (UTC) Received: from iozz.us (zulu.iotz.org [192.73.233.125]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by zulu.iotz.org (Postfix) with ESMTPSA id CAD451E0128; Mon, 8 Dec 2014 21:10:48 -0500 (EST) Date: Mon, 8 Dec 2014 21:10:42 -0500 From: Brian N To: Sean Chittenden Subject: Re: ZFS scrub on new disks gives cksum errors Message-ID: <20141209021042.GA649@iozz.us> References: <20141208155230.GA574@iozz.us> <5485CB35.5010608@multiplay.co.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Dec 2014 02:10:51 -0000 On Mon, Dec 08, 2014 at 09:09:51AM -0800, Sean Chittenden wrote: > I had this exact problem at home on my FreeNAS host when a box is plugged > in to residential power that is "not clean." If you plug this host in to a > UPS it should clear things up instantly (find a sine wave UPS, not a > digital stepping supply). 
There's something about the fluctuating power or > brownouts that ZFS is well suited to detecting (and became terrifying to me > because wtf happened in the past pre-ZFS??). > > -sc Interesting. I have a UPS (APC Back-UPS ES 550 (stepped)) on the same circuit (residential power) that routinely gives me power outage messages where an outage lasts 1-2 seconds. Other electrical devices (lights, etc) are not affected and I just thought that the UPS was very sensitive. Thanks for the reply. From owner-freebsd-fs@FreeBSD.ORG Tue Dec 9 02:27:28 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A8B45D9B for ; Tue, 9 Dec 2014 02:27:28 +0000 (UTC) Received: from fs.denninger.net (wsip-70-169-168-7.pn.at.cox.net [70.169.168.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "NewFS.denninger.net", Issuer "NewFS.denninger.net" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 62E7187D for ; Tue, 9 Dec 2014 02:27:27 +0000 (UTC) Received: from [127.0.0.1] (localhost [127.0.0.1]) by fs.denninger.net (8.14.9/8.14.8) with ESMTP id sB92Go9m003870 for ; Mon, 8 Dec 2014 20:16:51 -0600 (CST) (envelope-from karl@denninger.net) Received: from [127.0.0.1] (TLS/SSL) [70.165.30.124] by Spamblock-sys (LOCAL/AUTH); Mon Dec 8 20:16:51 2014 Content-Type: text/plain; charset="iso-8859-1" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Mailer: BlackBerry Email (10.3.1.1154) Message-ID: <20141209021651.4304976.79605.2556@denninger.net> Date: Mon, 08 Dec 2014 20:16:51 -0600 Subject: Re: ZFS scrub on new disks gives cksum errors From: Karl Denninger In-Reply-To: <20141209021042.GA649@iozz.us> References: <20141208155230.GA574@iozz.us> <5485CB35.5010608@multiplay.co.uk> <20141209021042.GA649@iozz.us> To: freebsd-fs 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Dec 2014 02:27:28 -0000

Stepped-wave ups power is perfectly fine to feed a switching power supply -- in fact most run COOLER on it than on sine wave power!

-- Karl (On Passport PDA)

Original Message
From: Brian N
Sent: Monday, December 8, 2014 20:11
To: Sean Chittenden
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS scrub on new disks gives cksum errors

On Mon, Dec 08, 2014 at 09:09:51AM -0800, Sean Chittenden wrote:
> I had this exact problem at home on my FreeNAS host when a box is plugged
> in to residential power that is "not clean." If you plug this host in to a
> UPS it should clear things up instantly (find a sine wave UPS, not a
> digital stepping supply). There's something about the fluctuating power or
> brownouts that ZFS is well suited to detecting (and became terrifying to me
> because wtf happened in the past pre-ZFS??).
>
> -sc

Interesting. I have a UPS (APC Back-UPS ES 550 (stepped)) on the same circuit (residential power) that routinely gives me power outage messages where an outage lasts 1-2 seconds. Other electrical devices (lights, etc) are not affected and I just thought that the UPS was very sensitive.

Thanks for the reply.
_______________________________________________ freebsd-fs@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-fs To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" %SPAMBLOCK-SYS: Matched [@freebsd.org+], message ok From owner-freebsd-fs@FreeBSD.ORG Tue Dec 9 03:25:21 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 1B2AA6F6 for ; Tue, 9 Dec 2014 03:25:21 +0000 (UTC) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id BD415EC5 for ; Tue, 9 Dec 2014 03:25:20 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AtcEABdrhlSDaFve/2dsb2JhbABSB4NYWASDAcMJhg0CgUMBAQEBAX2EAgEBAQMBI1YZAhgCAg0ZAlkGE4gwCKJhnGaXEAEBAQEBAQEBAgEBAQEBAQEBARmBJo4zCw8BFAEzB4JvgUcFiUKKIAeBdJMvhAwhMIEDAR8DH34BAQE X-IronPort-AV: E=Sophos;i="5.07,542,1413259200"; d="scan'208";a="174827258" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 08 Dec 2014 22:24:06 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 7D63AAEA32; Mon, 8 Dec 2014 22:24:06 -0500 (EST) Date: Mon, 8 Dec 2014 22:24:06 -0500 (EST) From: Rick Macklem To: Mark Schouten Message-ID: <891022143.8046492.1418095446417.JavaMail.root@uoguelph.ca> In-Reply-To: <710130010-1872@kerio.tuxis.nl> Subject: Re: Mountd, why not use the '-S' flag by default MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 7.2.6_GA_2926 (ZimbraWebClient - FF3.0 (Win)/7.2.6_GA_2926) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 
Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Dec 2014 03:25:21 -0000

Mark Schouten wrote:
> Hi,
>
>
> I'm using a FreeBSD nfs-server as storage for my Linux KVM-based
> VPS-platform. The images reside on the NFS-server.
>
>
> I've been noticing errors in my VPS disks when running 'zfs set
> sharenfs=XYZ', probably because of reloads of mountd.
>
>
> While trying to debug that, I ran across this message in mountd(8):
>
>      -S      Tell mountd to suspend/resume execution of the nfsd
>              threads whenever the exports list is being reloaded.  This
>              avoids intermittent access errors for clients that do NFS
>              RPCs while the exports are being reloaded, but introduces a
>              delay in RPC response while the reload is in progress.  If
>              mountd crashes while an exports load is in progress, mountd
>              must be restarted to get the nfsd threads running again, if
>              this option is used.
>
>
> I can't think of a reason why you wouldn't want to use -S by
> default.. An '/etc/rc.d/mountd reload' without it causes even my
> running Bonnie on a normal NFS-share (not via a diskimage) to stop
> with 'input/output error'. Can someone enlighten me with the
> drawbacks of using -S ?
>
Well, there are a couple of things:

With "-S" all nfsd threads get suspended/resumed whenever exports changes. This can result in a "pause" in NFS server response and that might be considered a POLA violation.

It only works for the new NFS server and not the old one and the old one is still used by some. If it was the default, then the old and new NFS servers would have had different behaviour. (Again, this could be considered a POLA violation.)

When "-S" was introduced by me, it was done as a "stop gap", since I had thought that mountd would eventually be replaced by nfse (and nfse did allow exports to be updated "atomically" so the problem didn't occur). It now appears that no variant of nfse will end up in FreeBSD.

The last one is noted in the description. If, for some reason, mountd crashes during a reload, then all the nfsd threads could be stuck suspended. (I don't know if this occurs in practice.)

Basically, I am a coward w.r.t. POLA and almost never change a default. (The one case I did change was making rsize, wsize default to MAX_BSIZE instead of 32K. By some strange twist of fate, this caused a lot of grief, since there was a bug related to TSO segments just under 64K for network interfaces that are limited to 32 transmit segments. I am still saying "disable TSO" to people running older FreeBSD systems because of this.;-)

rick

>
> Kind regards,
>
> --
> Kerio Operator in de Cloud?
https://www.kerioindecloud.nl/
> Mark Schouten | Tuxis Internet Engineering
> KvK: 61527076 | http://www.tuxis.nl/
> T: 0318 200208 | info@tuxis.nl

From owner-freebsd-fs@FreeBSD.ORG Tue Dec 9 03:31:59 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 6F9347C3 for ; Tue, 9 Dec 2014 03:31:59 +0000 (UTC) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id EFF43F8B for ; Tue, 9 Dec 2014 03:31:58 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AtoEAEJshlSDaFve/2dsb2JhbABZg1hYBIMBwwMKhTtOAoFDAQEBAQF9hAIBAQEDAQEBASArIAsFFhgCAg0ZAikBCSYGCAIFBAEcBIgPCA2/OpcQAQEBAQEBBAEBAQEBAQEBAQEYgSaORQEBGwEzB4IxPhGBNgWJQogkgyGDKDCEYodtg16Bfh6BcCEwB4EFOX4BAQE X-IronPort-AV: E=Sophos;i="5.07,542,1413259200"; d="scan'208";a="176530886" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-annu.net.uoguelph.ca with ESMTP; 08 Dec 2014 22:31:50 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id B6293E7956; Mon, 8 Dec 2014 22:31:50 -0500 (EST) Date: Mon, 8 Dec 2014 22:31:50 -0500 (EST) From: Rick Macklem To: Loïc Blot Message-ID: <766911003.8048587.1418095910736.JavaMail.root@uoguelph.ca> In-Reply-To: Subject: Re: High Kernel Load with nfsv4 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 7.2.6_GA_2926 (ZimbraWebClient - FF3.0 (Win)/7.2.6_GA_2926) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help:
List-Subscribe: , X-List-Received-Date: Tue, 09 Dec 2014 03:31:59 -0000

Loic Blot wrote:
> Hi rick,
>
> I waited 3 hours (no lag at jail launch) and now I do: sysrc
> memcached_flags="-v -m 512"
> Command was very very slow...
>
> Here is a dd over NFS:
>
> 601062912 bytes transferred in 21.060679 secs (28539579 bytes/sec)
>
Can you try the same read using an NFSv3 mount?
(If it runs much faster, you have probably been bitten by the ZFS "sequential vs random" read heuristic which I've been told thinks NFS is doing "random" reads without file handle affinity. File handle affinity is very hard to do for NFSv4, so it isn't done.)

rick

> This is quite slow...
>
> You can find some nfsstat below (command isn't finished yet)
>
> nfsstat -c -w 1
>
> GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
>      0      0      0      0      0      0      0      0
>      4      0      0      0      0      0     16      0
>      2      0      0      0      0      0     17      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      4      0      0      0      0      4      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      4      0      0      0      0      0      3      0
>      0      0      0      0      0      0      3      0
>     37     10      0      8      0      0     14      1
>     18     16      0      4      1      2      4      0
>     78     91      0     82      6     12     30      0
>     19     18      0      2      2      4      2      0
>      0      0      0      0      2      0      0      0
>      0      0      0      0      0      0      0      0
> GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      1      0      0      0      0      1      0
>      4      6      0      0      6      0      3      0
>      2      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      1      0      0      0      0      0      0      0
>      0      0      0      0      1      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      6    108      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
> GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>     98     54      0     86     11      0     25      0
>     36     24      0     39     25      0     10      1
>     67      8      0     63     63      0     41      0
>     34      0      0     35     34      0      0      0
>     75      0      0     75     77      0      0      0
>     34      0      0     35     35      0      0      0
>     75      0      0     74     76      0      0      0
>     33      0      0     34     33      0      0      0
>      0      0      0      0      5      0      0      0
>      0      0      0      0      0      0      6      0
>     11      0      0      0      0      0     11      0
>      0      0      0      0      0      0      0      0
>      0     17      0      0      0      0      1      0
> GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
>      4      5      0      0      0      0     12      0
>      2      0      0      0      0      0     26      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      4      0      0      0      0      4      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      4      0      0      0      0      0      2      0
>      2      0      0      0      0      0     24      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
> GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      4      0      0      0      0      0      7      0
>      2      1      0      0      0      0      1      0
>      0      0      0      0      2      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      6      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      4      6      0      0      0      0      3      0
>      0      0      0      0      0      0      0      0
>      2      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
> GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      4     71      0      0      0      0      0      0
>      0      1      0      0      0      0      0      0
>      2     36      0      0      0      0      1      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      1      0      0      0      0      0      1      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>     79      6      0     79     79      0      2      0
>     25      0      0     25     26      0      6      0
>     43     18      0     39     46      0     23      0
>     36      0      0     36     36      0     31      0
>     68      1      0     66     68      0      0      0
> GtAttr Lookup Rdlink   Read  Write Rename Access  Rddir
>     36      0      0     36     36      0      0      0
>     48      0      0     48     49      0      0      0
>     20      0      0     20     20      0      0      0
>      0      0      0      0      0      0      0      0
>      3     14      0      1      0      0     11      0
>      0      0      0      0      0      0      0      0
>      0      0      0      0      0      0      0      0
>      0      4      0      0      0      0      4      0
>      0      0      0      0      0      0      0      0
>      4     22      0      0      0      0     16      0
>      2      0      0      0      0      0     23      0
>
> Regards,
>
> Loïc Blot,
> UNIX Systems, Network and Security Engineer
> http://www.unix-experience.fr
>
> On 8 December 2014 09:36, "Loïc Blot" wrote:
> > Hi Rick,
> > I stopped the jails this week-end and started it this morning, i'll
> > give you some stats this week.
> >
> > Here is my nfsstat -m output (with your rsize/wsize tweaks)
> >
> > nfsv4,tcp,resvport,hard,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negna
> > etimeo=60,rsize=32768,wsize=32768,readdirsize=32768,readahead=1,wcommitsize=773136,timeout=120,retra
> > s=2147483647
> >
> > On server side my disks are on a raid controller which show a 512b
> > volume and write performances
> > are very honest (dd if=/dev/zero of=/jails/test.dd bs=4096
> > count=100000000 => 450MBps)
> >
> > Regards,
> >
> > Loïc Blot,
> > UNIX Systems, Network and Security Engineer
> > http://www.unix-experience.fr
> >
> > On 5 December 2014 at 15:14, "Rick Macklem" wrote:
> >
> >> Loic Blot wrote:
> >>
> >>> Hi,
> >>> i'm trying to create a virtualisation environment based on jails.
> >>> Those jails are stored under a big ZFS pool on a FreeBSD 9.3
> >>> which
> >>> export a NFSv4 volume. This NFSv4 volume was mounted on a big
> >>> hypervisor (2 Xeon E5v3 + 128GB memory and 8 ports (but only 1
> >>> was
> >>> used at this time).
> >>>
> >>> The problem is simple, my hypervisors runs 6 jails (used 1% cpu
> >>> and
> >>> 10GB RAM approximatively and less than 1MB bandwidth) and works
> >>> fine at start but the system slows down and after 2-3 days become
> >>> unusable. When i look at top command i see 80-100% on system and
> >>> commands are very very slow. Many process are tagged with
> >>> nfs_cl*.
> >>
> >> To be honest, I would expect the slowness to be because of slow
> >> response
> >> from the NFSv4 server, but if you do:
> >> # ps axHl
> >> on a client when it is slow and post that, it would give us some
> >> more
> >> information on where the client side processes are sitting.
> >> If you also do something like:
> >> # nfsstat -c -w 1
> >> and let it run for a while, that should show you how many RPCs are
> >> being done and which ones.
> >>
> >> # nfsstat -m
> >> will show you what your mount is actually using.
> >> The only mount option I can suggest trying is
> >> "rsize=32768,wsize=32768",
> >> since some network environments have difficulties with 64K.
> >>
> >> There are a few things you can try on the NFSv4 server side, if it
> >> appears
> >> that the clients are generating a large RPC load.
> >> - disabling the DRC cache for TCP by setting vfs.nfsd.cachetcp=0
> >> - If the server is seeing a large write RPC load, then
> >> "sync=disabled"
> >> might help, although it does run a risk of data loss when the
> >> server
> >> crashes.
> >> Then there are a couple of other ZFS related things (I'm not a ZFS
> >> guy,
> >> but these have shown up on the mailing lists).
> >> - make sure your volumes are 4K aligned and ashift=12 (in case a
> >> drive
> >> that uses 4K sectors is pretending to be 512byte sectored)
> >> - never run over 70-80% full if write performance is an issue
> >> - use a zil on an SSD with good write performance
> >>
> >> The only NFSv4 thing I can tell you is that it is known that ZFS's
> >> algorithm for determining sequential vs random I/O fails for NFSv4
> >> during writing and this can be a performance hit. The only
> >> workaround
> >> is to use NFSv3 mounts, since file handle affinity apparently
> >> fixes
> >> the problem and this is only done for NFSv3.
> >>
> >> rick
> >>
> >>> I saw that there are TSO issues with igb then i'm trying to
> >>> disable
> >>> it with sysctl but the situation wasn't solved.
> >>>
> >>> Someone has got ideas ? I can give you more informations if you
> >>> need.
> >>>
> >>> Thanks in advance.
> >>> Regards,
> >>>
> >>> Loïc Blot,
> >>> UNIX Systems, Network and Security Engineer
> >>> http://www.unix-experience.fr
> >>> _______________________________________________
> >>> freebsd-fs@freebsd.org mailing list
> >>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> >>> To unsubscribe, send any mail to
> >>> "freebsd-fs-unsubscribe@freebsd.org"
> >
> > _______________________________________________
> > freebsd-fs@freebsd.org mailing list
> > http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> > To unsubscribe, send any mail to
> > "freebsd-fs-unsubscribe@freebsd.org"
>

From owner-freebsd-fs@FreeBSD.ORG Tue Dec 9 08:45:26 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 4789972F for ; Tue, 9 Dec 2014 08:45:26 +0000 (UTC) Received: from kerio.tuxis.nl (alcyone.saas.tuxis.net [31.3.111.19]) (using TLSv1.1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id D3D4317B for ; Tue, 9 Dec 2014 08:45:24 +0000 (UTC) X-Footer: dHV4aXMubmw= Received: from [31.3.104.222] ([31.3.104.222]) by kerio.tuxis.nl (Kerio Connect 8.4.0); Tue, 9 Dec 2014 09:45:21 +0100 From: "Mark Schouten" Subject: Re: Mountd, why not use the '-S' flag by default To: "Rick Macklem" Organization: Tuxis Internet Engineering In-Reply-To: <891022143.8046492.1418095446417.JavaMail.root@uoguelph.ca> Message-ID: <20141209084521.05ccb5c7@kerio.tuxis.nl> Date: Tue, 09 Dec 2014 09:45:21 +0100 X-Mailer: Kerio Connect 8.4.0 WebMail X-User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36 MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org X-BeenThere:
freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Dec 2014 08:45:26 -0000

Hi,

Rik Macklem wrote:
> Well, there are a couple of things:
> With "-S" all nfsd threads get suspended/resumed whenever exports
> changes. This can result in a "pause" in NFS server response and
> that might be considered a POLA violation.

IMHO, An incidental raised latency is better than input/output errors which breaks files.

> It only works for the new NFS server and not the old one and the
> old one is still used by some. If it was the default, then the
> old and new NFS servers would have had different behaviour.
> (Again, this could be considered a POLA violation.)

Backport? Or detect the old nfsd and ignore the -S flag if it is found?

> When "-S" was introduced by me, it was done as a "stop gap", since I
> had thought that mountd would eventually be replaced by nfse
> (and nfse did allow exports to be updated "atomically" so the problem
> didn't occur). It now appears that no variant of nfse will end up
> in FreeBSD.

That's more of a reason to enable it by default, if you ask me.

> The last one is noted in the description. If, for some reason, mountd
> crashes during a reload, then all the nfsd threads could be stuck
> suspended. (I don't know if this occurs in practice.)

I've tested (not thoroughly) and I didn't get my client to break. Even with errors in the exports-file, stuff kept working. Doesn't most of NFS break without mountd anyways?

> Basically, I am a coward w.r.t. POLA and almost never change a
> default. (The one case I did change was making rsize, wsize
> default to MAX_BSIZE instead of 32K. By some strange twist of
> fate, this caused a lot of grief, since there was a bug related
> to TSO segments just under 64K for network interfaces that are
> limited to 32 transmit segments.
I am still saying "disable TSO"
> to people running older FreeBSD systems because of this.;-)

Hehe. :) All I can say is that I would prefer rsize and wsize to be bigger. But that's a different story. :)

Regards,

--
Kerio Operator in de Cloud? https://www.kerioindecloud.nl/
Mark Schouten | Tuxis Internet Engineering
KvK: 61527076 | http://www.tuxis.nl/
T: 0318 200208 | info@tuxis.nl

From owner-freebsd-fs@FreeBSD.ORG Tue Dec 9 12:41:23 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 2CED69CC for ; Tue, 9 Dec 2014 12:41:23 +0000 (UTC) Received: from mail.ijs.si (mail.ijs.si [IPv6:2001:1470:ff80::25]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id D0E3BE5B for ; Tue, 9 Dec 2014 12:41:22 +0000 (UTC) Received: from amavis-proxy-ori.ijs.si (localhost [IPv6:::1]) by mail.ijs.si (Postfix) with ESMTP id 3jxgxC2gdDz16m for ; Tue, 9 Dec 2014 13:41:19 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=ijs.si; h= user-agent:message-id:references:in-reply-to:organization :subject:subject:from:from:date:date:content-transfer-encoding :content-type:content-type:mime-version:received:received :received:received; s=jakla4; t=1418128876; x=1420720877; bh=rJy RJCIDLBGWNj4TM/3+jwzU34GAbc/JqcUV6YwNaSU=; b=le/S7cteN3d/9Sy28RT SBwuvlwgzOPRBJSoLQZ4SV2KOoYiRFTJNxbYR4f0pT685o6C7kV6cFI5R8PDGh8M x5vbTheYCNz0gXLG6wi9nL4dx5tWBD/5BH3+yLHJDh7yiXJbxctkkAcwk1QqstND cyhUE8wbqYSAc7SQuJHMx0ak= X-Virus-Scanned: amavisd-new at ijs.si Received: from mail.ijs.si ([IPv6:::1]) by amavis-proxy-ori.ijs.si (mail.ijs.si [IPv6:::1]) (amavisd-new, port 10012) with ESMTP id e4GOfBYc0wQr for ; Tue, 9 Dec 2014 13:41:16 +0100 (CET) Received: from mildred.ijs.si
(mailbox.ijs.si [IPv6:2001:1470:ff80::143:1]) by mail.ijs.si (Postfix) with ESMTP for ; Tue, 9 Dec 2014 13:41:15 +0100 (CET) Received: from neli.ijs.si (neli.ijs.si [IPv6:2001:1470:ff80:88:21c:c0ff:feb1:8c91]) by mildred.ijs.si (Postfix) with ESMTP id 3jxgx76MbkzDn for ; Tue, 9 Dec 2014 13:41:15 +0100 (CET) Received: from sleepy.ijs.si ([2001:1470:ff80:e001::1:1]) by neli.ijs.si with HTTP (HTTP/1.1 POST); Tue, 09 Dec 2014 13:41:15 +0100 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: quoted-printable Date: Tue, 09 Dec 2014 13:41:15 +0100 From: Mark Martinec To: freebsd-fs@freebsd.org Subject: Re: ZFS scrub on new disks gives cksum errors Organization: J. Stefan Institute In-Reply-To: <20141209021651.4304976.79605.2556@denninger.net> References: <20141208155230.GA574@iozz.us> <5485CB35.5010608@multiplay.co.uk> <20141209021042.GA649@iozz.us> <20141209021651.4304976.79605.2556@denninger.net> Message-ID: <5216a88b2411cb197af1170e5a81bab0@mailbox.ijs.si> X-Sender: Mark.Martinec+freebsd@ijs.si User-Agent: Roundcube Webmail/1.0.3 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Dec 2014 12:41:23 -0000 The key here is 'true on-line' (double-conversion). The waveform itself does not matter much, can be sine or pseudo-sine. Line-interactive and off-line UPS devices with line filters are not good candidates when quality of grid is poor. They handle clean true power outages just fine, but dealing with brownouts and electrical noise calls for a better UPS. Mark 2014-12-09 03:16, Karl Denninger wrote > Stepped-wave ups power is perfectly fine to feed a switching power > supply -- in fact most run COOLER on it than on sine wave power! 
>
> -- Karl
> From: Brian N
> Sent: Monday, December 8, 2014 20:11
> To: Sean Chittenden
> Cc: freebsd-fs@freebsd.org
> Subject: Re: ZFS scrub on new disks gives cksum errors
>
> On Mon, Dec 08, 2014 at 09:09:51AM -0800, Sean Chittenden wrote:
>> I had this exact problem at home on my FreeNAS host when a box is plugged
>> in to residential power that is "not clean." If you plug this host in to a
>> UPS it should clear things up instantly (find a sine wave UPS, not a
>> digital stepping supply). There's something about the fluctuating power or
>> brownouts that ZFS is well suited to detecting (and became terrifying to me
>> because wtf happened in the past pre-ZFS??).
>>
>> -sc
>
> Interesting. I have a UPS (APC Back-UPS ES 550 (stepped)) on the same circuit
> (residential power) that routinely gives me power outage messages where an
> outage lasts 1-2 seconds. Other electrical devices (lights, etc) are not
> affected and I just thought that the UPS was very sensitive. Thanks for the
> reply.
From owner-freebsd-fs@FreeBSD.ORG Tue Dec 9 13:58:49 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 49559150 for ; Tue, 9 Dec 2014 13:58:49 +0000 (UTC) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id EDA1B876 for ; Tue, 9 Dec 2014 13:58:48 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AtgEAAL/hlSDaFve/2dsb2JhbABSB4NYWASDAcMQBoYNAoE5AQEBAQF9hAIBAQEDASNWBRYYAgINGQJZBhOIMAi/dZccAQEBAQEBAQEBAQEBAQEBAQEBARmBJo4+AQ8UNAeCMT4RgTYFiUKKIAeBdJMvhAwhMIEDAR8DH34BAQE X-IronPort-AV: E=Sophos;i="5.07,545,1413259200"; d="scan'208";a="174909690" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 09 Dec 2014 08:58:46 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id C7BFEB3F2B; Tue, 9 Dec 2014 08:58:46 -0500 (EST) Date: Tue, 9 Dec 2014 08:58:46 -0500 (EST) From: Rick Macklem To: Mark Schouten Message-ID: <1144091338.8206962.1418133526806.JavaMail.root@uoguelph.ca> In-Reply-To: <20141209084521.05ccb5c7@kerio.tuxis.nl> Subject: Re: Mountd, why not use the '-S' flag by default MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 7.2.6_GA_2926 (ZimbraWebClient - FF3.0 (Win)/7.2.6_GA_2926) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Dec 2014 13:58:49 -0000 Mark Schouten wrote: > Hi, >=20 > Rik Macklem wrote: > > Well, 
there are a couple of things:
> > With "-S" all nfsd threads get suspended/resumed whenever exports
> > changes. This can result in a "pause" in NFS server response and
> > that might be considered a POLA violation.
>
> IMHO, An incidental raised latency is better than input/output errors
> which breaks files.
>
> > It only works for the new NFS server and not the old one and the
> > old one is still used by some. If it was the default, then the
> > old and new NFS servers would have had different behaviour.
> > (Again, this could be considered a POLA violation.)
>
> Backport? Or detect the old nfsd and ignore the -S flag if it is
> found?
>
The "-S" flag will be ignored for the old server, if I recall correctly.
(Actually, the system call it makes fails with an error, but the code
doesn't mind that.) If I am wrong, then this is a bug that should be fixed.

The problem is that you have differing behaviour between the two NFS
servers when the flag is used. POLA stands for Principle of Least
Astonishment (or something close to that) and my understanding is that
this means default behaviour isn't supposed to change. (Although I wasn't
the guy who committed it, I got complaints about the default # of nfsd
threads changing from 4 to 8 * #cores. The default of 4 was selected in
the 1980s for machines N orders of magnitude slower than the slowest
hardware today, but some still felt the default shouldn't change.)

My point is that FreeBSDers take this POLA principle seriously, so I try
hard not to violate it. Also, adding a new option is an easy way to
document a change in the man page.

> > When "-S" was introduced by me, it was done as a "stop gap", since
> > I
> > had thought that mountd would eventually be replaced by nfse
> > (and nfse did allow exports to be updated "atomically" so the
> > problem
> > didn't occur). It now appears that no variant of nfse will end up
> > in FreeBSD.
>
> That's more of a reason to enable it by default, if you ask me.
>
> > The last one is noted in the description. If, for some reason,
> > mountd
> > crashes during a reload, then all the nfsd threads could be stuck
> > suspended. (I don't know if this occurs in practice.)
>
> I've tested (not thoroughly) and I didn't get my client to break.
> Even with errors in the exports-file, stuff kept working. Doesn't
> most of NFS break without mountd anyways?
>
> > Basically, I am a coward w.r.t. POLA and almost never change a
> > default. (The one case I did change was making rsize, wsize
> > default to MAX_BSIZE instead of 32K. By some strange twist of
> > fate, this caused a lot of grief, since there was a bug related
> > to TSO segments just under 64K for network interfaces that are
> > limited to 32 transmit segments. I am still saying "disable TSO"
> > to people running older FreeBSD systems because of this.;-)
>
> Hehe. :) All I can say is that I would prefer rsize and wsize to be
> bigger. But that's a different story. :)
>
Well, MAX_BSIZE is an internal limit, but I do hope to increase MAX_BSIZE
someday. I've done some testing with it set to 128K and didn't find any
problems. (It's a 1 line kernel change you can try if you'd like.)
128K does seem to be desirable, since my understanding is that ZFS likes
to use this size. Some will find that smaller sizes do perform better, but
most should see comparable or better performance with a larger block size,
in my experience.

rick

> Regards,
>
> --
> Kerio Operator in de Cloud?
https://www.kerioindecloud.nl/
> Mark Schouten | Tuxis Internet Engineering
> KvK: 61527076 | http://www.tuxis.nl/
> T: 0318 200208 | info@tuxis.nl
>

From owner-freebsd-fs@FreeBSD.ORG Wed Dec 10 00:38:56 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id ED53FE00 for ; Wed, 10 Dec 2014 00:38:56 +0000 (UTC) Received: from styx.niksun.com (styx.niksun.com [24.104.71.38]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client CN "*.niksun.com", Issuer "Go Daddy Secure Certification Authority" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 9DC2BD90 for ; Wed, 10 Dec 2014 00:38:56 +0000 (UTC) Received: from EXCHANGE2013A.mj.niksun.com (10.25.8.14) by EXCHANGE2013A.mj.niksun.com (10.25.8.14) with Microsoft SMTP Server (TLS) id 15.0.913.22; Tue, 9 Dec 2014 19:23:17 -0500 Received: from EXCHANGE2010A.mj.niksun.com (10.25.8.13) by EXCHANGE2013A.mj.niksun.com (10.25.8.14) with Microsoft SMTP Server (TLS) id 15.0.913.22 via Frontend Transport; Tue, 9 Dec 2014 19:23:17 -0500 Received: from yuengling.local (10.24.4.176) by Exchange2010A.mj.niksun.com (10.25.8.13) with Microsoft SMTP Server (TLS) id 14.3.174.1; Tue, 9 Dec 2014 19:23:17 -0500 Message-ID: <54879274.5010001@niksun.com> Date: Tue, 9 Dec 2014 19:23:16 -0500 From: Andrew Heybey User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Thunderbird/31.1.2 MIME-Version: 1.0 To: Zaphod Beeblebrox Subject: Re: ZDB -Z?
References: In-Reply-To: Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Dec 2014 00:38:57 -0000 On 11/24/14 1:49 PM, Zaphod Beeblebrox wrote: > I'm reading about someone else's recovery of files from a damaged ZFS > partition. He claims to have added (possibly to opensolaris or whatnot) an > argument to zdb '-Z' ... which operates somewhat like -R, but which > highlights what parts of the region are on what physical disks, and which > are parity. > > Has anyone patched this into FreeBSD? Sorry for the late reply, I am behind on my mailing list reading. I assume you were looking at this post: http://mbruning.blogspot.com/2009_12_01_archive.html I was also recently trying to recover data in a ZFS pool. I made an ugly attempt at -Z for zdb. It will not work for anything but RAIDZ pools (I tried it on one containing two 6-disk raidz1 vdevs). The diff (against FreeBSD 10) is in this email. I copy-pasted the static function vdev_raidz_map() out of libzfs since it is static and not callable externally. Not very tasteful but it worked for me. 
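For anyone who wants to check the column arithmetic by hand, here is a minimal stand-alone sketch of the first few computations that vdev_raidz_map() performs in the patch that follows. The struct raidz_geom type and the raidz_geom_compute() function are hypothetical names made up for this illustration and are not part of ZFS; only the formulas are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical summary type for this sketch; not a ZFS structure. */
struct raidz_geom {
	uint64_t q;	/* data sectors every data column receives */
	uint64_t r;	/* leftover data sectors (partial stripe) */
	uint64_t bc;	/* "big" columns that carry one extra sector */
	uint64_t tot;	/* total data + parity sectors for the I/O */
	uint64_t first;	/* child index of the first column */
	uint64_t coff;	/* starting byte offset on each child */
};

/*
 * Same arithmetic as the start of vdev_raidz_map() in the patch:
 * size and offset are in bytes, ashift is log2 of the sector size,
 * dcols is the number of children, nparity the parity count.
 */
static struct raidz_geom
raidz_geom_compute(uint64_t size, uint64_t offset, uint64_t ashift,
    uint64_t dcols, uint64_t nparity)
{
	struct raidz_geom g;
	uint64_t b = offset >> ashift;	/* starting sector on the vdev */
	uint64_t s = size >> ashift;	/* I/O size in sectors */

	assert(dcols > nparity);	/* need at least one data column */

	g.first = b % dcols;
	g.coff = (b / dcols) << ashift;
	g.q = s / (dcols - nparity);
	g.r = s - g.q * (dcols - nparity);
	g.bc = (g.r == 0) ? 0 : g.r + nparity;
	g.tot = s + nparity * (g.q + (g.r == 0 ? 0 : 1));
	return (g);
}
```

For a 128K block at offset 0 on a 6-disk raidz1 with 512-byte sectors, raidz_geom_compute(131072, 0, 9, 6, 1) gives q = 51, r = 1, bc = 2 and tot = 308: the parity column and the first data column hold 52 sectors each and the other four hold 51 (2 * 52 + 4 * 51 = 308).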
andrew commit 86ab9e2dab7e76dcdf527d2aa6b84a2fe429ee28 Author: Andrew Heybey Date: Tue Nov 18 15:00:57 2014 -0500 zdb: Add -Z flag like http://mbruning.blogspot.com/2009/12/zfs-raidz-data-walk.html diff --git a/cddl/contrib/opensolaris/cmd/zdb/zdb.c b/cddl/contrib/opensolaris/cmd/zdb/zdb.c index c265c99..bf43ea1 100644 --- a/cddl/contrib/opensolaris/cmd/zdb/zdb.c +++ b/cddl/contrib/opensolaris/cmd/zdb/zdb.c @@ -59,6 +59,7 @@ #include #include #include +#include #undef ZFS_MAXNAMELEN #undef verify #include @@ -2745,6 +2746,168 @@ zdb_dump_block(char *label, void *buf, uint64_t size, int flags) } } + +typedef struct raidz_col { + uint64_t rc_devidx; /* child device index for I/O */ + uint64_t rc_offset; /* device offset */ + uint64_t rc_size; /* I/O size */ + void *rc_data; /* I/O data */ + void *rc_gdata; /* used to store the "good" version */ + int rc_error; /* I/O error for this device */ + uint8_t rc_tried; /* Did we attempt this I/O column? */ + uint8_t rc_skipped; /* Did we skip this I/O column? 
*/ +} raidz_col_t; + +typedef struct raidz_map { + uint64_t rm_cols; /* Regular column count */ + uint64_t rm_scols; /* Count including skipped columns */ + uint64_t rm_bigcols; /* Number of oversized columns */ + uint64_t rm_asize; /* Actual total I/O size */ + uint64_t rm_missingdata; /* Count of missing data devices */ + uint64_t rm_missingparity; /* Count of missing parity devices */ + uint64_t rm_firstdatacol; /* First data column/parity count */ + uint64_t rm_nskip; /* Skipped sectors for padding */ + uint64_t rm_skipstart; /* Column index of padding start */ + void *rm_datacopy; /* rm_asize-buffer of copied data */ + uintptr_t rm_reports; /* # of referencing checksum reports */ + uint8_t rm_freed; /* map no longer has referencing ZIO */ + uint8_t rm_ecksuminjected; /* checksum error was injected */ + raidz_col_t rm_col[1]; /* Flexible array of I/O columns */ +} raidz_map_t; + +/* + * Divides the IO evenly across all child vdevs; usually, dcols is + * the number of children in the target vdev. + * + * copy-pasted from vdev_raidz in the ZFS sources + */ +raidz_map_t* +vdev_raidz_map(uint64_t size, uint64_t offset, uint64_t unit_shift, + uint64_t dcols, uint64_t nparity) +{ + raidz_map_t* rm; + /* The starting RAIDZ (parent) vdev sector of the block. */ + uint64_t b = offset >> unit_shift; + /* The zio's size in units of the vdev's minimum sector size. */ + uint64_t s = size >> unit_shift; + /* The first column for this stripe. */ + uint64_t f = b % dcols; + /* The starting byte offset on each child vdev. */ + uint64_t o = (b / dcols) << unit_shift; + uint64_t q, r, c, bc, col, acols, scols, coff, devidx, asize, tot; + + /* + * "Quotient": The number of data sectors for this stripe on all but + * the "big column" child vdevs that also contain "remainder" data. + */ + q = s / (dcols - nparity); + + /* + * "Remainder": The number of partial stripe data sectors in this I/O. + * This will add a sector to some, but not all, child vdevs. 
+ */ + r = s - q * (dcols - nparity); + + /* The number of "big columns" - those which contain remainder data. */ + bc = (r == 0 ? 0 : r + nparity); + + /* + * The total number of data and parity sectors associated with + * this I/O. + */ + tot = s + nparity * (q + (r == 0 ? 0 : 1)); + + /* acols: The columns that will be accessed. */ + /* scols: The columns that will be accessed or skipped. */ + if (q == 0) { + /* Our I/O request doesn't span all child vdevs. */ + acols = bc; + scols = MIN(dcols, roundup(bc, nparity + 1)); + } else { + acols = dcols; + scols = dcols; + } + + rm = umem_alloc(offsetof(raidz_map_t, rm_col[scols]), KM_SLEEP); + + rm->rm_cols = acols; + rm->rm_scols = scols; + rm->rm_bigcols = bc; + rm->rm_skipstart = bc; + rm->rm_missingdata = 0; + rm->rm_missingparity = 0; + rm->rm_firstdatacol = nparity; + rm->rm_datacopy = NULL; + rm->rm_reports = 0; + rm->rm_freed = 0; + rm->rm_ecksuminjected = 0; + + asize = 0; + + for (c = 0; c < scols; c++) { + col = f + c; + coff = o; + if (col >= dcols) { + col -= dcols; + coff += 1ULL << unit_shift; + } + rm->rm_col[c].rc_devidx = col; + rm->rm_col[c].rc_offset = coff; + rm->rm_col[c].rc_data = NULL; + rm->rm_col[c].rc_gdata = NULL; + rm->rm_col[c].rc_error = 0; + rm->rm_col[c].rc_tried = 0; + rm->rm_col[c].rc_skipped = 0; + + if (c >= acols) + rm->rm_col[c].rc_size = 0; + else if (c < bc) + rm->rm_col[c].rc_size = (q + 1) << unit_shift; + else + rm->rm_col[c].rc_size = q << unit_shift; + + asize += rm->rm_col[c].rc_size; + } + + rm->rm_asize = roundup(asize, (nparity + 1) << unit_shift); + rm->rm_nskip = roundup(tot, nparity + 1) - tot; + + /* + * If all data stored spans all columns, there's a danger that parity + * will always be on the same device and, since parity isn't read + * during normal operation, that that device's I/O bandwidth won't be + * used effectively. We therefore switch the parity every 1MB. + * + * ... at least that was, ostensibly, the theory. 
As a practical + * matter unless we juggle the parity between all devices evenly, we + * won't see any benefit. Further, occasional writes that aren't a + * multiple of the LCM of the number of children and the minimum + * stripe width are sufficient to avoid pessimal behavior. + * Unfortunately, this decision created an implicit on-disk format + * requirement that we need to support for all eternity, but only + * for single-parity RAID-Z. + * + * If we intend to skip a sector in the zeroth column for padding + * we must make sure to note this swap. We will never intend to + * skip the first column since at least one data and one parity + * column must appear in each row. + */ + if (rm->rm_firstdatacol == 1 && (offset & (1ULL << 20))) { + devidx = rm->rm_col[0].rc_devidx; + o = rm->rm_col[0].rc_offset; + rm->rm_col[0].rc_devidx = rm->rm_col[1].rc_devidx; + rm->rm_col[0].rc_offset = rm->rm_col[1].rc_offset; + rm->rm_col[1].rc_devidx = devidx; + rm->rm_col[1].rc_offset = o; + + if (rm->rm_skipstart == 0) + rm->rm_skipstart = 1; + } + + return (rm); +} + + /* * There are two acceptable formats: * leaf_name - For example: c1t0d0 or /tmp/ztest.0a @@ -2803,8 +2966,10 @@ name: } /* - * Read a block from a pool and print it out. The syntax of the - * block descriptor is: + * Read a block from a pool and print it out, or (if Zflag is true) + * print out where the block is found on the constituents of the vdev. 
+ * + * The syntax of the block descriptor is: * * pool:vdev_specifier:offset:size[:flags] * @@ -2825,7 +2990,7 @@ name: * * = not yet implemented */ static void -zdb_read_block(char *thing, spa_t *spa) +zdb_read_block(char *thing, spa_t *spa, boolean_t Zflag) { blkptr_t blk, *bp = &blk; dva_t *dva = bp->blk_dva; @@ -2904,6 +3069,22 @@ zdb_read_block(char *thing, spa_t *spa) psize = size; lsize = size; + if (Zflag) { + raidz_map_t* rm; + rm = vdev_raidz_map(psize, offset, vd->vdev_ashift, + vd->vdev_children, vd->vdev_nparity); + (void) printf("columns %lu bigcols %lu asize %lu firstdatacol %lu\n", + rm->rm_cols, rm->rm_bigcols, rm->rm_asize, + rm->rm_firstdatacol); + for (int c = 0; c < rm->rm_scols; ++c) { + raidz_col_t* rc = &rm->rm_col[c]; + (void) printf("devidx %lu offset 0x%lx size 0x%lx\n", + rc->rc_devidx, rc->rc_offset, rc->rc_size); + } + umem_free(rm, offsetof(raidz_map_t, rm_col[rm->rm_scols])); + return; + } + pbuf = umem_alloc(SPA_MAXBLOCKSIZE, UMEM_NOFAIL); lbuf = umem_alloc(SPA_MAXBLOCKSIZE, UMEM_NOFAIL); @@ -3124,7 +3305,7 @@ main(int argc, char **argv) dprintf_setup(&argc, argv); - while ((c = getopt(argc, argv, "bcdhilmsuCDRSAFLXevp:t:U:P")) != -1) { + while ((c = getopt(argc, argv, "bcdhilmsuCDRSAFLXevp:t:U:PZ")) != -1) { switch (c) { case 'b': case 'c': @@ -3139,6 +3320,7 @@ main(int argc, char **argv) case 'D': case 'R': case 'S': + case 'Z': dump_opt[c]++; dump_all = 0; break; @@ -3197,6 +3379,9 @@ main(int argc, char **argv) if (dump_all) verbose = MAX(verbose, 1); + if (dump_opt['Z']) + dump_opt['R'] = 1; + for (c = 0; c < 256; c++) { if (dump_all && !strchr("elAFLRSXP", c)) dump_opt[c] = 1; @@ -3325,7 +3510,7 @@ main(int argc, char **argv) flagbits['r'] = ZDB_FLAG_RAW; for (i = 0; i < argc; i++) - zdb_read_block(argv[i], spa); + zdb_read_block(argv[i], spa, dump_opt['Z']); } (os != NULL) ? 
dmu_objset_disown(os, FTAG) : spa_close(spa, FTAG); From owner-freebsd-fs@FreeBSD.ORG Wed Dec 10 03:41:37 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 26DDEE45; Wed, 10 Dec 2014 03:41:37 +0000 (UTC) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id D25FF619; Wed, 10 Dec 2014 03:41:36 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqYEALa/h1SDaFve/2dsb2JhbABZhDSDAckzgUMBAQEBAX2ELARSNQINGQJfiEu/dpclAQEBAQYBAQEBAQEcgSaOKTSCdoFHBYlAnzSEDCGBdX4BAQE X-IronPort-AV: E=Sophos;i="5.07,549,1413259200"; d="scan'208";a="176844796" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-annu.net.uoguelph.ca with ESMTP; 09 Dec 2014 22:41:31 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 358F4B404B; Tue, 9 Dec 2014 22:41:30 -0500 (EST) Date: Tue, 9 Dec 2014 22:41:30 -0500 (EST) From: Rick Macklem To: FreeBSD Filesystems Message-ID: <156074187.8997064.1418182890206.JavaMail.root@uoguelph.ca> Subject: fuse dirent bug??? MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 7.2.6_GA_2926 (ZimbraWebClient - FF3.0 (Win)/7.2.6_GA_2926) Cc: George Neville-Neil X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Dec 2014 03:41:37 -0000 Hi, While looking at the fuse code to change it to use a new "struct dirent", I spotted this line, which doesn't look correct. 

Line 358 of sys/fs/fuse/fuse_internal.c:
    ((char *)cookediov->base)[bytesavail] = '\0';
- I think this is intended to null terminate the name,
  since it comes right after the memcpy() of the file name.
  However, bytesavail is the value returned by GENERIC_DIRSIZ(),
  which means [bytesavail] after "cookediov->base" would be the
  first byte after the "struct dirent" (including the space for
  null termination and padding).

If I'm correct, I think this line can be replaced by:
    de->d_name[fudge->namelen] = '\0';
which would be the byte after the name in the structure.

Also, although I think the first argument to the memcpy() call
just above this is correct, it is complex/convoluted.
Wouldn't just writing "memcpy(de->d_name, ..." make it
more readable?

Anyone out there familiar with fuse able to look at/test this?

Thanks, rick

From owner-freebsd-fs@FreeBSD.ORG Wed Dec 10 09:24:35 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 7DF1076F; Wed, 10 Dec 2014 09:24:35 +0000 (UTC) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id E7A25169; Wed, 10 Dec 2014 09:24:34 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id sBA9OSc3052136 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 10 Dec 2014 11:24:28 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua sBA9OSc3052136 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id sBA9OSjm052135; Wed, 10 Dec 2014 11:24:28 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to
kostikbel@gmail.com using -f Date: Wed, 10 Dec 2014 11:24:28 +0200 From: Konstantin Belousov To: Rick Macklem Subject: Re: fuse dirent bug??? Message-ID: <20141210092428.GE97072@kib.kiev.ua> References: <156074187.8997064.1418182890206.JavaMail.root@uoguelph.ca> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <156074187.8997064.1418182890206.JavaMail.root@uoguelph.ca> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.0 X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home Cc: FreeBSD Filesystems , George Neville-Neil X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Dec 2014 09:24:35 -0000

On Tue, Dec 09, 2014 at 10:41:30PM -0500, Rick Macklem wrote:
> Hi,
>
> While looking at the fuse code to change it to use a new
> "struct dirent", I spotted this line, which doesn't look
> correct.
>
> Line 358 of sys/fs/fuse/fuse_internal.c:
>     ((char *)cookediov->base)[bytesavail] = '\0';
> - I think this is intended to null terminate the name,
>   since it comes right after the memcpy() of the file name.
>   However, bytesavail is the value returned by GENERIC_DIRSIZ(),
>   which means [bytesavail] after "cookediov->base" would be the
>   first byte after the "struct dirent" (including the space for
>   null termination and padding.
>
> If I'm correct, I think this line can be replaced by:
>     de->d_name[fudge->namelen] = '\0';
> which would be the byte after the name in the structure.
>
> Also, although I think the first argument to the memcpy() call
> just above this is correct, it is complex/convoluted.
> Wouldn't just writing "memcpy(de->d_name, ..." make it
> more readable?
>
> Anyone out there familiar with fuse able to look at/test this?

No, I am not familiar with fuse. Still, I think you are right.
OTOH, this probably only rarely results in an actual overwrite of the
byte just past the buffer, since the dirents would have to fill the
buffer to the last byte for that to happen.

One additional note: getdirentries(2) specifies that the name must
be null-terminated, but the comment in sys/dirent.h claims that the
whole padding must be zeroed. I did not track down the source of the
buffer in fuse_internal_readdir(), so my question is whether the buffer
is zeroed before it is filled. If not, the padding must be cleared.

From owner-freebsd-fs@FreeBSD.ORG Wed Dec 10 11:22:30 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 788E7DA4 for ; Wed, 10 Dec 2014 11:22:30 +0000 (UTC) Received: from hades.sorbs.net (hades.sorbs.net [67.231.146.201]) by mx1.freebsd.org (Postfix) with ESMTP id 6775D9F for ; Wed, 10 Dec 2014 11:22:30 +0000 (UTC) MIME-version: 1.0 Content-transfer-encoding: 7BIT Content-type: text/plain; CHARSET=US-ASCII Received: from isux.com (firewall.isux.com [213.165.190.213]) by hades.sorbs.net (Oracle Communications Messaging Server 7.0.5.29.0 64bit (built Jul 9 2013)) with ESMTPSA id <0NGD003CT6GT2800@hades.sorbs.net> for freebsd-fs@freebsd.org; Wed, 10 Dec 2014 03:26:54 -0800 (PST) Message-id: <54882CED.6000909@sorbs.net> Date: Wed, 10 Dec 2014 12:22:21 +0100 From: Michelle Sullivan User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.24) Gecko/20100301 SeaMonkey/1.1.19 To: "freebsd-fs@freebsd.org" Subject: Minor cosmetic issue... 100.01% done...
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Dec 2014 11:22:30 -0000

  pool: sorbs
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Dec 8 07:13:45 2014
        29.9T scanned out of 29.9T at 164M/s, 0h0m to go
        1.98T resilvered, 100.00% done

  pool: sorbs
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Dec 8 07:13:45 2014
        29.9T scanned out of 29.9T at 164M/s, (scan is slow, no estimated time)
        1.98T resilvered, 100.00% done

  pool: sorbs
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Dec 8 07:13:45 2014
        29.9T scanned out of 29.9T at 164M/s, (scan is slow, no estimated time)
        1.98T resilvered, 100.01% done

  pool: sorbs
 state: ONLINE
  scan: resilvered 1.98T in 53h5m with 0 errors on Wed Dec 10 12:19:16 2014
config:

--
Michelle Sullivan
http://www.mhix.org/

From owner-freebsd-fs@FreeBSD.ORG Wed Dec 10 11:33:27 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id D5040129 for ; Wed, 10 Dec 2014 11:33:27 +0000 (UTC) Received: from smtp.unix-experience.fr (195-154-176-227.rev.poneytelecom.eu [195.154.176.227]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 929201C1 for ; Wed, 10 Dec 2014 11:33:26 +0000 (UTC) Received: from smtp.unix-experience.fr (unknown [192.168.200.21]) by smtp.unix-experience.fr (Postfix) with ESMTP id EFC2013CF; Wed, 10 Dec 2014 11:33:17 +0000 (UTC) X-Virus-Scanned: scanned by unix-experience.fr Received: from smtp.unix-experience.fr ([192.168.200.21]) by smtp.unix-experience.fr (smtp.unix-experience.fr [192.168.200.21]) (amavisd-new, port 10024) with ESMTP id knhIw6ukiDUd; Wed, 10 Dec 2014 11:33:14 +0000 (UTC) Received: from mail.unix-experience.fr (repo.unix-experience.fr [192.168.200.30]) by smtp.unix-experience.fr (Postfix) with ESMTPSA id 8FD8013C5; Wed, 10 Dec 2014 11:33:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=unix-experience.fr; s=uxselect; t=1418211194; bh=BYLP3xGadQg9j3hj0oWuGcdGI796UfE5k1b+uDbhPfI=; h=Date:From:Subject:To:Cc:In-Reply-To:References; b=MANkpad4q2qpQMQDrBizaGe8Eds2jpghCZSmQdGfYPxmijt4bAsU+v3msvi/OFIHY rVhTfcrHnsT3079Yrj1MNSeBZo/vwIMJsf0H/90ayLroVwUtTfW25hOUUnmvKcIBvj TK9/lf3TU+x7EyKkRHy8+FGk8qF2dElcwLWI/b5w= Mime-Version: 1.0 Date: Wed, 10
Dec 2014 11:33:14 +0000 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-ID: <1e19554bc0d4eb3e8dab74e2056b5ec4@mail.unix-experience.fr> X-Mailer: RainLoop/1.6.10.182 From: "Loïc Blot" Subject: Re: High Kernel Load with nfsv4 To: "Rick Macklem" In-Reply-To: <766911003.8048587.1418095910736.JavaMail.root@uoguelph.ca> References: <766911003.8048587.1418095910736.JavaMail.root@uoguelph.ca> Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Dec 2014 11:33:27 -0000

Hi Rick,
I'm trying NFSv3.
Some jails are starting very well but now I have an issue with lockd after some minutes:

nfs server 10.10.X.8:/jails: lockd not responding
nfs server 10.10.X.8:/jails lockd is alive again

I looked at the mbufs, but it seems there is no problem.

Here is my rc.conf on server:

nfs_server_enable="YES"
nfsv4_server_enable="YES"
nfsuserd_enable="YES"
nfsd_server_flags="-u -t -n 256"
mountd_enable="YES"
mountd_flags="-r"
nfsuserd_flags="-usertimeout 0 -force 20"
rpcbind_enable="YES"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"

Here is the client:

nfsuserd_enable="YES"
nfsuserd_flags="-usertimeout 0 -force 20"
nfscbd_enable="YES"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"

Do you have any idea?

Regards,

Loïc Blot,
UNIX Systems, Network and Security Engineer
http://www.unix-experience.fr

9 December 2014 04:31 "Rick Macklem" wrote:
> Loic Blot wrote:
>
>> Hi rick,
>>
>> I waited 3 hours (no lag at jail launch) and now I do: sysrc
>> memcached_flags="-v -m 512"
>> Command was very very slow...
>>
>> Here is a dd over NFS:
>>
>> 601062912 bytes transferred in 21.060679 secs (28539579 bytes/sec)
>
> Can you try the same read using an NFSv3 mount?
> (If it runs much faster, you have probably been bitten by the ZFS
> "sequential vs random" read heuristic which I've been told things
> NFS is doing "random" reads without file handle affinity. File
> handle affinity is very hard to do for NFSv4, so it isn't done.)
>
> rick
>
>> This is quite slow...
>>
>> You can found some nfsstat below (command isn't finished yet)
>>
>> nfsstat -c -w 1
>>
>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>> 0 0 0 0 0 0 0 0
>> 4 0 0 0 0 0 16 0
>> 2 0 0 0 0 0 17 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 4 0 0 0 0 4 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 4 0 0 0 0 0 3 0
>> 0 0 0 0 0 0 3 0
>> 37 10 0 8 0 0 14 1
>> 18 16 0 4 1 2 4 0
>> 78 91 0 82 6 12 30 0
>> 19 18 0 2 2 4 2 0
>> 0 0 0 0 2 0 0 0
>> 0 0 0 0 0 0 0 0
>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 1 0 0 0 0 1 0
>> 4 6 0 0 6 0 3 0
>> 2 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 1 0 0 0 0 0 0 0
>> 0 0 0 0 1 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 6 108 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 98 54 0 86 11 0 25 0
>> 36 24 0 39 25 0 10 1
>> 67 8 0 63 63 0 41 0
>> 34 0 0 35 34 0 0 0
>> 75 0 0 75 77 0 0 0
>> 34 0 0 35 35 0 0 0
>> 75 0 0 74 76 0 0 0
>> 33 0 0 34 33 0 0 0
>> 0 0 0 0 5 0 0 0
>> 0 0 0 0 0 0 6 0
>> 11 0 0 0 0 0 11 0
>> 0 0 0 0 0 0 0 0
>> 0 17 0 0 0 0 1 0
>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>> 4 5 0 0 0 0 12 0
>> 2 0 0 0 0 0 26 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 4 0 0 0 0 4 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 4 0 0 0 0 0 2 0
>> 2 0 0 0 0 0 24 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 4 0 0 0 0 0 7 0
>> 2 1 0 0 0 0 1 0
>> 0 0 0 0 2 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 6 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 4 6 0 0 0 0 3 0
>> 0 0 0 0 0 0 0 0
>> 2 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 4 71 0 0 0 0 0 0
>> 0 1 0 0 0 0 0 0
>> 2 36 0 0 0 0 1 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 1 0 0 0 0 0 1 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 79 6 0 79 79 0 2 0
>> 25 0 0 25 26 0 6 0
>> 43 18 0 39 46 0 23 0
>> 36 0 0 36 36 0 31 0
>> 68 1 0 66 68 0 0 0
>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>> 36 0 0 36 36 0 0 0
>> 48 0 0 48 49 0 0 0
>> 20 0 0 20 20 0 0 0
>> 0 0 0 0 0 0 0 0
>> 3 14 0 1 0 0 11 0
>> 0 0 0 0 0 0 0 0
>> 0 0 0 0 0 0 0 0
>> 0 4 0 0 0 0 4 0
>> 0 0 0 0 0 0 0 0
>> 4 22 0 0 0 0 16 0
>> 2 0 0 0 0 0 23 0
>>
>> Regards,
>>
>> Loïc Blot,
>> UNIX Systems, Network and Security Engineer
>> http://www.unix-experience.fr
>>
>> 8 December 2014 09:36 "Loïc Blot" wrote:
>>> Hi Rick,
>>> I stopped the jails this week-end and started it this morning, i'll
>>> give you some stats this week.
>>>
>>> Here is my nfsstat -m output (with your rsize/wsize tweaks)
>>>
>>> nfsv4,tcp,resvport,hard,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=32768,wsize=32768,readdirsize=32768,readahead=1,wcommitsize=773136,timeout=120,retrans=2147483647
>>>
>>> On server side my disks are on a raid controller which show a 512b
>>> volume and write performances
>>> are very honest (dd if=/dev/zero of=/jails/test.dd bs=4096
>>> count=100000000 => 450MBps)
>>>
>>> Regards,
>>>
>>> Loïc Blot,
>>> UNIX Systems, Network and Security Engineer
>>> http://www.unix-experience.fr
>>>
>>> 5 December 2014 15:14 "Rick Macklem" wrote:
>>>
>>>> Loic Blot wrote:
>>>>
>>>>> Hi,
>>>>> i'm trying to create a virtualisation environment based on jails.
>>>>> Those jails are stored under a big ZFS pool on a FreeBSD 9.3
>>>>> which
>>>>> export a NFSv4 volume. This NFSv4 volume was mounted on a big
>>>>> hypervisor (2 Xeon E5v3 + 128GB memory and 8 ports (but only 1
>>>>> was
>>>>> used at this time).
>>>>>
>>>>> The problem is simple, my hypervisors runs 6 jails (used 1% cpu
>>>>> and
>>>>> 10GB RAM approximatively and less than 1MB bandwidth) and works
>>>>> fine at start but the system slows down and after 2-3 days become
>>>>> unusable. When i look at top command i see 80-100% on system and
>>>>> commands are very very slow. Many process are tagged with
>>>>> nfs_cl*.
>>>>
>>>> To be honest, I would expect the slowness to be because of slow
>>>> response
>>>> from the NFSv4 server, but if you do:
>>>> # ps axHl
>>>> on a client when it is slow and post that, it would give us some
>>>> more
>>>> information on where the client side processes are sitting.
>>>> If you also do something like:
>>>> # nfsstat -c -w 1
>>>> and let it run for a while, that should show you how many RPCs are
>>>> being done and which ones.
>>>>
>>>> # nfsstat -m
>>>> will show you what your mount is actually using.
>>>> The only mount option I can suggest trying is
>>>> "rsize=32768,wsize=32768",
>>>> since some network environments have difficulties with 64K.
>>>>
>>>> There are a few things you can try on the NFSv4 server side, if it
>>>> appears
>>>> that the clients are generating a large RPC load.
>>>> - disabling the DRC cache for TCP by setting vfs.nfsd.cachetcp=0
>>>> - If the server is seeing a large write RPC load, then
>>>> "sync=disabled"
>>>> might help, although it does run a risk of data loss when the
>>>> server
>>>> crashes.
>>>> Then there are a couple of other ZFS related things (I'm not a ZFS
>>>> guy,
>>>> but these have shown up on the mailing lists).
>>>> - make sure your volumes are 4K aligned and ashift=12 (in case a
>>>> drive
>>>> that uses 4K sectors is pretending to be 512byte sectored)
>>>> - never run over 70-80% full if write performance is an issue
>>>> - use a zil on an SSD with good write performance
>>>>
>>>> The only NFSv4 thing I can tell you is that it is known that ZFS's
>>>> algorithm for determining sequential vs random I/O fails for NFSv4
>>>> during writing and this can be a performance hit. The only
>>>> workaround
>>>> is to use NFSv3 mounts, since file handle affinity apparently
>>>> fixes
>>>> the problem and this is only done for NFSv3.
>>>>
>>>> rick
>>>>
>>>>> I saw that there are TSO issues with igb then i'm trying to
>>>>> disable
>>>>> it with sysctl but the situation wasn't solved.
>>>>>
>>>>> Someone has got ideas ? I can give you more informations if you
>>>>> need.
>>>>>
>>>>> Thanks in advance.
>>>>> Regards,
>>>>>
>>>>> Loïc Blot,
>>>>> UNIX Systems, Network and Security Engineer
>>>>> http://www.unix-experience.fr
>>>>> _______________________________________________
>>>>> freebsd-fs@freebsd.org mailing list
>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>>> To unsubscribe, send any mail to
>>>>> "freebsd-fs-unsubscribe@freebsd.org"
>>>
>>> _______________________________________________
>>> freebsd-fs@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>> To unsubscribe, send any mail to
>>> "freebsd-fs-unsubscribe@freebsd.org"

From owner-freebsd-fs@FreeBSD.ORG Wed Dec 10 12:56:51 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B39E0B12 for ; Wed, 10 Dec 2014 12:56:51 +0000 (UTC) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 509F4D21 for ; Wed, 10 Dec 2014 12:56:50 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqwEADVCiFSDaFve/2dsb2JhbABZg1hcgwHCfgqFJEoCgS0BAQEBAX2EAgEBAQMBAQEBICsgCwUWGAICDRkCKQEJJgYIAgUEARwEiA8IDb9wlzMBAQEBAQEEAQEBAQEBAQEBARiBJo4MAQEbATMHgjE+EYE2BYlAiAKDGoMjMIIsgjGHYoNfgX4egXAgMAeBBTl+AQEB X-IronPort-AV: E=Sophos;i="5.07,552,1413259200";
d="scan'208";a="175204873" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 10 Dec 2014 07:56:42 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 19608E7956; Wed, 10 Dec 2014 07:56:42 -0500 (EST) Date: Wed, 10 Dec 2014 07:56:42 -0500 (EST) From: Rick Macklem To: Loïc Blot Message-ID: <1280247055.9141285.1418216202088.JavaMail.root@uoguelph.ca> In-Reply-To: <1e19554bc0d4eb3e8dab74e2056b5ec4@mail.unix-experience.fr> Subject: Re: High Kernel Load with nfsv4 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 7.2.6_GA_2926 (ZimbraWebClient - FF3.0 (Win)/7.2.6_GA_2926) Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Dec 2014 12:56:51 -0000

Loic Blot wrote:
> Hi Rick,
> I'm trying NFSv3.
> Some jails are starting very well but now i have an issue with lockd
> after some minutes:
>
> nfs server 10.10.X.8:/jails: lockd not responding
> nfs server 10.10.X.8:/jails lockd is alive again
>
> I look at mbuf, but i seems there is no problem.
>
Well, if you need locks to be visible across multiple clients, then
I'm afraid you are stuck with using NFSv4 and the performance you get
from it. (There is no way to do file handle affinity for NFSv4 because
the read and write ops are buried in the compound RPC and not easily
recognized.)

If the locks don't need to be visible across multiple clients, I'd
suggest trying the "nolockd" option with nfsv3.

> Here is my rc.conf on server:
>
> nfs_server_enable="YES"
> nfsv4_server_enable="YES"
> nfsuserd_enable="YES"
> nfsd_server_flags="-u -t -n 256"
> mountd_enable="YES"
> mountd_flags="-r"
> nfsuserd_flags="-usertimeout 0 -force 20"
> rpcbind_enable="YES"
> rpc_lockd_enable="YES"
> rpc_statd_enable="YES"
>
> Here is the client:
>
> nfsuserd_enable="YES"
> nfsuserd_flags="-usertimeout 0 -force 20"
> nfscbd_enable="YES"
> rpc_lockd_enable="YES"
> rpc_statd_enable="YES"
>
> Have you got an idea ?
>
> Regards,
>
> Loïc Blot,
> UNIX Systems, Network and Security Engineer
> http://www.unix-experience.fr
>
> 9 December 2014 04:31 "Rick Macklem" wrote:
> > Loic Blot wrote:
> >
> >> Hi rick,
> >>
> >> I waited 3 hours (no lag at jail launch) and now I do: sysrc
> >> memcached_flags="-v -m 512"
> >> Command was very very slow...
> >>
> >> Here is a dd over NFS:
> >>
> >> 601062912 bytes transferred in 21.060679 secs (28539579 bytes/sec)
> >
> > Can you try the same read using an NFSv3 mount?
> > (If it runs much faster, you have probably been bitten by the ZFS
> > "sequential vs random" read heuristic which I've been told things
> > NFS is doing "random" reads without file handle affinity. File
> > handle affinity is very hard to do for NFSv4, so it isn't done.)
> >
I was actually suggesting that you try the "dd" over nfsv3 to see how
the performance compared with nfsv4. If you do that, please post the
comparable results.

Someday I would like to try and get ZFS's sequential vs random read
heuristic modified and any info on what difference in performance that
might make for NFS would be useful.

rick

> > rick
> >
> >> This is quite slow...
> >>
> >> You can found some nfsstat below (command isn't finished yet)
> >>
> >> nfsstat -c -w 1
> >>
> >> GtAttr Lookup Rdlink Read Write Rename Access Rddir
> >> 0 0 0 0 0 0 0 0
> >> 4 0 0 0 0 0 16 0
> >> 2 0 0 0 0 0 17 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 4 0 0 0 0 4 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 4 0 0 0 0 0 3 0
> >> 0 0 0 0 0 0 3 0
> >> 37 10 0 8 0 0 14 1
> >> 18 16 0 4 1 2 4 0
> >> 78 91 0 82 6 12 30 0
> >> 19 18 0 2 2 4 2 0
> >> 0 0 0 0 2 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> GtAttr Lookup Rdlink Read Write Rename Access Rddir
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 1 0 0 0 0 1 0
> >> 4 6 0 0 6 0 3 0
> >> 2 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 1 0 0 0 0 0 0 0
> >> 0 0 0 0 1 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 6 108 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> GtAttr Lookup Rdlink Read Write Rename Access Rddir
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 98 54 0 86 11 0 25 0
> >> 36 24 0 39 25 0 10 1
> >> 67 8 0 63 63 0 41 0
> >> 34 0 0 35 34 0 0 0
> >> 75 0 0 75 77 0 0 0
> >> 34 0 0 35 35 0 0 0
> >> 75 0 0 74 76 0 0 0
> >> 33 0 0 34 33 0 0 0
> >> 0 0 0 0 5 0 0 0
> >> 0 0 0 0 0 0 6 0
> >> 11 0 0 0 0 0 11 0
> >> 0 0 0 0 0 0 0 0
> >> 0 17 0 0 0 0 1 0
> >> GtAttr Lookup Rdlink Read Write Rename Access Rddir
> >> 4 5 0 0 0 0 12 0
> >> 2 0 0 0 0 0 26 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 4 0 0 0 0 4 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 4 0 0 0 0 0 2 0
> >> 2 0 0 0 0 0 24 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> GtAttr Lookup Rdlink Read Write Rename Access Rddir
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 4 0 0 0 0 0 7 0
> >> 2 1 0 0 0 0 1 0
> >> 0 0 0 0 2 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 6 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 4 6 0 0 0 0 3 0
> >> 0 0 0 0 0 0 0 0
> >> 2 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> GtAttr Lookup Rdlink Read Write Rename Access Rddir
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 4 71 0 0 0 0 0 0
> >> 0 1 0 0 0 0 0 0
> >> 2 36 0 0 0 0 1 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 1 0 0 0 0 0 1 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 79 6 0 79 79 0 2 0
> >> 25 0 0 25 26 0 6 0
> >> 43 18 0 39 46 0 23 0
> >> 36 0 0 36 36 0 31 0
> >> 68 1 0 66 68 0 0 0
> >> GtAttr Lookup Rdlink Read Write Rename Access Rddir
> >> 36 0 0 36 36 0 0 0
> >> 48 0 0 48 49 0 0 0
> >> 20 0 0 20 20 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 3 14 0 1 0 0 11 0
> >> 0 0 0 0 0 0 0 0
> >> 0 0 0 0 0 0 0 0
> >> 0 4 0 0 0 0 4 0
> >> 0 0 0 0 0 0 0 0
> >> 4 22 0 0 0 0 16 0
> >> 2 0 0 0 0 0 23 0
> >>
> >> Regards,
> >>
> >> Loïc Blot,
> >> UNIX Systems, Network and Security Engineer
> >> http://www.unix-experience.fr
> >>
> >> 8 December 2014 09:36 "Loïc Blot" wrote:
> >>> Hi Rick,
> >>> I stopped the jails this week-end and started it this morning,
> >>> i'll
> >>> give you some stats this week.
> >>>
> >>> Here is my nfsstat -m output (with your rsize/wsize tweaks)
> >>>
> >>> nfsv4,tcp,resvport,hard,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=32768,wsize=32768,readdirsize=32768,readahead=1,wcommitsize=773136,timeout=120,retrans=2147483647
> >>>
> >>> On server side my disks are on a raid controller which show a
> >>> 512b
> >>> volume and write performances
> >>> are very honest (dd if=/dev/zero of=/jails/test.dd bs=4096
> >>> count=100000000 => 450MBps)
> >>>
> >>> Regards,
> >>>
> >>> Loïc Blot,
> >>> UNIX Systems, Network and Security Engineer
> >>> http://www.unix-experience.fr
> >>>
> >>> 5 December 2014 15:14 "Rick Macklem" wrote:
> >>>
> >>>> Loic Blot wrote:
> >>>>
> >>>>> Hi,
> >>>>> i'm trying to create a virtualisation environment based on
> >>>>> jails.
> >>>>> Those jails are stored under a big ZFS pool on a FreeBSD 9.3
> >>>>> which
> >>>>> export a NFSv4 volume. This NFSv4 volume was mounted on a big
> >>>>> hypervisor (2 Xeon E5v3 + 128GB memory and 8 ports (but only 1
> >>>>> was
> >>>>> used at this time).
> >>>>>
> >>>>> The problem is simple, my hypervisors runs 6 jails (used 1% cpu
> >>>>> and
> >>>>> 10GB RAM approximatively and less than 1MB bandwidth) and works
> >>>>> fine at start but the system slows down and after 2-3 days
> >>>>> become
> >>>>> unusable. When i look at top command i see 80-100% on system
> >>>>> and
> >>>>> commands are very very slow. Many process are tagged with
> >>>>> nfs_cl*.
> >>>>
> >>>> To be honest, I would expect the slowness to be because of slow
> >>>> response
> >>>> from the NFSv4 server, but if you do:
> >>>> # ps axHl
> >>>> on a client when it is slow and post that, it would give us some
> >>>> more
> >>>> information on where the client side processes are sitting.
> >>>> If you also do something like:
> >>>> # nfsstat -c -w 1
> >>>> and let it run for a while, that should show you how many RPCs
> >>>> are
> >>>> being done and which ones.
> >>>>
> >>>> # nfsstat -m
> >>>> will show you what your mount is actually using.
> >>>> The only mount option I can suggest trying is
> >>>> "rsize=32768,wsize=32768",
> >>>> since some network environments have difficulties with 64K.
> >>>>
> >>>> There are a few things you can try on the NFSv4 server side, if
> >>>> it
> >>>> appears
> >>>> that the clients are generating a large RPC load.
> >>>> - disabling the DRC cache for TCP by setting vfs.nfsd.cachetcp=0
> >>>> - If the server is seeing a large write RPC load, then
> >>>> "sync=disabled"
> >>>> might help, although it does run a risk of data loss when the
> >>>> server
> >>>> crashes.
> >>>> Then there are a couple of other ZFS related things (I'm not a
> >>>> ZFS
> >>>> guy,
> >>>> but these have shown up on the mailing lists).
> >>>> - make sure your volumes are 4K aligned and ashift=12 (in case a
> >>>> drive
> >>>> that uses 4K sectors is pretending to be 512byte sectored)
> >>>> - never run over 70-80% full if write performance is an issue
> >>>> - use a zil on an SSD with good write performance
> >>>>
> >>>> The only NFSv4 thing I can tell you is that it is known that
> >>>> ZFS's
> >>>> algorithm for determining sequential vs random I/O fails for
> >>>> NFSv4
> >>>> during writing and this can be a performance hit. The only
> >>>> workaround
> >>>> is to use NFSv3 mounts, since file handle affinity apparently
> >>>> fixes
> >>>> the problem and this is only done for NFSv3.
> >>>>
> >>>> rick
> >>>>
> >>>>> I saw that there are TSO issues with igb then i'm trying to
> >>>>> disable
> >>>>> it with sysctl but the situation wasn't solved.
> >>>>>
> >>>>> Someone has got ideas ? I can give you more informations if you
> >>>>> need.
> >>>>>
> >>>>> Thanks in advance.
> >>>>> Regards,
> >>>>>
> >>>>> Loïc Blot,
> >>>>> UNIX Systems, Network and Security Engineer
> >>>>> http://www.unix-experience.fr
> >>>>> _______________________________________________
> >>>>> freebsd-fs@freebsd.org mailing list
> >>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> >>>>> To unsubscribe, send any mail to
> >>>>> "freebsd-fs-unsubscribe@freebsd.org"
> >>>
> >>> _______________________________________________
> >>> freebsd-fs@freebsd.org mailing list
> >>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> >>> To unsubscribe, send any mail to
> >>> "freebsd-fs-unsubscribe@freebsd.org"
>

From owner-freebsd-fs@FreeBSD.ORG Wed Dec 10 13:45:32 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 01D25ACB; Wed, 10 Dec 2014 13:45:32 +0000 (UTC) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 903702D9; Wed, 10 Dec 2014 13:45:31 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqYEAClOiFSDaFve/2dsb2JhbABZhDSDAch1AoEtAQEBAQF9hAIBAQEDASMEUgUWDgoCAg0ZAlkGiEMIwBqXNgEBAQEBBQEBAQEBAQEbgSaOKTQHgm+BRwWJQJ8NgX4egXAggXV+AQEB X-IronPort-AV: E=Sophos;i="5.07,552,1413259200"; d="scan'208";a="175216414" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 10 Dec 2014 08:45:11 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 74B49B40E0; Wed, 10 Dec 2014 08:45:10 -0500 (EST) Date: Wed, 10 Dec 2014 08:45:10 -0500 (EST) From: Rick Macklem To: Konstantin Belousov Message-ID: <1028205685.9208372.1418219110466.JavaMail.root@uoguelph.ca> In-Reply-To:
<20141210092428.GE97072@kib.kiev.ua> Subject: Re: fuse dirent bug??? MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 7.2.6_GA_2926 (ZimbraWebClient - FF3.0 (Win)/7.2.6_GA_2926) Cc: FreeBSD Filesystems , George Neville-Neil X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Dec 2014 13:45:32 -0000

Kostik wrote:
> On Tue, Dec 09, 2014 at 10:41:30PM -0500, Rick Macklem wrote:
> > Hi,
> >
> > While looking at the fuse code to change it to use a new
> > "struct dirent", I spotted this line, which doesn't look
> > correct.
> >
> > Line 358 of sys/fs/fuse/fuse_internal.c:
> >     ((char *)cookediov->base)[bytesavail] = '\0';
> > - I think this is intended to null terminate the name,
> >   since it comes right after the memcpy() of the file name.
> >   However, bytesavail is the value returned by GENERIC_DIRSIZ(),
> >   which means [bytesavail] after "cookediov->base" would be the
> >   first byte after the "struct dirent" (including the space for
> >   null termination and padding).
> >
> > If I'm correct, I think this line can be replaced by:
> >     de->d_name[fudge->namelen] = '\0';
> > which would be the byte after the name in the structure.
> >
> > Also, although I think the first argument to the memcpy() call
> > just above this is correct, it is complex/convoluted.
> > Wouldn't just writing "memcpy(de->d_name, ..." make it
> > more readable?
> >
> > Anyone out there familiar with fuse able to look at/test this?
>
> No, I am not familiar with fuse. Still, I think you are right.
> OTOH, it is probably very rare to result in the actual override
> of the last byte after the buffer, since dirents have to fill the
> buffer to the last byte.
>
Ok, I took a closer look at the code and it seems that this bug won't
cause any problems.
1 - The calculation of the size of cookediov->base is bogus (it uses
    fuse_dirent instead of dirent), but since fuse_dirent is larger, the
    error makes the buffer too big.
    --> writing '\0' one byte past the entry will never go past the end
        of the buffer
    This will need to be fixed when struct dirent becomes larger. I'll
    include it in the patch I am working on.
2 - fiov_refresh() bzero()s the entire buffer and is called within the
    loop, so the name will always be null terminated.

> One additional note. The getdirentries(2) specifies that the name must
> be null-terminated. But sys/dirent.h comment claims that the whole
> padding must be zeroed. I did not track the source of the buffer in
> fuse_internal_readdir(), so my question is whether the buffer is zeroed
> before filled. If not, padding must be cleared.
>
Well, I am aware of that comment and the NFS client has always done that.
However, from my recent glances at the code for other file systems'
XX_readdir()s, most don't bother. (I think UFS, NFS and fuse are the only
ones that do 0 the pad bytes.) Most (and I think ZFS is one of these) only
put a single '\0' after the name and I don't think they bzero() the buffer
like fuse apparently does.

Because of the above, the copy_dirent32() function I am using doesn't
bother to 0 the pad bytes either and I haven't seen problems during
minimal testing. Zeroing the pad bytes could easily be added.

Btw, most file systems also ignore the "dirent shouldn't cross a 512 byte
block boundary" rule. (At one time I thought this was a requirement,
since UFS did it, but my guess is that userland code doesn't care.)
Most file systems (except UFS and NFS) just fill in "struct dirent"s
packed and return fewer bytes than requested when no more "struct dirent"s
will fit in the requested buffer size.
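
For readers following along, the two conventions discussed above (put the '\0' at d_name[namelen] rather than at the record length, and optionally zero the remaining pad bytes) can be sketched roughly as below. This is only an illustration and not the fuse code: demo_dirent_hdr, demo_dirsiz() and demo_pack_dirent() are hypothetical names made up for this sketch, the 8-byte header merely resembles a directory entry header, and the 4-byte rounding is an assumption that imitates what GENERIC_DIRSIZ() is described as doing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed header of a packed directory record (NOT struct dirent). */
struct demo_dirent_hdr {
	uint32_t d_fileno;
	uint16_t d_reclen;
	uint8_t  d_type;
	uint8_t  d_namlen;
};

/* The name starts right after the header. */
#define DEMO_NAME_OFF	sizeof(struct demo_dirent_hdr)

/*
 * Record size: header + name + terminating NUL, rounded up to a
 * multiple of 4 bytes (assumed rounding, for illustration only).
 */
size_t
demo_dirsiz(size_t namelen)
{
	size_t raw = DEMO_NAME_OFF + namelen + 1;

	return ((raw + 3) & ~(size_t)3);
}

/*
 * Pack one entry at the start of buf.  Returns the record length, or 0
 * if the entry does not fit.  Nothing at or beyond buf[reclen] is written.
 */
size_t
demo_pack_dirent(char *buf, size_t bufsize, uint32_t fileno, uint8_t type,
    const char *name, size_t namelen)
{
	struct demo_dirent_hdr hdr;
	size_t reclen = demo_dirsiz(namelen);

	if (namelen > UINT8_MAX || reclen > bufsize)
		return (0);

	hdr.d_fileno = fileno;
	hdr.d_reclen = (uint16_t)reclen;
	hdr.d_type = type;
	hdr.d_namlen = (uint8_t)namelen;
	memcpy(buf, &hdr, sizeof(hdr));
	memcpy(buf + DEMO_NAME_OFF, name, namelen);

	/*
	 * Terminate at name offset + namelen (the byte right after the
	 * name) and zero the rest of the padding up to reclen.  Writing
	 * at buf[reclen] instead would touch the next record.
	 */
	memset(buf + DEMO_NAME_OFF + namelen, 0,
	    reclen - DEMO_NAME_OFF - namelen);
	assert(buf[DEMO_NAME_OFF + namelen] == '\0');

	return (reclen);
}
```

A caller would fill a buffer by calling demo_pack_dirent() repeatedly, advancing by the returned record length and stopping when it returns 0, which corresponds to the "return fewer bytes than requested" behaviour described above.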
rick From owner-freebsd-fs@FreeBSD.ORG Wed Dec 10 14:36:47 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 8F7BFCE6 for ; Wed, 10 Dec 2014 14:36:47 +0000 (UTC) Received: from smtp.unix-experience.fr (195-154-176-227.rev.poneytelecom.eu [195.154.176.227]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4C970AD0 for ; Wed, 10 Dec 2014 14:36:46 +0000 (UTC) Received: from smtp.unix-experience.fr (unknown [192.168.200.21]) by smtp.unix-experience.fr (Postfix) with ESMTP id 3FC8F16F5; Wed, 10 Dec 2014 14:36:43 +0000 (UTC) X-Virus-Scanned: scanned by unix-experience.fr Received: from smtp.unix-experience.fr ([192.168.200.21]) by smtp.unix-experience.fr (smtp.unix-experience.fr [192.168.200.21]) (amavisd-new, port 10024) with ESMTP id RhkjzAQkeLCP; Wed, 10 Dec 2014 14:36:39 +0000 (UTC) Received: from mail.unix-experience.fr (repo.unix-experience.fr [192.168.200.30]) by smtp.unix-experience.fr (Postfix) with ESMTPSA id B4CB016E9; Wed, 10 Dec 2014 14:36:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=unix-experience.fr; s=uxselect; t=1418222199; bh=/4lRSlBqVdTd3ChMR4nT3MYxcoYhJQlMfjGM1cNXMYE=; h=Date:From:Subject:To:Cc:In-Reply-To:References; b=Qr7MC2GTQxZTX6UUdb545Nyr26BpFRDeJlc+/urrUMgRkJiIJBiGTFdJeJA6z/JrB hwuRHfOJyLoB0x6kpcHU7fEJJ+ejSCqRcGc3EFG6+JBd8OHVAtr6bk8cRnwq2Kv/Mu ku1y6jfGZX5SwtQezRsF5rtZHSAk+byUlAp2g6PA= Mime-Version: 1.0 Date: Wed, 10 Dec 2014 14:36:39 +0000 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-ID: X-Mailer: RainLoop/1.6.10.182 From: "=?utf-8?B?TG/Dr2MgQmxvdA==?=" Subject: Re: High Kernel Load with nfsv4 To: "Rick Macklem" In-Reply-To: 
<1280247055.9141285.1418216202088.JavaMail.root@uoguelph.ca> References: <1280247055.9141285.1418216202088.JavaMail.root@uoguelph.ca> Cc: freebsd-fs@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Dec 2014 14:36:47 -0000

Hi Rick,
thanks for your suggestion.
For my locking bug, rpc.lockd is stucked in rpcrecv state on the server. kill -9 doesn't affect the process, it's blocked.... (State: Ds)


for the performances

NFSv3: 60Mbps
NFSv4: 45Mbps
Regards,

Loïc Blot,
UNIX Systems, Network and Security Engineer
http://www.unix-experience.fr

10 décembre 2014 13:56 "Rick Macklem" a écrit:
> Loic Blot wrote:
>
>> Hi Rick,
>> I'm trying NFSv3.
>> Some jails are starting very well but now i have an issue with lockd
>> after some minutes:
>>
>> nfs server 10.10.X.8:/jails: lockd not responding
>> nfs server 10.10.X.8:/jails lockd is alive again
>>
>> I look at mbuf, but i seems there is no problem.
>
> Well, if you need locks to be visible across multiple clients, then
> I'm afraid you are stuck with using NFSv4 and the performance you get
> from it. (There is no way to do file handle affinity for NFSv4 because
> the read and write ops are buried in the compound RPC and not easily
> recognized.)
>
> If the locks don't need to be visible across multiple clients, I'd
> suggest trying the "nolockd" option with nfsv3.
>
>> Here is my rc.conf on server:
>>
>> nfs_server_enable="YES"
>> nfsv4_server_enable="YES"
>> nfsuserd_enable="YES"
>> nfsd_server_flags="-u -t -n 256"
>> mountd_enable="YES"
>> mountd_flags="-r"
>> nfsuserd_flags="-usertimeout 0 -force 20"
>> rpcbind_enable="YES"
>> rpc_lockd_enable="YES"
>> rpc_statd_enable="YES"
>>
>> Here is the client:
>>
>> nfsuserd_enable="YES"
>> nfsuserd_flags="-usertimeout 0 -force 20"
>> nfscbd_enable="YES"
>> rpc_lockd_enable="YES"
>> rpc_statd_enable="YES"
>>
>> Have you got an idea ?
>>
>> Regards,
>>
>> Loïc Blot,
>> UNIX Systems, Network and Security Engineer
>> http://www.unix-experience.fr
>>
>> 9 décembre 2014 04:31 "Rick Macklem" a écrit:
>>> Loic Blot wrote:
>>>
>>>> Hi rick,
>>>>
>>>> I waited 3 hours (no lag at jail launch) and now I do: sysrc
>>>> memcached_flags="-v -m 512"
>>>> Command was very very slow...
>>>>
>>>> Here is a dd over NFS:
>>>>
>>>> 601062912 bytes transferred in 21.060679 secs (28539579 bytes/sec)
>>>
>>> Can you try the same read using an NFSv3 mount?
>>> (If it runs much faster, you have probably been bitten by the ZFS
>>> "sequential vs random" read heuristic which I've been told things
>>> NFS is doing "random" reads without file handle affinity. File
>>> handle affinity is very hard to do for NFSv4, so it isn't done.)
>>>
>
> I was actually suggesting that you try the "dd" over nfsv3 to see how
> the performance compared with nfsv4. If you do that, please post the
> comparable results.
>
> Someday I would like to try and get ZFS's sequential vs random read
> heuristic modified and any info on what difference in performance that
> might make for NFS would be useful.
>
> rick
>
>>> rick
>>>
>>>> This is quite slow...
>>>>
>>>> You can found some nfsstat below (command isn't finished yet)
>>>>
>>>> nfsstat -c -w 1
>>>>
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 0 0 0 0 0 0 0 0
>>>> 4 0 0 0 0 0 16 0
>>>> 2 0 0 0 0 0 17 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 4 0 0 0 0 4 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 0 0 0 0 0 3 0
>>>> 0 0 0 0 0 0 3 0
>>>> 37 10 0 8 0 0 14 1
>>>> 18 16 0 4 1 2 4 0
>>>> 78 91 0 82 6 12 30 0
>>>> 19 18 0 2 2 4 2 0
>>>> 0 0 0 0 2 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 1 0 0 0 0 1 0
>>>> 4 6 0 0 6 0 3 0
>>>> 2 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 1 0 0 0 0 0 0 0
>>>> 0 0 0 0 1 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 6 108 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 98 54 0 86 11 0 25 0
>>>> 36 24 0 39 25 0 10 1
>>>> 67 8 0 63 63 0 41 0
>>>> 34 0 0 35 34 0 0 0
>>>> 75 0 0 75 77 0 0 0
>>>> 34 0 0 35 35 0 0 0
>>>> 75 0 0 74 76 0 0 0
>>>> 33 0 0 34 33 0 0 0
>>>> 0 0 0 0 5 0 0 0
>>>> 0 0 0 0 0 0 6 0
>>>> 11 0 0 0 0 0 11 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 17 0 0 0 0 1 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 4 5 0 0 0 0 12 0
>>>> 2 0 0 0 0 0 26 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 4 0 0 0 0 4 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 0 0 0 0 0 2 0
>>>> 2 0 0 0 0 0 24 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 0 0 0 0 0 7 0
>>>> 2 1 0 0 0 0 1 0
>>>> 0 0 0 0 2 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 6 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 6 0 0 0 0 3 0
>>>> 0 0 0 0 0 0 0 0
>>>> 2 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 71 0 0 0 0 0 0
>>>> 0 1 0 0 0 0 0 0
>>>> 2 36 0 0 0 0 1 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 1 0 0 0 0 0 1 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 79 6 0 79 79 0 2 0
>>>> 25 0 0 25 26 0 6 0
>>>> 43 18 0 39 46 0 23 0
>>>> 36 0 0 36 36 0 31 0
>>>> 68 1 0 66 68 0 0 0
>>>> GtAttr Lookup Rdlink Read Write Rename Access Rddir
>>>> 36 0 0 36 36 0 0 0
>>>> 48 0 0 48 49 0 0 0
>>>> 20 0 0 20 20 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 3 14 0 1 0 0 11 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 0 0 0 0 0 0 0
>>>> 0 4 0 0 0 0 4 0
>>>> 0 0 0 0 0 0 0 0
>>>> 4 22 0 0 0 0 16 0
>>>> 2 0 0 0 0 0 23 0
>>>>
>>>> Regards,
>>>>
>>>> Loïc Blot,
>>>> UNIX Systems, Network and Security Engineer
>>>> http://www.unix-experience.fr
>>>>
>>>> 8 décembre 2014 09:36 "Loïc Blot" a écrit:
>>>>> Hi Rick,
>>>>> I stopped the jails this week-end and started it this morning, i'll
>>>>> give you some stats this week.
>>>>>
>>>>> Here is my nfsstat -m output (with your rsize/wsize tweaks)
>>>>>
>>>>> nfsv4,tcp,resvport,hard,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=32768,wsize=32768,readdirsize=32768,readahead=1,wcommitsize=773136,timeout=120,retrans=2147483647
>>>>>
>>>>> On server side my disks are on a raid controller which show a 512b
>>>>> volume and write performances
>>>>> are very honest (dd if=/dev/zero of=/jails/test.dd bs=4096
>>>>> count=100000000 => 450MBps)
>>>>>
>>>>> Regards,
>>>>>
>>>>> Loïc Blot,
>>>>> UNIX Systems, Network and Security Engineer
>>>>> http://www.unix-experience.fr
>>>>>
>>>>> 5 décembre 2014 15:14 "Rick Macklem" a écrit:
>>>>>
>>>>>> Loic Blot wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> i'm trying to create a virtualisation environment based on jails.
>>>>>>> Those jails are stored under a big ZFS pool on a FreeBSD 9.3 which
>>>>>>> export a NFSv4 volume. This NFSv4 volume was mounted on a big
>>>>>>> hypervisor (2 Xeon E5v3 + 128GB memory and 8 ports (but only 1 was
>>>>>>> used at this time).
>>>>>>>
>>>>>>> The problem is simple, my hypervisors runs 6 jails (used 1% cpu and
>>>>>>> 10GB RAM approximatively and less than 1MB bandwidth) and works
>>>>>>> fine at start but the system slows down and after 2-3 days become
>>>>>>> unusable. When i look at top command i see 80-100% on system and
>>>>>>> commands are very very slow. Many process are tagged with nfs_cl*.
>>>>>>
>>>>>> To be honest, I would expect the slowness to be because of slow
>>>>>> response from the NFSv4 server, but if you do:
>>>>>> # ps axHl
>>>>>> on a client when it is slow and post that, it would give us some more
>>>>>> information on where the client side processes are sitting.
>>>>>> If you also do something like:
>>>>>> # nfsstat -c -w 1
>>>>>> and let it run for a while, that should show you how many RPCs are
>>>>>> being done and which ones.
>>>>>>
>>>>>> # nfsstat -m
>>>>>> will show you what your mount is actually using.
>>>>>> The only mount option I can suggest trying is
>>>>>> "rsize=32768,wsize=32768",
>>>>>> since some network environments have difficulties with 64K.
>>>>>>
>>>>>> There are a few things you can try on the NFSv4 server side, if it
>>>>>> appears that the clients are generating a large RPC load.
>>>>>> - disabling the DRC cache for TCP by setting vfs.nfsd.cachetcp=0
>>>>>> - If the server is seeing a large write RPC load, then "sync=disabled"
>>>>>> might help, although it does run a risk of data loss when the server
>>>>>> crashes.
>>>>>> Then there are a couple of other ZFS related things (I'm not a ZFS guy,
>>>>>> but these have shown up on the mailing lists).
>>>>>> - make sure your volumes are 4K aligned and ashift=12 (in case a drive
>>>>>> that uses 4K sectors is pretending to be 512byte sectored)
>>>>>> - never run over 70-80% full if write performance is an issue
>>>>>> - use a zil on an SSD with good write performance
>>>>>>
>>>>>> The only NFSv4 thing I can tell you is that it is known that ZFS's
>>>>>> algorithm for determining sequential vs random I/O fails for NFSv4
>>>>>> during writing and this can be a performance hit. The only workaround
>>>>>> is to use NFSv3 mounts, since file handle affinity apparently fixes
>>>>>> the problem and this is only done for NFSv3.
>>>>>>
>>>>>> rick
>>>>>>
>>>>>>> I saw that there are TSO issues with igb then i'm trying to disable
>>>>>>> it with sysctl but the situation wasn't solved.
>>>>>>>
>>>>>>> Someone has got ideas ? I can give you more informations if you need.
>>>>>>>
>>>>>>> Thanks in advance.
>>>>>>> Regards,
>>>>>>>
>>>>>>> Loïc Blot,
>>>>>>> UNIX Systems, Network and Security Engineer
>>>>>>> http://www.unix-experience.fr
>>>>>>> _______________________________________________
>>>>>>> freebsd-fs@freebsd.org mailing list
>>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>>>>> To unsubscribe, send any mail to
>>>>>>> "freebsd-fs-unsubscribe@freebsd.org"
>>>>>
>>>>> _______________________________________________
>>>>> freebsd-fs@freebsd.org mailing list
>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>>> To unsubscribe, send any mail to
>>>>> "freebsd-fs-unsubscribe@freebsd.org"

From owner-freebsd-fs@FreeBSD.ORG Wed Dec 10 20:58:05 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client
certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 039D58C1 for ; Wed, 10 Dec 2014 20:58:05 +0000 (UTC) Received: from mail-oi0-x236.google.com (mail-oi0-x236.google.com [IPv6:2607:f8b0:4003:c06::236]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B8E051A9 for ; Wed, 10 Dec 2014 20:58:04 +0000 (UTC) Received: by mail-oi0-f54.google.com with SMTP id u20so2647971oif.27 for ; Wed, 10 Dec 2014 12:58:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=czHjqwE4NxQHLPjD3ga12hgz7uWY7zO9kpMIe4u83WY=; b=AdXbZziHoAoMBdQkWymyZz0kVnjg6LoreG05mLrVvBMc/GnLg7eWNNvDw8/FLZ7vXb dsISrKjq/cvQjHCrRVcKkKxe6Od/CXOYlxn9sg19Q5WE2LsswCYZYXzxoBUBK8xxJids h2lwQbj87Cwwm3FF1KvsMiC3DS0rC/GCw6q3ArxvD20IwTilP6MvUz/aqweJYLfxOp7l H2lx9ta1vdjNBNo/sJO+Sr+3/OSpXyQ8pPAT1bzq38HUAQMgv+WAmhVQYWiiHJas6sJE JZlyWkLRag4nj6eNQEN1TT8rAv3bpD3lN51HSssmLseBzC6aIfy2P0XVvQrkxaZC967E miaA== MIME-Version: 1.0 X-Received: by 10.60.67.7 with SMTP id j7mr688589oet.80.1418245084062; Wed, 10 Dec 2014 12:58:04 -0800 (PST) Received: by 10.76.0.138 with HTTP; Wed, 10 Dec 2014 12:58:03 -0800 (PST) In-Reply-To: <54879274.5010001@niksun.com> References: <54879274.5010001@niksun.com> Date: Wed, 10 Dec 2014 15:58:03 -0500 Message-ID: Subject: Re: ZDB -Z? From: Zaphod Beeblebrox To: Andrew Heybey Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Dec 2014 20:58:05 -0000 I tried applying the patch to 10.1 and to -CURRENT (11) ... 
and I get:

[2:9:309]root@test-c1:/usr/src/cddl/contrib/opensolaris/cmd/zdb> patch <~dgilbert/zdb-z-patch
Hmm... Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|diff --git a/cddl/contrib/opensolaris/cmd/zdb/zdb.c b/cddl/contrib/opensolaris/cmd/zdb/zdb.c
|index c265c99..bf43ea1 100644
|--- a/cddl/contrib/opensolaris/cmd/zdb/zdb.c
|+++ b/cddl/contrib/opensolaris/cmd/zdb/zdb.c
--------------------------
Patching file zdb.c using Plan A...
Hunk #1 succeeded at 59.
Hunk #2 succeeded at 3085 with fuzz 1 (offset 339 lines).
Hunk #3 succeeded at 3305 with fuzz 2 (offset 339 lines).
Hunk #4 succeeded at 3329 with fuzz 2 (offset 339 lines).
Hunk #5 failed at 3408.
Hunk #6 failed at 3644.
Hunk #7 failed at 3659.
Hunk #8 failed at 3718.
Hunk #9 failed at 3849.
5 out of 9 hunks failed--saving rejects to zdb.c.rej
done

... what version of FreeBSD is this patch against?

On Tue, Dec 9, 2014 at 7:23 PM, Andrew Heybey wrote:

> On 11/24/14 1:49 PM, Zaphod Beeblebrox wrote:
> > I'm reading about someone else's recovery of files from a damaged ZFS
> > partition. He claims to have added (possibly to opensolaris or whatnot) an
> > argument to zdb '-Z' ... which operates somewhat like -R, but which
> > highlights what parts of the region are on what physical disks, and which
> > are parity.
> >
> > Has anyone patched this into FreeBSD?
>
> Sorry for the late reply, I am behind on my mailing list reading.
>
> I assume you were looking at this post:
>
> http://mbruning.blogspot.com/2009_12_01_archive.html
>
> I was also recently trying to recover data in a ZFS pool. I made an ugly
> attempt at -Z for zdb. It will not work for anything but RAIDZ pools (I
> tried it on one containing two 6-disk raidz1 vdevs). The diff (against
> FreeBSD 10) is in this email.
>
> I copy-pasted the static function vdev_raidz_map() out of libzfs since it
> is static and not callable externally. Not very tasteful but it worked for
> me.
> > andrew > > commit 86ab9e2dab7e76dcdf527d2aa6b84a2fe429ee28 > Author: Andrew Heybey > Date: Tue Nov 18 15:00:57 2014 -0500 > > zdb: Add -Z flag like > http://mbruning.blogspot.com/2009/12/zfs-raidz-data-walk.html > > diff --git a/cddl/contrib/opensolaris/cmd/zdb/zdb.c > b/cddl/contrib/opensolaris/cmd/zdb/zdb.c > index c265c99..bf43ea1 100644 > --- a/cddl/contrib/opensolaris/cmd/zdb/zdb.c > +++ b/cddl/contrib/opensolaris/cmd/zdb/zdb.c > @@ -59,6 +59,7 @@ > #include > #include > #include > +#include > #undef ZFS_MAXNAMELEN > #undef verify > #include > @@ -2745,6 +2746,168 @@ zdb_dump_block(char *label, void *buf, uint64_t > size, int flags) > } > } > > + > +typedef struct raidz_col { > + uint64_t rc_devidx; /* child device index for I/O */ > + uint64_t rc_offset; /* device offset */ > + uint64_t rc_size; /* I/O size */ > + void *rc_data; /* I/O data */ > + void *rc_gdata; /* used to store the "good" > version */ > + int rc_error; /* I/O error for this device */ > + uint8_t rc_tried; /* Did we attempt this I/O column? > */ > + uint8_t rc_skipped; /* Did we skip this I/O column? 
*/ > +} raidz_col_t; > + > +typedef struct raidz_map { > + uint64_t rm_cols; /* Regular column count */ > + uint64_t rm_scols; /* Count including skipped columns > */ > + uint64_t rm_bigcols; /* Number of oversized columns */ > + uint64_t rm_asize; /* Actual total I/O size */ > + uint64_t rm_missingdata; /* Count of missing data devices */ > + uint64_t rm_missingparity; /* Count of missing parity devices > */ > + uint64_t rm_firstdatacol; /* First data column/parity count > */ > + uint64_t rm_nskip; /* Skipped sectors for padding */ > + uint64_t rm_skipstart; /* Column index of padding start */ > + void *rm_datacopy; /* rm_asize-buffer of copied data > */ > + uintptr_t rm_reports; /* # of referencing checksum > reports */ > + uint8_t rm_freed; /* map no longer has referencing > ZIO */ > + uint8_t rm_ecksuminjected; /* checksum error was injected */ > + raidz_col_t rm_col[1]; /* Flexible array of I/O columns */ > +} raidz_map_t; > + > +/* > + * Divides the IO evenly across all child vdevs; usually, dcols is > + * the number of children in the target vdev. > + * > + * copy-pasted from vdev_raidz in the ZFS sources > + */ > +raidz_map_t* > +vdev_raidz_map(uint64_t size, uint64_t offset, uint64_t unit_shift, > + uint64_t dcols, uint64_t nparity) > +{ > + raidz_map_t* rm; > + /* The starting RAIDZ (parent) vdev sector of the block. */ > + uint64_t b = offset >> unit_shift; > + /* The zio's size in units of the vdev's minimum sector size. */ > + uint64_t s = size >> unit_shift; > + /* The first column for this stripe. */ > + uint64_t f = b % dcols; > + /* The starting byte offset on each child vdev. */ > + uint64_t o = (b / dcols) << unit_shift; > + uint64_t q, r, c, bc, col, acols, scols, coff, devidx, asize, tot; > + > + /* > + * "Quotient": The number of data sectors for this stripe on all > but > + * the "big column" child vdevs that also contain "remainder" data. 
> + */ > + q = s / (dcols - nparity); > + > + /* > + * "Remainder": The number of partial stripe data sectors in this > I/O. > + * This will add a sector to some, but not all, child vdevs. > + */ > + r = s - q * (dcols - nparity); > + > + /* The number of "big columns" - those which contain remainder > data. */ > + bc = (r == 0 ? 0 : r + nparity); > + > + /* > + * The total number of data and parity sectors associated with > + * this I/O. > + */ > + tot = s + nparity * (q + (r == 0 ? 0 : 1)); > + > + /* acols: The columns that will be accessed. */ > + /* scols: The columns that will be accessed or skipped. */ > + if (q == 0) { > + /* Our I/O request doesn't span all child vdevs. */ > + acols = bc; > + scols = MIN(dcols, roundup(bc, nparity + 1)); > + } else { > + acols = dcols; > + scols = dcols; > + } > + > + rm = umem_alloc(offsetof(raidz_map_t, rm_col[scols]), KM_SLEEP); > + > + rm->rm_cols = acols; > + rm->rm_scols = scols; > + rm->rm_bigcols = bc; > + rm->rm_skipstart = bc; > + rm->rm_missingdata = 0; > + rm->rm_missingparity = 0; > + rm->rm_firstdatacol = nparity; > + rm->rm_datacopy = NULL; > + rm->rm_reports = 0; > + rm->rm_freed = 0; > + rm->rm_ecksuminjected = 0; > + > + asize = 0; > + > + for (c = 0; c < scols; c++) { > + col = f + c; > + coff = o; > + if (col >= dcols) { > + col -= dcols; > + coff += 1ULL << unit_shift; > + } > + rm->rm_col[c].rc_devidx = col; > + rm->rm_col[c].rc_offset = coff; > + rm->rm_col[c].rc_data = NULL; > + rm->rm_col[c].rc_gdata = NULL; > + rm->rm_col[c].rc_error = 0; > + rm->rm_col[c].rc_tried = 0; > + rm->rm_col[c].rc_skipped = 0; > + > + if (c >= acols) > + rm->rm_col[c].rc_size = 0; > + else if (c < bc) > + rm->rm_col[c].rc_size = (q + 1) << unit_shift; > + else > + rm->rm_col[c].rc_size = q << unit_shift; > + > + asize += rm->rm_col[c].rc_size; > + } > + > + rm->rm_asize = roundup(asize, (nparity + 1) << unit_shift); > + rm->rm_nskip = roundup(tot, nparity + 1) - tot; > + > + /* > + * If all data stored spans all columns, 
there's a danger that > parity > + * will always be on the same device and, since parity isn't read > + * during normal operation, that that device's I/O bandwidth won't > be > + * used effectively. We therefore switch the parity every 1MB. > + * > + * ... at least that was, ostensibly, the theory. As a practical > + * matter unless we juggle the parity between all devices evenly, > we > + * won't see any benefit. Further, occasional writes that aren't a > + * multiple of the LCM of the number of children and the minimum > + * stripe width are sufficient to avoid pessimal behavior. > + * Unfortunately, this decision created an implicit on-disk format > + * requirement that we need to support for all eternity, but only > + * for single-parity RAID-Z. > + * > + * If we intend to skip a sector in the zeroth column for padding > + * we must make sure to note this swap. We will never intend to > + * skip the first column since at least one data and one parity > + * column must appear in each row. > + */ > + if (rm->rm_firstdatacol == 1 && (offset & (1ULL << 20))) { > + devidx = rm->rm_col[0].rc_devidx; > + o = rm->rm_col[0].rc_offset; > + rm->rm_col[0].rc_devidx = rm->rm_col[1].rc_devidx; > + rm->rm_col[0].rc_offset = rm->rm_col[1].rc_offset; > + rm->rm_col[1].rc_devidx = devidx; > + rm->rm_col[1].rc_offset = o; > + > + if (rm->rm_skipstart == 0) > + rm->rm_skipstart = 1; > + } > + > + return (rm); > +} > + > + > /* > * There are two acceptable formats: > * leaf_name - For example: c1t0d0 or /tmp/ztest.0a > @@ -2803,8 +2966,10 @@ name: > } > > /* > - * Read a block from a pool and print it out. The syntax of the > - * block descriptor is: > + * Read a block from a pool and print it out, or (if Zflag is true) > + * print out where the block is found on the constituents of the vdev. 
> + * > + * The syntax of the block descriptor is: > * > * pool:vdev_specifier:offset:size[:flags] > * > @@ -2825,7 +2990,7 @@ name: > * * = not yet implemented > */ > static void > -zdb_read_block(char *thing, spa_t *spa) > +zdb_read_block(char *thing, spa_t *spa, boolean_t Zflag) > { > blkptr_t blk, *bp = &blk; > dva_t *dva = bp->blk_dva; > @@ -2904,6 +3069,22 @@ zdb_read_block(char *thing, spa_t *spa) > psize = size; > lsize = size; > > + if (Zflag) { > + raidz_map_t* rm; > + rm = vdev_raidz_map(psize, offset, vd->vdev_ashift, > + vd->vdev_children, vd->vdev_nparity); > + (void) printf("columns %lu bigcols %lu asize %lu > firstdatacol %lu\n", > + rm->rm_cols, rm->rm_bigcols, rm->rm_asize, > + rm->rm_firstdatacol); > + for (int c = 0; c < rm->rm_scols; ++c) { > + raidz_col_t* rc = &rm->rm_col[c]; > + (void) printf("devidx %lu offset 0x%lx size > 0x%lx\n", > + rc->rc_devidx, rc->rc_offset, > rc->rc_size); > + } > + umem_free(rm, offsetof(raidz_map_t, rm_col[rm->rm_scols])); > + return; > + } > + > pbuf = umem_alloc(SPA_MAXBLOCKSIZE, UMEM_NOFAIL); > lbuf = umem_alloc(SPA_MAXBLOCKSIZE, UMEM_NOFAIL); > > @@ -3124,7 +3305,7 @@ main(int argc, char **argv) > > dprintf_setup(&argc, argv); > > - while ((c = getopt(argc, argv, "bcdhilmsuCDRSAFLXevp:t:U:P")) != > -1) { > + while ((c = getopt(argc, argv, "bcdhilmsuCDRSAFLXevp:t:U:PZ")) != > -1) { > switch (c) { > case 'b': > case 'c': > @@ -3139,6 +3320,7 @@ main(int argc, char **argv) > case 'D': > case 'R': > case 'S': > + case 'Z': > dump_opt[c]++; > dump_all = 0; > break; > @@ -3197,6 +3379,9 @@ main(int argc, char **argv) > if (dump_all) > verbose = MAX(verbose, 1); > > + if (dump_opt['Z']) > + dump_opt['R'] = 1; > + > for (c = 0; c < 256; c++) { > if (dump_all && !strchr("elAFLRSXP", c)) > dump_opt[c] = 1; > @@ -3325,7 +3510,7 @@ main(int argc, char **argv) > flagbits['r'] = ZDB_FLAG_RAW; > > for (i = 0; i < argc; i++) > - zdb_read_block(argv[i], spa); > + zdb_read_block(argv[i], spa, dump_opt['Z']); > } > > (os != 
NULL) ? dmu_objset_disown(os, FTAG) : spa_close(spa, FTAG); > > From owner-freebsd-fs@FreeBSD.ORG Wed Dec 10 21:28:33 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 306F911A for ; Wed, 10 Dec 2014 21:28:33 +0000 (UTC) Received: from styx.niksun.com (styx.niksun.com [24.104.71.38]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client CN "*.niksun.com", Issuer "Go Daddy Secure Certification Authority" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id E40366D7 for ; Wed, 10 Dec 2014 21:28:32 +0000 (UTC) Received: from EXCHANGE2013A.mj.niksun.com (10.25.8.14) by EXCHANGE2013A.mj.niksun.com (10.25.8.14) with Microsoft SMTP Server (TLS) id 15.0.913.22; Wed, 10 Dec 2014 16:28:25 -0500 Received: from EXCHANGE2010A.mj.niksun.com (10.25.8.13) by EXCHANGE2013A.mj.niksun.com (10.25.8.14) with Microsoft SMTP Server (TLS) id 15.0.913.22 via Frontend Transport; Wed, 10 Dec 2014 16:28:25 -0500 Received: from EXCHANGE2010B.mj.niksun.com ([fe80::ad15:5c17:ae01:8987]) by Exchange2010A.mj.niksun.com ([fe80::7800:5f61:4ee0:b983%15]) with mapi id 14.03.0174.001; Wed, 10 Dec 2014 16:28:24 -0500 From: Andrew Heybey To: Zaphod Beeblebrox Subject: Re: ZDB -Z? Thread-Topic: ZDB -Z? 
Thread-Index: AQHQCBd0emIa1APXW0+LZMI2Y/3KE5yIDswAgAGs0ICAAAh6gA== Date: Wed, 10 Dec 2014 21:28:23 +0000 Message-ID: References: <54879274.5010001@niksun.com> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.24.4.176] Content-Type: text/plain; charset="us-ascii" Content-ID: Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Dec 2014 21:28:33 -0000

On Dec 10, 2014, at 3:58 PM, Zaphod Beeblebrox wrote:
>
> I tried applying the patch to 10.1 and to -CURRENT (11) ... and I get:
>
> [2:9:309]root@test-c1:/usr/src/cddl/contrib/opensolaris/cmd/zdb> patch <~dgilbert/zdb-z-patch
> Hmm... Looks like a unified diff to me...
> The text leading up to this was:
> --------------------------
> |diff --git a/cddl/contrib/opensolaris/cmd/zdb/zdb.c b/cddl/contrib/opensolaris/cmd/zdb/zdb.c
> |index c265c99..bf43ea1 100644
> |--- a/cddl/contrib/opensolaris/cmd/zdb/zdb.c
> |+++ b/cddl/contrib/opensolaris/cmd/zdb/zdb.c
> --------------------------
> Patching file zdb.c using Plan A...
> Hunk #1 succeeded at 59.
> Hunk #2 succeeded at 3085 with fuzz 1 (offset 339 lines).
> Hunk #3 succeeded at 3305 with fuzz 2 (offset 339 lines).
> Hunk #4 succeeded at 3329 with fuzz 2 (offset 339 lines).
> Hunk #5 failed at 3408.
> Hunk #6 failed at 3644.
> Hunk #7 failed at 3659.
> Hunk #8 failed at 3718.
> Hunk #9 failed at 3849.
> 5 out of 9 hunks failed--saving rejects to zdb.c.rej
> done
>
> ... what version of FreeBSD is this patch against?

It is against tip of releng/10.0 as of Sep 16. Last commit before my patch was:

Author: delphij
Date: Tue Sep 16 09:50:19 2014 +0000

    Fix Denial of Service in TCP packet processing.

    Security: FreeBSD-SA-14:19.tcp
    Approved by: so

which is SVN revision r271669 as far as I can tell.

What does the .rej file tell you?

andrew

From owner-freebsd-fs@FreeBSD.ORG Wed Dec 10 23:44:33 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9F5C46A3 for ; Wed, 10 Dec 2014 23:44:33 +0000 (UTC) Received: from styx.niksun.com (styx.niksun.com [24.104.71.38]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client CN "*.niksun.com", Issuer "Go Daddy Secure Certification Authority" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 5EE28838 for ; Wed, 10 Dec 2014 23:44:32 +0000 (UTC) Received: from EXCHANGE2013B.mj.niksun.com (10.25.8.16) by EXCHANGE2013B.mj.niksun.com (10.25.8.16) with Microsoft SMTP Server (TLS) id 15.0.913.22; Wed, 10 Dec 2014 18:44:31 -0500 Received: from EXCHANGE2010A.mj.niksun.com (10.25.8.13) by EXCHANGE2013B.mj.niksun.com (10.25.8.16) with Microsoft SMTP Server (TLS) id 15.0.913.22 via Frontend Transport; Wed, 10 Dec 2014 18:44:31 -0500 Received: from EXCHANGE2010B.mj.niksun.com ([fe80::ad15:5c17:ae01:8987]) by Exchange2010A.mj.niksun.com ([fe80::7800:5f61:4ee0:b983%15]) with mapi id 14.03.0174.001; Wed, 10 Dec 2014 18:44:31 -0500 From: Andrew Heybey To: Zaphod Beeblebrox Subject: Re: ZDB -Z? Thread-Topic: ZDB -Z?
Thread-Index: AQHQCBd0emIa1APXW0+LZMI2Y/3KE5yIDswAgAGs0ICAAAh6gIAAJgeA Date: Wed, 10 Dec 2014 23:44:29 +0000 Message-ID: <938AF764-051D-4565-B124-12F28E5CB675@niksun.com> References: <54879274.5010001@niksun.com> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.24.4.176] Content-Type: text/plain; charset="us-ascii" Content-ID: <5DC839E7F0131B4588E6EA2B3372879C@niksun.com> Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Cc: freebsd-fs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Dec 2014 23:44:33 -0000

On Dec 10, 2014, at 4:28 PM, Andrew Heybey wrote:
>
> On Dec 10, 2014, at 3:58 PM, Zaphod Beeblebrox wrote:
>>
>> I tried applying the patch to 10.1 and to -CURRENT (11) ... and I get:
>>
>> [2:9:309]root@test-c1:/usr/src/cddl/contrib/opensolaris/cmd/zdb> patch <~dgilbert/zdb-z-patch
>> Hmm... Looks like a unified diff to me...
>> The text leading up to this was:
>> --------------------------
>> |diff --git a/cddl/contrib/opensolaris/cmd/zdb/zdb.c b/cddl/contrib/opensolaris/cmd/zdb/zdb.c
>> |index c265c99..bf43ea1 100644
>> |--- a/cddl/contrib/opensolaris/cmd/zdb/zdb.c
>> |+++ b/cddl/contrib/opensolaris/cmd/zdb/zdb.c
>> --------------------------
>> Patching file zdb.c using Plan A...
>> Hunk #1 succeeded at 59.
>> Hunk #2 succeeded at 3085 with fuzz 1 (offset 339 lines).
>> Hunk #3 succeeded at 3305 with fuzz 2 (offset 339 lines).
>> Hunk #4 succeeded at 3329 with fuzz 2 (offset 339 lines).
>> Hunk #5 failed at 3408.
>> Hunk #6 failed at 3644.
>> Hunk #7 failed at 3659.
>> Hunk #8 failed at 3718.
>> Hunk #9 failed at 3849.
>> 5 out of 9 hunks failed--saving rejects to zdb.c.rej
>> done
>>
>> ... what version of FreeBSD is this patch against?
>
> It is against tip of releng/10.0 as of Sep 16. Last commit before my patch was:
>
> Author: delphij
> Date: Tue Sep 16 09:50:19 2014 +0000
>
> Fix Denial of Service in TCP packet processing.
>
> Security: FreeBSD-SA-14:19.tcp
> Approved by: so
>
> which is SVN revision r271669 as far as I can tell.
>
> What does the .rej file tell you?

You reminded me that I want to upgrade this box to 10.1 anyway. I merged my patch to releng/10.1 (merge was trivial though I have not compiled it yet).

I put the diff on pastebin to avoid line wrap and other possible damage in email.

http://pastebin.com/ThyeNHYE

andrew

From owner-freebsd-fs@FreeBSD.ORG Thu Dec 11 16:48:06 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id A6D8D1E8 for ; Thu, 11 Dec 2014 16:48:06 +0000 (UTC) Received: from qznsvrmodus001.questzones.com (qznsvrmodus003.questzones.com [66.234.18.253]) by mx1.freebsd.org (Postfix) with ESMTP id 6D82BD72 for ; Thu, 11 Dec 2014 16:48:05 +0000 (UTC) Received: from [192.168.1.7] (unverified [64.18.182.14]) by qznsvrmodus001.questzones.com (Vircom SMTPRS 5.63.19.18308/9735.579.151.931717) with ESMTP id for ; Thu, 11 Dec 2014 11:36:55 -0500 X-Modus-BlackList: 64.18.182.14=OK;eric@deimos.ca=OK X-Modus-RBL: 64.18.182.14=OK X-Modus-Trusted: 64.18.182.14=NO X-Modus-Spam-Version: 5.63.19.18308/9735.579.151.931717 X-Modus-Audit: FALSE;0;0;193365835757125632 Message-ID: <5489C962.4090007@deimos.ca> Date: Thu, 11 Dec 2014 11:42:10 -0500 From: =?UTF-8?B?w4lyaWM=?= User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Problem registering Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Dec 2014 16:48:06 -0000 Bug in Mailman version 2.1.18-1 We're sorry, we hit a bug! Please inform the webmaster for this site of this problem. Printing of traceback and other system information has been explicitly inhibited, but the webmaster can find this information in the Mailman error logs. From owner-freebsd-fs@FreeBSD.ORG Fri Dec 12 02:00:10 2014 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 7EABBDEC for ; Fri, 12 Dec 2014 02:00:10 +0000 (UTC) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 669597E9 for ; Fri, 12 Dec 2014 02:00:10 +0000 (UTC) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.14.9/8.14.9) with ESMTP id sBC20Avi064411 for ; Fri, 12 Dec 2014 02:00:10 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 195162] kernel panic ffs_blkfree freeing free block Date: Fri, 12 Dec 2014 02:00:09 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 8.4-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: sasamotikomi@gmail.com X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-bugs@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: cc Message-ID: In-Reply-To: References: Content-Type: 
text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 12 Dec 2014 02:00:10 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195162 sasamotikomi@gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |freebsd-fs@FreeBSD.org, | |sasamotikomi@gmail.com --- Comment #1 from sasamotikomi@gmail.com --- Look like duplicate of same 12 bug: https://bugs.freebsd.org/bugzilla/buglist.cgi?quicksearch=ffs_blkfree&list_id=37328 -- You are receiving this mail because: You are on the CC list for the bug. From owner-freebsd-fs@FreeBSD.ORG Fri Dec 12 13:11:23 2014 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 55B5C693 for ; Fri, 12 Dec 2014 13:11:23 +0000 (UTC) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 3E48874D for ; Fri, 12 Dec 2014 13:11:23 +0000 (UTC) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.14.9/8.14.9) with ESMTP id sBCDBNcc053616 for ; Fri, 12 Dec 2014 13:11:23 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 193364] [panic] ffs_blkfree_cg: freeing free block Date: Fri, 12 Dec 2014 13:11:23 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base 
System X-Bugzilla-Component: kern X-Bugzilla-Version: 9.1-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: sasamotikomi@gmail.com X-Bugzilla-Status: New X-Bugzilla-Priority: Normal X-Bugzilla-Assigned-To: freebsd-bugs@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: cc Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 12 Dec 2014 13:11:23 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=193364 sasamotikomi@gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |freebsd-fs@FreeBSD.org -- You are receiving this mail because: You are on the CC list for the bug. 
From owner-freebsd-fs@FreeBSD.ORG Fri Dec 12 13:20:47 2014 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id AB1A27C2 for ; Fri, 12 Dec 2014 13:20:47 +0000 (UTC) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 93A9F8C5 for ; Fri, 12 Dec 2014 13:20:47 +0000 (UTC) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.14.9/8.14.9) with ESMTP id sBCDKlN1087598 for ; Fri, 12 Dec 2014 13:20:47 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 193389] [panic] ufs_dirbad: /: bad dir Date: Fri, 12 Dec 2014 13:20:47 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.0-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: sasamotikomi@gmail.com X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-bugs@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: cc Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 12 Dec 2014 13:20:47 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=193389 sasamotikomi@gmail.com changed: What |Removed 
|Added ---------------------------------------------------------------------------- CC| |freebsd-fs@FreeBSD.org -- You are receiving this mail because: You are on the CC list for the bug. From owner-freebsd-fs@FreeBSD.ORG Fri Dec 12 15:32:24 2014 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 4C2D4199 for ; Fri, 12 Dec 2014 15:32:24 +0000 (UTC) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 341A9957 for ; Fri, 12 Dec 2014 15:32:24 +0000 (UTC) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.14.9/8.14.9) with ESMTP id sBCFWOGI097603 for ; Fri, 12 Dec 2014 15:32:24 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 193389] [panic] ufs_dirbad: /: bad dir Date: Fri, 12 Dec 2014 15:32:24 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.0-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: mikej@mikej.com X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-bugs@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: cc Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: 
List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 12 Dec 2014 15:32:24 -0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=193389

mikej@mikej.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mikej@mikej.com

--- Comment #3 from mikej@mikej.com ---
I am also getting this error under current.

FreeBSD d620 11.0-CURRENT FreeBSD 11.0-CURRENT #0 r275582: Mon Dec 8 02:36:47 UTC 2014 root@grind.freebsd.org:/usr/obj/usr/src/sys/GENERIC i386

panic: ufs_dirbad: /: bad dir ino 8668611 at offset 12288: mangled entry

http://mail.mikej.com/core.txt.0
http://mail.mikej.com/info.0
http://mail.mikej.com/core.txt.1
http://mail.mikej.com/info.1
http://mail.mikej.com/smartctl-a.ada0
http://mail.mikej.com/dmesg.d620

I am getting DMA errors on the device though, and I am not sure whether this is a driver or a disk problem. This is an SSD; the laptop had been running with a Seagate Momentus under Windows and Linux without issue.

I will swap drives tonight to see whether the problem is isolated to the SSD, report back, and perform any other suggested troubleshooting tasks.

Now this is so odd that I will mention it even though I can't fathom why it would matter: all my panics have happened immediately after running "man". So far no panics while running X, Firefox, and a lot of other applications.

Thanks.

--
You are receiving this mail because:
You are on the CC list for the bug.
From owner-freebsd-fs@FreeBSD.ORG Fri Dec 12 17:16:27 2014 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id ED16A8E9 for ; Fri, 12 Dec 2014 17:16:27 +0000 (UTC) Received: from tau.lfms.nl (tau.lfms.nl [93.189.130.30]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 957C76C4 for ; Fri, 12 Dec 2014 17:16:27 +0000 (UTC) Received: from sim.dt.lfms.nl (dt.lfms.nl [83.84.86.53]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by tau.lfms.nl (Postfix) with ESMTPS id 0B5B9892AB for ; Fri, 12 Dec 2014 18:16:18 +0100 (CET) Received: from [192.168.130.112] (borax.dt.lfms.nl [192.168.130.112]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by sim.dt.lfms.nl (Postfix) with ESMTPS id BF7DA9C09085 for ; Fri, 12 Dec 2014 18:16:17 +0100 (CET) From: Walter Hop Subject: Serious FS hangs and panics on 10.1 Message-Id: <553B39FA-7DBC-4536-9FD4-11A98E0D4740@spam.lifeforms.nl> Date: Fri, 12 Dec 2014 18:16:17 +0100 To: freebsd-fs@freebsd.org Mime-Version: 1.0 (Mac OS X Mail 8.1 \(1993\)) X-Mailer: Apple Mail (2.1993) Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.18-1 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 12 Dec 2014 17:16:28 -0000

Hi all,

As some may have read on -stable, various users are having system hangs since 10.1-RC when unmounting the root filesystem on 10.1 with UFS+softupdates. I'll recap: hangs occur for instance when /sbin/init has been meddled with, so people experience it generally after running freebsd-update. With the 10.1-p1 update, the bug and mailinglist posts got additional activity, so it's a recurring theme. I verified the problem still exists in CURRENT, and found lock order reversals which may or may not be related. (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195458)

Now the above problem has a simple mitigation: just disable softupdates before doing freebsd-update, and you won't hang. Okay, a little startling, but I’m still sleeping okay.

Now today, the 10.1 story seems to look a lot worse, with a 10.1 box getting back-to-back kernel panics in VFS functions. This is a box serving SVN repositories, and SVN is known to exercise a filesystem pretty thoroughly (even uncovering NTFS bugs in pre-SP1 Windows7). We’ve updated this box from 10.0 to 10.1 a week ago. The four panics that we saw (trace below), had the exact same instruction pointer and stack trace, so I'm pretty positive we're not looking at a random hardware fluke.

The last panics were spaced only minutes apart, which was pretty scary. I was fearing persistent disk corruption, but the panics stopped when... I disabled softupdates! This was my first shot, as this also solved my other stability problem on 10.1. Anyway, the machine has been stable so far.

Maybe these two problems are unrelated, it might be too early to tell, but in any case, I am getting the strong vibe that something was changed in UFS/VFS/softupdates between 10.0 and 10.1 that's possibly very problematic and has a risk of causing data loss in the future.

Our experience with 10.0 has been remarkably good (same for earlier releases for that matter... in fact I don't think I can remember the last kernel panic in production at all.. maybe on 5.2-STABLE?) So, that's why we were very happy to see 10.1; but it feels really troublesome in the filesystem department, which is very uncharacteristic for FreeBSD.

That said, I'd prefer spending some more energy on getting 10.1 working well, rather than downgrading or jumping to other systems... But I think it really needs some love. Any ideas on what we could do?

Thanks!
WH

--
Walter Hop | PGP key: https://lifeforms.nl/pgp

Panic:

kernel: Fatal trap 12: page fault while in kernel mode
kernel: cpuid = 0; apic id = 00
kernel: fault virtual address = 0x30058
kernel: fault code = supervisor write data, page not present
kernel: instruction pointer = 0x20:0xffffffff8090e46a
kernel: stack pointer = 0x28:0xfffffe000024d780
kernel: frame pointer = 0x28:0xfffffe000024d850
kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
kernel: processor eflags = interrupt enabled, resume, IOPL = 0
kernel: current process = 27466 (httpd)
kernel: trap number = 12
kernel: panic: page fault
kernel: cpuid = 0
kernel: KDB: stack backtrace:
kernel: #0 0xffffffff80963000 at kdb_backtrace+0x60
kernel: #1 0xffffffff80928125 at panic+0x155
kernel: #2 0xffffffff80d24f1f at trap_fatal+0x38f
kernel: #3 0xffffffff80d25238 at trap_pfault+0x308
kernel: #4 0xffffffff80d2489a at trap+0x47a
kernel: #5 0xffffffff80d0a782 at calltrap+0x8
kernel: #6 0xffffffff8090ec35 at lf_advlock+0x45
kernel: #7 0xffffffff809b8e69 at vop_stdadvlock+0xa9
kernel: #8 0xffffffff80e44247 at VOP_ADVLOCK_APV+0xa7
kernel: #9 0xffffffff808e4919 at kern_fcntl+0xb39
kernel: #10 0xffffffff808e3d5c at kern_fcntl_freebsd+0xac
kernel: #11 0xffffffff80d25851 at amd64_syscall+0x351
kernel: #12 0xffffffff80d0aa6b at Xfast_syscall+0xfb

From owner-freebsd-fs@FreeBSD.ORG Fri Dec 12 19:49:24 2014 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher
AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 1FF5E7AA for ; Fri, 12 Dec 2014 19:49:24 +0000 (UTC) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 081B4983 for ; Fri, 12 Dec 2014 19:49:24 +0000 (UTC) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.14.9/8.14.9) with ESMTP id sBCJnNX7059746 for ; Fri, 12 Dec 2014 19:49:23 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 193389] [panic] ufs_dirbad: /: bad dir Date: Fri, 12 Dec 2014 19:49:23 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.0-RELEASE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: mckusick@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-bugs@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: cc Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 12 Dec 2014 19:49:24 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=193389 Kirk McKusick changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |mckusick@FreeBSD.org --- Comment #4 from Kirk McKusick --- It appears that the offending directory has moved since your 
first run. Could you please run the following commands to find the offending directory:

    find / -xdev -inum 8668611 -print
    ls -ld
    find / -xdev -inum 1777399 -print
    ls -ld

Also, as noted by Adrian Chadd in comment #1, including the output of a full fsck of the filesystem would be useful (after you have run the above commands).

--
You are receiving this mail because:
You are on the CC list for the bug.
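Editorial sketch for the inode lookup requested in comment #4 of bug 193389: the `find / -xdev -inum N -print` commands there search a single filesystem for the path that owns a given inode number. The Python below is a rough approximation of that search, not a replacement for find(1); the function name `find_by_inode` is hypothetical, and mount points are simply skipped rather than reported.

```python
import os


def find_by_inode(root, inode):
    """Return paths under root, on root's filesystem only, with the given inode number.

    Rough approximation of `find root -xdev -inum inode -print`.
    """
    root_stat = os.lstat(root)
    root_dev = root_stat.st_dev
    matches = []
    if root_stat.st_ino == inode:
        matches.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        kept = []
        for name in dirnames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # unreadable or vanished entry: skip it
            if st.st_dev != root_dev:
                continue  # another filesystem: do not descend (like -xdev)
            if st.st_ino == inode:
                matches.append(path)
            kept.append(name)
        dirnames[:] = kept  # prune the walk in place
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue
            if st.st_dev == root_dev and st.st_ino == inode:
                matches.append(path)
    return matches
```

Calling `find_by_inode("/", 8668611)` as root would correspond to the first command in the comment; each returned path could then be passed to `ls -ld`.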